Atom P in residue DA 90 was not found in rtp entry DA5 with 30 atoms while sorting atoms

GROMACS version:
GROMACS modification: Yes/No
Here post your question
Hello, I am working with circular DNA. I have modified the names of the first and last residues. However, when using pdb2gmx, I encountered this error.
Processing chain 1 (2829 atoms, 89 residues)

Identified residue DG1 as a starting terminus.

Identified residue DT89 as a ending terminus.
8 out of 8 lines of specbond.dat converted successfully
Opening force field file ./amber99sb-ildn-phi-bsc0-cufix.ff/aminoacids.arn
Opening force field file ./amber99sb-ildn-phi-bsc0-cufix.ff/dna.arn
Opening force field file ./amber99sb-ildn-phi-bsc0-cufix.ff/rna.arn

Checking for duplicate atoms…

Generating any missing hydrogen atoms and/or adding termini.

Now there are 89 residues with 2829 atoms
Chain time…

Making bonds…

Number of bonds was 3047, now 3047

Generating angles, dihedrals and pairs…
Before cleaning: 7413 pairs
Before cleaning: 8058 dihedrals

Making cmap torsions…

There are 8058 dihedrals, 519 impropers, 5547 angles
7146 pairs, 3047 bonds and 0 virtual sites

Total mass 27414.258 a.m.u.

Total charge -89.000 e

Writing topology

Back Off! I just backed up posre_DNA.itp to ./#posre_DNA.itp.1#

Processing chain 2 (2820 atoms, 89 residues)

Identified residue DA90 as a starting terminus.

Identified residue DC178 as a ending terminus.
8 out of 8 lines of specbond.dat converted successfully
Opening force field file ./amber99sb-ildn-phi-bsc0-cufix.ff/aminoacids.arn
Opening force field file ./amber99sb-ildn-phi-bsc0-cufix.ff/dna.arn
Opening force field file ./amber99sb-ildn-phi-bsc0-cufix.ff/rna.arn

Program: gmx pdb2gmx, version 2023.1-conda_forge
Source file: src/gromacs/gmxpreprocess/pdb2gmx.cpp (line 870)

Fatal error:
Atom P in residue DA 90 was not found in rtp entry DA5 with 30 atoms
while sorting atoms.

The first chain spans residues 1-89, and the second chain spans residues 90-178. To provide context, I used the first residue from the first chain and the last residue from the second chain as references.

ATOM 1 P DG 1 -0.022 9.276 -1.766 1.00 0.00 P
ATOM 2 O1P DG 1 -0.713 10.636 -1.583 1.00 0.00 O
ATOM 2827 1H2’ DT 89 2.237 6.683 -2.592 1.00 0.00 H
ATOM 2828 2H2’ DT 89 0.642 6.884 -2.040 1.00 0.00 H
ATOM 2829 O3’ DT 89 0.160 9.167 -3.501 1.00 0.00 O
ATOM 2830 P DA 90 -4.696 -7.723 -1.995 1.00 0.00 P
ATOM 2831 O5’ DA 90 -5.115 -6.424 -2.658 1.00 0.00 O
ATOM 2832 C5’ DA 90 -5.883 -5.639 -1.744 1.00 0.00 C
ATOM 2833 1H5’ DA 90 -5.241 -5.272 -0.934 1.00 0.00 H
ATOM 5645 H3’ DC 178 -2.550 -8.595 -0.261 1.00 0.00 H
ATOM 5646 C2’ DC 178 -2.398 -6.516 -0.897 1.00 0.00 C
ATOM 5647 1H2’ DC 178 -1.397 -6.801 -1.223 1.00 0.00 H
ATOM 5648 2H2’ DC 178 -2.958 -6.257 -1.772 1.00 0.00 H
ATOM 5649 O3’ DC 178 -4.429 -7.771 -0.343 1.00 0.00 O

I suspect the issue might be related to the presence of ‘END’ and ‘TER’ in the PDB file. It seems challenging to distinguish whether the first and last residues are properly bonded.
Residue 1 is bonded to 89, and residue 90 is bound to 178. How should pdb2gmx understand that it is a closed DNA structure? Additionally, does it not consider residue 1 as the starting terminus?

Any guidance on resolving this issue would be appreciated.

Hi, the standard Amber FFs for nucleic acids do not provide a residue template with a 5’ phosphate. Except for rare cases where it might be relevant to your research problem, you usually have to manually remove the 5’-terminal P and O1P/O2P atoms.

Oh well, that explains why the first chain is processed correctly, which bugged me in the original error message. According to pdb2gmx documentation:

As a special case, ring-closed (or cyclic) molecules are considered. gmx pdb2gmx automatically determines if a cyclic molecule is present by evaluating the distance between the terminal atoms of a given chain. If this distance is greater than the -sb (“Short bond warning distance”, default 0.05 nm) and less than the -lb (“Long bond warning distance”, default 0.25 nm) the molecule is considered to be ring closed and will be processed as such. Please note that this does not detect cyclic bonds over periodic boundaries.

So my guess is that your O3’-P bond for chain A is within these limits, while for chain B this is not the case. If that’s the case, you can move the misaligned atoms (e.g. manually in VMD) and check if that hepls solve the issue.

thank you so much. I will check it

By chance, I could just reproduce the error with my own circular DNA setup, and the error seems to be an issue with gmx pdb2gmx - for some reason, it correctly processes the 1st circular residue, but fails at the next one.

I successfully circumvented this by splitting the PDB into two separate files, processing each one on its own, and recombining the outputs, but it seems like a not-exactly-desired behavior. Wondering if Berk @hess could confirm that?

It is a good idea. I would like to ask about the topology file. If we use pdb2gmx to create the topology file separately, do we have to be concerned about missing some data? I mean, will there be any missing data related to the two base pairs? like Hbonds data between two strands

There is no h-bond data in the topology, it only contains covalent bonds. In general Gromacs topologies are extremely readable, and I always recommend going through the complete file (with all the #include commands parsed, can be done e.g. with the -pp option of gmx grompp, or by saving the topology with gromologist) and understanding the meaning of each consecutive section.

In practical terms, there is a bit of work to combine two .top files into one. The straightforward way is to keep one of them, and remove the headers (all the way to the first [ moleculetype ]) and the footer (unnecessary extra [ moleculetype ]s and [ system ] from the other file. Then rename that other file to sth like other.itp, and add #include "other.itp" to the first .top file where you would add another [ moleculetype ]. Then, edit the [ system ] including one copy of the another molecule’s entry, for example DNA_chain_B 1.

When that has to be automated, for example for many similar systems, there are also gromologist routines for importing molecule definitions between topologies.

I highly appreciate your assistance. I was concerned about the possibility of missing the nonbonded parameters between two strands if I use pdb2gmx to create separate itp files for each strand. To ensure that I understand correctly, if I add the itp file for each strand separately to the top file, I will not miss nonbonded parameters between the two strands, correct?

You’re right, you don’t need to be concerned about this. The non-bonded parameters are charges and LJ sigma/epsilon, and for each interacting pair of atoms they are produced from per-atom or per-atomtype parameters using combination rules.

And contrary to some webservers or selected other engines, by default Gromacs includes all of the parameters for the given force field, so even if one strand was poly-A and another was poly-T, the information is already there. (Plus if something is missing, gmx grompp will usually explicitly complain about it.)

thank you soooo much.