pdb2gmx fails to recognize atoms named HT1 or HT3 in some cases?

GROMACS version: 2021.2
GROMACS modification: Yes, installed via brew on macOS; I also tried the Ubuntu package 2020.1-1 and saw the same problem.

Hi GROMACS users and developers,

I am using pdb2gmx to prepare my input. The force field is CHARMM36 (charmm36-feb2021.ff), downloaded from mackerell.umaryland.edu.

I wrote my own .rtp and .n.tdb files for a new residue, NTE, and updated residuetypes.dat and atomtypes.atp accordingly, i.e., I added NTE as a protein residue. Here are the rtp and n.tdb files:

; peptoids.rtp
[ bondedtypes ]
1 5 9 2 1 3 1 0

[ NTE ]
;       |  HA3 HB1     HD1 HE1     HH1 HT1         HK1
;       |   |   |       |   |       |   |         /
;       N--CA5--CB--OG--CD--CE--OZ--CH--CT--OI--CK--HK2
;       |   |   |       |   |       |   |         \
;       |  HA4 HB2     HD2 HE2     HH2 HT2         HK3
;   HA1-CA-HA2
;       |
;       |
;     O=C
;       |
;
[ atoms ]
N NTOID -0.354 0
CA CG321 -0.008 0
HA1 HGA2 0.09 0
HA2 HGA2 0.09 0
CA5 CG321 0.002 1
HA3 HGA2 0.09 1
HA4 HGA2 0.09 1
CB CG321 -0.01 2
HB1 HGA2 0.09 2
HB2 HGA2 0.09 2
OG OG301 -0.34 3
CD CG321 -0.01 4
HD1 HGA2 0.09 4
HD2 HGA2 0.09 4
CE CG321 -0.01 5
HE1 HGA2 0.09 5
HE2 HGA2 0.09 5
OZ OG301 -0.34 6
CH CG321 -0.01 7
HH1 HGA2 0.09 7
HH2 HGA2 0.09 7
CT CG321 -0.01 8
HT1 HGA2 0.09 8 ;;; problems here
HT2 HGA2 0.09 8 ;;;
OI OG301 -0.34 9
CK CG331 -0.1 10
HK1 HGA3 0.09 10
HK2 HGA3 0.09 10
HK3 HGA3 0.09 10
C CG2O1 0.53 11
O OG2D1 -0.53 11
[ bonds ]
N CA5
N CA
C +N
CA C
CA HA1
CA HA2
CA5 CB
CB OG
CB HB1
CB HB2
CA5 HA3
CA5 HA4
OG CD
CD CE
CE OZ
CD HD1
CD HD2
CE HE1
CE HE2
OZ CH
CH CT
CT OI
OI CK
CH HH1
CH HH2
CT HT1
CT HT2
CK HK1
CK HK2
CK HK3
C O
[ impropers ]
;N -C CA CA5
;C CA +N O
C CA +N O

; peptoids.n.tdb
[ None ]

[ NHTOID ]
;    [HT1]  => add H to the N terminus
;      |
;      N--CA5--CxHyOz
;      |
;  HA2-CA-HA1
;      |
;      |
[ replace ]
CA CA CG321 12.011 -0.110
[ add ]
1 2 HT1 N CA CA5 ;
HGP1 1.008 0.102 -1 ; charge 0.102

Here is my pdb:
ATOM 1 N NTE U 1 -0.391 13.665 -0.310 1.00 0.00 N
ATOM 3 CA NTE U 1 0.196 12.814 0.720 1.00 0.00 C
ATOM 4 HA1 NTE U 1 0.622 13.375 1.430 1.00 0.00 H
ATOM 5 HA2 NTE U 1 -0.509 12.241 1.137 1.00 0.00 H
ATOM 6 CA5 NTE U 1 -1.487 13.033 -1.082 1.00 0.00 C
ATOM 7 HA3 NTE U 1 -1.681 13.565 -1.906 1.00 0.00 H
ATOM 8 HA4 NTE U 1 -1.232 12.103 -1.347 1.00 0.00 H
ATOM 9 CB NTE U 1 -2.755 13.004 -0.139 1.00 0.00 C
ATOM 10 HB1 NTE U 1 -2.559 12.415 0.645 1.00 0.00 H
ATOM 11 HB2 NTE U 1 -2.933 13.932 0.188 1.00 0.00 H
ATOM 12 OG NTE U 1 -3.931 12.515 -0.800 1.00 0.00 O
ATOM 13 CD NTE U 1 -5.120 12.945 -0.012 1.00 0.00 C
ATOM 14 HD1 NTE U 1 -5.084 12.510 0.888 1.00 0.00 H
ATOM 15 HD2 NTE U 1 -5.087 13.938 0.104 1.00 0.00 H
ATOM 16 CE NTE U 1 -6.485 12.560 -0.708 1.00 0.00 C
ATOM 17 HE1 NTE U 1 -6.518 12.974 -1.618 1.00 0.00 H
ATOM 18 HE2 NTE U 1 -6.537 11.566 -0.801 1.00 0.00 H
ATOM 19 OZ NTE U 1 -7.624 13.028 0.072 1.00 0.00 O
ATOM 20 CH NTE U 1 -8.832 12.565 -0.575 1.00 0.00 C
ATOM 21 HH1 NTE U 1 -8.862 12.932 -1.505 1.00 0.00 H
ATOM 22 HH2 NTE U 1 -8.812 11.566 -0.620 1.00 0.00 H
ATOM 23 CT NTE U 1 -10.120 13.011 0.186 1.00 0.00 C
ATOM 24 HT1 NTE U 1 -10.093 12.674 1.127 1.00 0.00 H
ATOM 25 HT2 NTE U 1 -10.179 14.009 0.199 1.00 0.00 H
ATOM 26 OI NTE U 1 -11.256 12.463 -0.496 1.00 0.00 O
ATOM 27 CK NTE U 1 -12.456 12.916 0.153 1.00 0.00 C
ATOM 28 HK1 NTE U 1 -12.394 13.903 0.300 1.00 0.00 H
ATOM 29 HK2 NTE U 1 -12.549 12.446 1.031 1.00 0.00 H
ATOM 30 HK3 NTE U 1 -13.241 12.707 -0.431 1.00 0.00 H
ATOM 31 C NTE U 1 1.231 11.955 0.048 1.00 0.00 C
ATOM 32 O NTE U 1 2.059 12.563 -0.650 1.00 0.00 O
TER
END

When I run gmx pdb2gmx with the -ter flag and choose NHTOID for the N terminus and CT2 for the C terminus, it gives this error:
“Fatal error: Atom H1 in residue NTE 1 was not found in rtp entry NTE with 35 atoms while sorting atoms. For a hydrogen, this can be a different protonation state, or it might have had a different number in the PDB file and was rebuilt (it might for instance have been H3, and we only expected H1 & H2).
Note that hydrogens might have been added to the entry for the N-terminus. Remove this hydrogen or choose a different protonation state to solve it.
Option -ignh will ignore all hydrogens in the input.”

So pdb2gmx read HT1 as H1. I then renamed HT1 and HT2 in NTE to HT3 and HT4, and pdb2gmx returned the error “Atom H3 in residue NTE 1 was not found”. Finally, I renamed the two atoms to HT5 and HT4, and it worked!
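
For anyone who needs to apply the same rename to a larger PDB file, it can be scripted instead of done by hand. This is only a minimal sketch: it assumes a standard fixed-column PDB (atom name in columns 13-16, residue name in columns 18-20), and the function name `rename_pdb_atoms` and the example mapping are made up for illustration. The atom names in the .rtp entry have to be changed to match.

```python
def rename_pdb_atoms(lines, resname, mapping):
    """Return PDB lines with atom names in residue `resname` renamed per `mapping`.

    Assumes fixed-column PDB records: atom name in columns 13-16,
    residue name in columns 18-20 (1-based).
    """
    out = []
    for line in lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        if is_atom and line[17:20].strip() == resname:
            name = line[12:16].strip()
            if name in mapping:
                new = mapping[name]
                # Names shorter than 4 characters conventionally start in column 14.
                field = f" {new:<3}" if len(new) < 4 else new[:4]
                line = line[:12] + field + line[16:]
        out.append(line)
    return out


# Hypothetical usage: move the CT hydrogens away from the names HT1..HT3.
# with open("nte.pdb") as fh:
#     fixed = rename_pdb_atoms(fh.read().splitlines(), "NTE", {"HT1": "HT5", "HT2": "HT4"})
```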

Then, I tried a small molecule: Dimethylformamide, of which the pdb is:
HEADER Ideal coordinates for PDB-CCD DMF
COMPND DMF
AUTHOR pdbccdutils 0.6
AUTHOR RDKit 2021.03.1
HETATM 1 CC DMF A 1 -0.721 0.000 1.696 1.00 20.00 C
HETATM 2 CT DMF A 1 1.461 0.000 0.402 1.00 20.00 C
HETATM 3 C DMF A 1 -0.691 0.000 -0.739 1.00 20.00 C
HETATM 4 O DMF A 1 -0.097 0.000 -1.797 1.00 20.00 O
HETATM 5 N DMF A 1 -0.003 0.000 0.419 1.00 20.00 N
HETATM 6 HC1 DMF A 1 -1.347 -0.890 1.761 1.00 20.00 H
HETATM 7 HC2 DMF A 1 -0.003 0.000 2.516 1.00 20.00 H
HETATM 8 HC3 DMF A 1 -1.346 0.890 1.761 1.00 20.00 H
HETATM 9 HT3 DMF A 1 1.812 0.000 -0.629 1.00 20.00 H
HETATM 10 HT4 DMF A 1 1.830 0.890 0.911 1.00 20.00 H
HETATM 11 HT5 DMF A 1 1.830 -0.890 0.911 1.00 20.00 H
HETATM 12 HA DMF A 1 -1.771 0.000 -0.727 1.00 20.00 H
END

With the original merged.rtp from the default charmm36-2021.2, pdb2gmx ran without any problem, whether atoms 9-11 were named HT3, HT4, HT5 or HT1, HT2, HT3.

Please help! I am confused and have no idea what is going on. I would appreciate any help.

Xubo Luo

I suspect that HT1 HT2 and HT3 are reserved for the terminus capping, because I have found these in the tdb file:

; merged.c.tdb
[ CT2 ]
[ replace ]
C CC 12.011 0.55
O O 15.9994 -0.55
[ add ]
1 2 NT C CA N
NH2 14.0027 -0.62 -1
2 3 HT NT C CA
H 1.008 0.00 -1
[ replace ]
HT1 H 1.008 0.30 ; this one is trans to O
HT2 H 1.008 0.32 ; this one is cis to O
[ impropers ]
C NT CA O
C CA NT O
NT C HT1 HT2
NT C HT2 HT1

However, it is interesting that when I did DMF as mentioned before, it did not trigger any error.

This comes from xlateat.dat, which renames common atom names that may not be compliant, so in the case of an N-terminal residue, HT[123] become H[123]:

protein-nterm  HT1    H1
protein-nterm  HT2    H2
protein-nterm  HT3    H3

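This does not affect every molecule: the xlateat.dat rules above apply only to protein residues, and of those, only to the N-terminal residue. NTE was added to residuetypes.dat as a protein and is the first residue of the chain, so its HT1, HT2, and HT3 get renamed; DMF is presumably not classed as a protein residue, so its atom names are left alone.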
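
To see how these three rules account for every result reported in this thread, here is a toy Python model of the renaming. It is not the pdb2gmx implementation: the `RULES` table holds only the three lines quoted above, and `translate_atom_name` and its two boolean arguments are hypothetical simplifications of how pdb2gmx decides that a rule applies.

```python
# The three xlateat.dat rules quoted above: (scope, name in PDB, translated name).
RULES = [
    ("protein-nterm", "HT1", "H1"),
    ("protein-nterm", "HT2", "H2"),
    ("protein-nterm", "HT3", "H3"),
]


def translate_atom_name(name, is_protein, is_nterm):
    """Return the name after translation (simplified model, not GROMACS code)."""
    for scope, old, new in RULES:
        if old != name:
            continue
        if scope == "protein-nterm" and is_protein and is_nterm:
            return new
    return name


# NTE is listed as a protein residue and is the N-terminal residue:
for atom in ("HT1", "HT2", "HT3", "HT4", "HT5"):
    print("NTE", atom, "->", translate_atom_name(atom, is_protein=True, is_nterm=True))

# DMF, assumed here not to be a protein residue, is left untouched:
for atom in ("HT3", "HT4", "HT5"):
    print("DMF", atom, "->", translate_atom_name(atom, is_protein=False, is_nterm=False))
```

Under this model, HT1/HT2 fail (HT1 becomes H1, which the rtp entry does not contain), HT3/HT4 fail (HT3 becomes H3), and HT5/HT4 pass because neither name matches a rule.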