Missing atom types in MD simulations

GROMACS version: 2021.1
GROMACS modification: No

I am running a simple test to get atom types for arginine dipeptide with CHARMM36m force field. The arginine dipeptide should have 12 atom types, but it only shows 11 atom types from the information printed by gmx dump command. I think the HC atom type is not recognized, and it is mistakenly assigned as H atom type. In addition, the energy of the arginine dipeptide is different from other programs. The major difference is the electrostatic energy.

Not only Arg dipeptide, but also some other dipeptides, such as Glu and Lys, have the incorrect number of atom types in Gromacs. Can you tell me why some atom atypes are missing in Gromacs?

HC and H have the same LJ parameters, so they get merged into a single atom type by grompp but the associated bonded parameters are different.

Please provide more detail on the types of tests you are doing. I have rigorously tested the C36m implementation, explicitly using dipeptides, and the energies and forces match between GROMACS and CHARMM to at least the third decimal place (some deviation in electrostatic terms occurs because of the difference in Coulomb’s constant used by each program).

Hi Justin,

Thank you for your reply.

For the tests of dipeptides, the simulations were run without solvent and without long-range PME correction. I did such a simple test becasue I just wanted to get the information of each energy term and the potential energy of a dipeptide at the first MD step. Here are the results between different programs (All energies are in the unit of kcal/mol).

For Arg dipeptide:

Gromacs NAMD Difference
E(bond) 19.54404876 19.5442 0.000151243
E(angle) 20.3956979 20.3958 0.000102103
E(dihedral+cmap) 9.936431644 9.9364 -3.16444E-05
E(improper) 0.243979446 0.244 2.05545E-05
E(vdw) -0.29831979 -0.3228 -0.02448021
E(ele) -251.2767686 -237.4397 13.83706864
E(pot) -201.4549307 -187.6421 13.81283069

For Ala dipeptide:

Gromacs NAMD Difference
E(bond) 10.41295411 10.413 4.58891E-05
E(angle) 16.26589388 16.2658 -9.38815E-05
E(dihedral+cmap) 1.942441683 1.9424 -4.16826E-05
E(improper) 1.222153442 1.2221 -5.34417E-05
E(vdw) -1.378117543 -1.3841 -0.005982457
E(ele) -16.00597514 -16.0059 7.51434E-05
E(pot) 12.45935043 12.4533 -0.00605043

All energy terms are consistent between Gromacs and NAMD for Ala dipeptide, but for Arg dipeptide, the electorstatic energy has large difference. Could you explain the reason for this difference?

I also found two possible errors in the C36m topology file.

  1. For the most recent C36m force field (charmm36-feb2021.ff) on CHARMM force field website, I saw that the atom names of ACE and NME in merged.rtp file were renamed to the same names as in origin top_all36_prot.rtf file, e.g. CH3 → CAY for ACE. However, the name is still CH3 for ACE in the merged.hdb file, which causes errors in gmx pdb2gmx step when adding hydrogens. The earlier version of C36m force field (e.g. charmm36-jul2017.ff) has consistent atom names between merged.rtp and merged.hdb file, which works very well in topology preparation. I think changing the atom names back to the previous version can eliminate the error in gmx pdb2gmx step for the latest version.
  2. In all versions of C36m for Gromacs files, there are two impropers for NME. The first one “N -C CH3 HN” is correct, but I don’t know why the second improper “-C CH3 N -O” is used. Is this an improper angle? Other tool such as psfgen does not generate this improper, so Gromacs will always have one more improper in “topol.top” file if we cap C-terminus with NME. I guess what this improper wants to represent is “-C -CA N -O”, but it could be constructed from “C CA +N O” improper in the topology of each amino acid. Therefore, I think it is necessary to delete this improper in the merged.rtp file.
    Could you please check if what I said is correct or not?

Are you performing a single-point energy evaluation, or actually running 1 step of MD? There can be a big difference. Without your actual inputs, I can’t really comment on the robustness of your approach, but single-point energy and force evaluations in CHARMM (not NAMD) vs. GROMACS yield matching results, as I noted before.

The better solution is just to fix the atom names in the .hdb file, and the same applies for the NME entry. I changed things to be consistent with CHARMM for better interoperability. The corrected entries are:

ACE     1       
3   4   HY   CAY  C    O
NME     2
1   1   HN  N   -C  CAT
3   4   HT  CAT  N  -C

I will update those in the next release of the force field port.

The fix you propose is correct, and I’ll make that change in the next version, too. There should be two impropers there, to match the CHARMM .rtf (patch) entry, which specifies two impropers:

PRES CT3          0.00 ! N-Methylamide C-terminus
GROUP                  ! use in generate statement
ATOM C    C       0.51 !      
ATOM O    O      -0.51 !      |
GROUP                  !      C=O
ATOM NT   NH1    -0.47 !      | 
ATOM HNT  H       0.31 !      NT-HNT
ATOM CAT  CT3    -0.11 !      |
ATOM HT1  HA3     0.09 ! HT1-CAT-HT3
ATOM HT2  HA3     0.09 !      |
ATOM HT3  HA3     0.09 !     HT2 
                       !
BOND C NT  NT HNT  NT CAT  CAT HT1  CAT HT2  CAT HT3
IMPR NT C CAT HNT C CA NT O

I can’t speak to what psfgen is doing, but again using CHARMM as the reference rather than VMD/NAMD tools, the implementations should agree here that there are two impropers pertaining to this patch, one centered on NT and the other on C.

I ran 1 step of MD to test these results. As your suggestion, I performed single-point energy evaluation by gmx mdrun -rerun option, and the results are same for each energy term as I listed above. Here is my Gromacs input for dipeptide test:

integrator = md
nsteps = 1
dt = 0.001
nstxout = 0
nstvout = 0
nstfout = 0
nstenergy = 1
nstlog = 1
nstxout-compressed = 1
compressed-x-grps = System
continuation = no
constraints = none
cutoff-scheme = Verlet
ns_type = grid
nstlist = 10
rcoulomb = 1.2
vdwtype = cutoff
rlist = 1.4
rvdw = 1.2
DispCorr = no
coulombtype = cutoff
tcoupl = V-rescale
tc-grps = Protein
tau_t = 0.1
ref_t = 300
pbc = xyz
gen_vel = yes
gen_temp = 300
gen_seed = -1

I think my input for comparison is okay because I got same energies between Gromacs and NAMD for Ala dipeptide. I did this test because I want the potential energy of a dipeptide to be same in different MD programs, and it should be same if different MD prgrams have the same force field functional forms and read the same parameters. The weird thing is that I can get the matched energy terms for Ala dipeptide but not for Arg dipeptide. From more tests, I found the electrostatic energies are more likely to be same between Gromacs and NAMD for neutral dipeptides, but they are different for charged dipeptides (e.g. Ala, Lys, Glu). Is my

In addition, I also did the same test on LAMMPS, and all energy terms are matched very well between LAMMPS and NAMD for Arg dipeptide. Since I thought you might compare the dipeptides energies between Gromcas and CHARMM long time ago during implementation, I did the same test on older Gromacs versions, such as 2018.2 and 5.1.4, and all versions gave me same energies for Arg dipeptide, which are different from both NAMD and LAMMPs. I didn’t perform the test on CHARMM because I haven’t installed CHARMM and never perform simulations on it before. Could you please do the comparison on Arg dipeptide between Gromcas and CHARMM, or tell me if something is wrong in my test?

Here’s the clue. The large side chain is wrong but the small side chain is right. The programs are using different neighbor lists, I’m willing to bet. This is why all comparisons should be done with infinite cutoffs (all-vs-all comparison). Sadly, this has been disabled in recent GROMACS versions but we have done these comparisons for single molecules, zwitterions, tripeptide, etc. Everything matches between CHARMM and GROMACS. I don’t have a per-energy term breakdown on hand, but I have done it. The overall energies match between CHARMM and GROMACS:

=== Protein ===
AA    CHARMM    GMX
ala   37.31776   37.31739
arg   -172.87158   -172.86615
asp   42.78600   42.78632
asn   67.73579   67.73661
cys   47.28588   47.28537
glu   46.22594   46.22562
gln   8.33029   8.33111
gly   23.93051   23.93021
hsd   63.71331   63.71295
hse   9.40274   9.40317
hsp   116.47855   116.47872
ile   55.21422   55.21391
leu   15697.24964   15697.53824
lys   48.68499   48.68475
met   44.32551   44.32528
phe   69.03321   69.03298
pro   1418.32281   1418.36759
ser   50.78526   50.78465
thr   30.00300   30.00310
trp   80.95839   80.95769
tyr   58.48646   58.48661
val   136.13354   136.13432
 
=== DNA ===
BASE  CHARMM    GMX
ade   212.61300   212.61520
cyt   -74.35659   -74.34918
gua   -184.29789   -184.28847
thy   -28.78346   -28.77652
 
=== RNA ===
BASE  CHARMM    GMX
ade   900.43821   900.43738
cyt   615.83186   615.83652
gua   496.22720   496.23565
ura   588.27669   588.28632
 
=== LIPID ===
LIP   CHARMM    GMX
popc   11.68598   11.68740
pope   0.51746   .51885
pops   -18.83684   -18.83494
popa   -40.73156   -40.72920
dppc   7.75118   7.75239
 
=== CARB ===
SUG   CHARMM    GMX
aglc   125.21478   125.21319
bglc   133.08417   133.08221
agal   123.95685   123.95506
bgal   131.01627   131.01481

I have tested different cutoff schemes with diffrent cutoffs. Even though we cannot do inifite cutoffs, I think we can use some large cutoffs to make sure all non-bonded interactions are included even for the dipeptide with large side chain. Here are the potential energy results for Arg dipeptide.

Epot
Gromacs Verlet 12A -201.4549
NAMD 12A -187.6420
Gromacs Verlet 22A -195.1886
NAMD 22A -187.6420
Gromacs Group 12A -187.6420
In this table, if the cutoff is increased from 12 A to 22 A in Verlet cutoff scheme, the energy is indeed changed, but it is also not close to NAMD energy. In NAMD, the 22 A cutoff provided the same energy as 12 A cutoff, meaning that 12 A is enough for including all non-bonded interaction for Arg dipeptide. Interestingly, I run a test with Group cutoff scheme and 12 A cutoff on Gromacs version 2018.2, the energy are same as NAMD. I don’t know why the Group scheme is removed in current Gromacs version but it seems to have better consistency on energy compared to other MD programs.

Here I also calculated the potential energy on all dipeptides. For Gromacs, the Verlet cutoff scheme and 12 A cutoff are used as before.

Gromacs NAMD Difference (NAMD - Gromacs)
ala 32.1171 32.1111 -0.0060
arg -168.9135 -155.1019 13.8116
asp -14.1699 -0.3446 13.8253
asn -27.3021 -27.3136 -0.0115
cys 69.3255 69.3164 -0.0091
glu 18.8536 32.6762 13.8226
gln 13.1125 13.0976 -0.0149
gly 12.9022 12.8976 -0.0046
hsd 147.7385 147.7160 -0.0225
hse 129.0425 129.0203 -0.0222
hsp 86.6663 100.4869 13.8206
ile 74.2459 74.2317 -0.0142
leu 66.1360 66.1216 -0.0144
lys 80.7010 94.5198 13.8188
met 59.0194 59.0022 -0.0172
phe 63.1496 63.1281 -0.0215
pro 92.9489 92.9404 -0.0085
ser 44.8286 44.8214 -0.0072
thr 69.1549 69.1458 -0.0091
trp 90.7715 90.7412 -0.0303
tyr 53.1451 53.1221 -0.0230
val 63.8470 63.8364 -0.0106
There are several interesting things. First, in this test I used different coordinates for Arg dipeptide, so the energy is different from the previous test for both Gromacs (-168.9135 vs -201.4549) and NAMD (-155.1019 vs -187.6420). However, the energy difference (NAMD - Gromacs) are around 13.81 for the previous coordinates and current coordinates. Seond, from the table we can observe that the differences for all neutral dipeptides are very small, but for charged dipeptides (Arg, Asp, Glu, Hsp, Lys), the differences are quite large and the magnitudes are round 13.8, no matter the dipeptide is positively charged or negatively charged. Third, the size of the side chain is not likely to be the reason for the difference of energy, because some neutral dipeptides also have large side chains as Arg, but their energy differences are small. One more direct evidence is hse/hsd vs hsp. These three dipeptides have very similar size of side chains, except that hsp has 1 more hydrogen than hse/hsd. The energy differences for hse/hsd are good but it is large for hsp.

This test indicates that there might be some systematic differences between Gromacs and NAMD (~13.8 kcal/mol) for calculating the energy of dipeptides with Verlet cutoff scheme. Could you check the codes of electrostatic energy calculation with Verlet scheme, or are there any other settings I need to put into input file for Verlet scheme to get the same energy as Cutoff scheme or other programs such as NAMD?

Do your tests use PME for electrostatics, or a plain cutoff?

I used plain cutoff rather than PME because there is no water in the system, and I just wanted to test the energy of dipeptide itself.

I have no explanation for this systematic difference, but something is bizarre if it is exactly the same value for every charged side chain. Do the energies match if you use infinite cutoffs? This is the only meaningful comparison. There are too many differences between the codes, and frankly the Verlet code is far too complicated for me to try to figure out why GROMACS is different from NAMD to try to dig into it. We have demonstrated that results of MD simulations match among CHARMM, NAMD, and GROMACS, so I am not concerned that there is actually a meaningful differences when it comes to the forces, and the only apples-to-apples comparisons that I have done show perfect matching. I’d be interested to know if the source of the difference can be pinpointed, but it’s not something I have time to dig into.

As you can see in previous results, the Group cutoff scheme has the same energy as NAMD/LAMMPS for Arg dipeptide, which means even inside Gromacs, the energies from the Group scheme and from the Verlet scheme are different. Is it normal that the electrostatic energies from these two cutoff schemes are different only for the charged dipeptides?

In my opinion, the energies for a simple dipeptide should be same no matter what MD programs or cutoff schemes we used, if all bonded and nonbonded interactions are included in the calculation. The weird thing is that the energies of neutral dipeptides with Verlet scheme are same with other programs, but the energies of charged dipeptdies are not, and the differences are all around 13.8 kcal/mol. However, the Group scheme with the same cutoff can provide the same energy for the charged dipeptides. That’s why I doubt that there might be some systematic errors in Verlet scheme when calculating energies for charged dipeptides.

I want the energies are consistent between different MD programs because it is important for library development. If a library is based on CHARMM force field energy, it cannot be generalized to all MD programs if the MM energy is not consistent in them. If you don’t want to dig into the Verlet codes, could you tell me what parameters about Verlet scheme I can set in mdp input file to obtain the same energy as the Group scheme, since the Group scheme is deleted in the current Gromacs version?

I don’t know how to do this. Maybe @hess can comment. I truly hope that the ability to do a proper in vacuo calculation will be restored into GROMACS in the near future. This is a critically important feature that needs to be restored for validating force fields.

Without looking into the details, my guess is that the difference are due to the shift in the Coulomb potential. I expect the difference go away when you set coulomb-modifier = none
You might want to do the same for vdw-modifier

It is indeed convenient to be able to do vacuum simulations, but the same results can be achieved in nearly all cases by using a large periodic box.

Thank you so much, @jalemkul and @hess! By setting coulomb-modifier=none and vdw-modifier=none, the dipeptide energies from Gromacs are same as NAMD/LAMMPS.

I would like to ask one more question. Are there any documents or tutorials for developing library used in Gromacs?

I don’t understand what you mean with “library used in Gromacs”.

I mean the external library, which can read coordinates and other desired variables from Gromacs, perform user-defined calculations, and return the energies and forces back to Gromacs.

That sounds like it might find into the gmx::ForceProvider interface.