Adding a covalent ligand to a force field

My apologies for the previous locked topic, the topic was created in the midst of drafting the question.
GROMACS version: 2020
GROMACS modification: No

My goal is to run a simulation on a protein with a ligand covalent bonded to cysteine using CHARMM36. I’ll be getting reliable parameters for the cys-ligand from a CHARMM expert collaborator so obtaining parameters is not an issue.
From reading the documentation and various forum discussions, my understanding is that I have to do the following:

  • My cys-ligand will be handed over as a .str which I can convert to .itp and .prm.
  • Add this as a new residue to my merged.rtp file in the charmm36.ff, this will take some manual editing but the .rtp is nearly identical to the .itp so this shouldn’t be too difficult. Does this have to be in alphabetical order or can I just stick it on the end?
  • Add any new atom types to atomtypes.dat (I don’t think this applies in my instance)
  • Copy any new bondtypes, angletypes, or dihedraltypes to lines to the corrosponding section in ffbonded.itp
  • Add a line for my residue to residuetypes.dat

Then I should set up my pdb with the new residue named with the corresponding new residue name and run pdb2gmx to generate the system topology.

Is this the right approach for this? Is there anything I am missing?

Sounds about right. You will need an .hdb entry if you need hydrogens constructed. http://manual.gromacs.org/current/how-to/topology.html#adding-a-new-residue

The order of atoms in the .rtp file is irrelevant, but usually listed in some chemically intuitive way.

Thank you Justin,
I will give this a go and try to follow up for help and/or posterity

Hi Justin,

Instead of a .str file for my cys-ligand complex from my collaborators, I have a .psf.
I was able to successfully convert this to a gromacs .top file using TopoGromacs in VMD (for future people reading this, fix the bug commenting out this line))

In my .top file, the bond information is listed as atom index numbers but all the bond information in the residue database are listed as atom names. Is there some way to convert this .top to an .rtp or .itp format to make it easier to add to this custom residue to merged.rtp? Or can index numbers be used in the .rtp instead of atom names?

The .rtp format requires atom names. The conversion should be straightforward - establish a hash (Perl), dictionary (Python), etc. of atom names and numbers, and print out the range of values in the bond list that correspond to your residue.

Thank you Justin, I’ll write up a python script to do the conversion.

Hi Justin,

I wrote a python script for converting formats from top2rtp that worked out very well. Thank you for the suggestion.

I’ve added the atoms and bonds for my new ‘residue’ to merged.rtp and added it to residuetypes.dat. My custom residue is Cys covalently bound to a peptidomimetic inhibitor, thus had atom with identical atom names that required renaming (atom types did not change). This mostly involved adding a number to the end of the atom name, eg. the second N became N2 but the atom type remained NH1. No atom names for the Cys part were changed.

pdb2gmx creates a topology with no errors raised. However, upon closer inspection, I see that no peptide bond gets created between the backbone carbonyl C of my new residue and the amine N of the next residue. A dihedral and cmap gets created though. The distance between the C of my new residue and the N of the next residue is 1.3 angstroms. The peptide bond with the preceding residue gets created with no issue.

How does pdb2gmx know when to generate interesidue backbone bonds?
Why might such an issue occur and how to remedy it?

Pasting the merged.rtp entry for my species here:

[ CLP ]
[ atoms ]
N NH1 -0.470 0
HN H 0.310 1
CA CT1 0.070 2
HA HB1 0.090 3
CB CT2 -0.110 4
HB1 HA2 0.090 5
HB2 HA2 0.090 6
SG S -0.412 7
C C 0.510 8
O O -0.510 9
CAY CT3 -0.270 10
HY1 HA3 0.090 11
HY2 HA3 0.090 12
HY3 HA3 0.090 13
CY C 0.510 14
OY O -0.510 15
N2 NH1 -0.470 16
HN2 H 0.310 17
CA2 CT1 0.070 18
HA2 HB1 0.090 19
CB2 CT2 -0.180 20
HB12 HA2 0.090 21
HB22 HA2 0.090 22
CG2 CT1 -0.090 23
HG2 HA1 0.090 24
CD12 CT3 -0.270 25
H112 HA3 0.090 26
H122 HA3 0.090 27
H132 HA3 0.090 28
CD22 CT3 -0.270 29
H212 HA3 0.090 30
H222 HA3 0.090 31
H232 HA3 0.090 32
C3 C 0.510 33
O3 O -0.510 34
N3 NH1 -0.470 35
HN3 H 0.310 36
CA3 CT1 0.070 37
HA3 HB1 0.090 38
CB3 CT2 -0.180 39
HB13 HA2 0.090 40
HB23 HA2 0.090 41
CG3 CT1 -0.090 42
HG3 HA1 0.090 43
CD13 CT3 -0.270 44
H113 HA3 0.090 45
H123 HA3 0.090 46
H133 HA3 0.090 47
CD23 CT3 -0.270 48
H213 HA3 0.090 49
H223 HA3 0.090 50
H233 HA3 0.090 51
C4 C 0.510 52
O4 O -0.510 53
N5 NH1 -0.470 54
HN5 H 0.310 55
CA5 CT1 0.070 56
HA5 HB1 0.090 57
CB5 CT2 -0.180 58
HB15 HA2 0.090 59
HB25 HA2 0.090 60
CG5 CT2 -0.180 61
HG15 HA2 0.090 62
HG25 HA2 0.090 63
CD5 CT2 0.200 64
HD15 HA2 0.090 65
HD25 HA2 0.090 66
NE5 NC2 -0.700 67
HE5 HC 0.440 68
CZ5 C 0.640 69
NH15 NC2 -0.800 70
H115 HC 0.460 71
H125 HC 0.460 72
NH25 NC2 -0.800 73
H215 HC 0.460 74
H225 HC 0.460 75
C6 CT1 -0.039 76
O6 OC -0.683 77
H6 HA1 0.064 78
[ bonds ]
N HN
N CA
CA HA
CA CB
CA C
CB HB1
CB HB2
CB SG
SG C6
C O
CAY HY1
CAY HY2
CAY HY3
CAY CY
CY N2
CY OY
N2 HN2
N2 CA2
CA2 HA2
CA2 CB2
CA2 C3
CB2 HB12
CB2 HB22
CB2 CG2
CG2 HG2
CG2 CD12
CG2 CD22
CD12 H112
CD12 H122
CD12 H132
CD22 H212
CD22 H222
CD22 H232
C3 N3
C3 O3
N3 HN3
N3 CA3
CA3 HA3
CA3 CB3
CA3 C4
CB3 HB13
CB3 HB23
CB3 CG3
CG3 HG3
CG3 CD13
CG3 CD23
CD13 H113
CD13 H123
CD13 H133
CD23 H213
CD23 H223
CD23 H233
C4 N5
C4 O4
N5 HN5
N5 CA5
CA5 HA5
CA5 CB5
CA5 C6
CB5 HB15
CB5 HB25
CB5 CG5
CG5 HG15
CG5 HG25
CG5 CD5
CD5 HD15
CD5 HD25
CD5 NE5
NE5 HE5
NE5 CZ5
CZ5 NH15
CZ5 NH25
NH15 H115
NH15 H125
NH25 H215
NH25 H225
C6 H6
C6 O6

I clipped this section out of my protein to run the topology generating tests on a smaller peptide. Below is the section of the topology concerning the C (43) atom in question where atom 114 is the N that should be bonded to it.

dkneller@linuxbox$ grep ’ 43’ topol.top
43 C 145 CLP C 43 0.51 12.011
37 43 1
43 44 1
33 43 1
36 43 1
40 43 1
41 43 1
42 43 1
35 37 43 5
38 37 43 5
39 37 43 5
37 43 44 5
33 35 37 43 9
36 35 37 43 9
43 37 39 40 9
43 37 39 41 9
43 37 39 42 9
35 37 43 44 9
38 37 43 44 9
39 37 43 44 9
114 43 116 115 2
43 114 116 119 121 1

Thank you!

EDIT:
I think I might have found my problem.
[ bonds ] requires a ‘+’ or ‘-’ to tell pdb2gmx to connect backbone atoms.
When I added

   C    +N

to my merged.rtp entry for my residue, pdb2gmx was able to create the necessary bond.
I don’t see this detail in the documentation, it might be good idea describe this maybe in the .rtp file formats section.

My follow up question is this: Why does CYS use C +N while the example in the documentation (GLY) use -C N?

Hi,

quick question, since I am trying to do this with another residue. I’m trying to understand the format of the hdb file.

PHE 8
1 1 H N -C CA
1 5 HA CA N CB C
2 6 HB CB CA CG
1 1 HD1 CD1 CG CE1
1 1 HE1 CE1 CD1 CZ
1 1 HZ CZ CE1 CE2
1 1 HE2 CE2 CZ CD2
1 1 HD2 CD2 CG CE2

So that’s how I read it:
[resname] [number of Hs]
[One H] [?] [name of H] [corresponding heavy atom] [bonded atom] [bonded atom]…

What does the second number stand for, i.e. the [?]? Also, is a particular order needed for the [bonded atoms]?

Thanks a lot!
Best wishes,
Jacek

The second number is a function type, which dictates the geometry of the constructed H atoms. The PDF manual has a full description of the functions and how the reference atoms are used.

Hi Justin,

thanks, found it.

Few more quick questions:

  1. I suppose when adding [ dihedraltypes ] to ffbonded.itp, the order of atoms does not make a difference. E.g. a dedral between atoms “c2 cc cc ha” is the same as reversed one “ha cc cc c2”, right?
  2. I see there are two sections [ dihedraltypes ] in ffbonded.itp, one with func 4 and one with func 9. My dihedrals are func 3; should I make a new [ dihedraltypes ] or can I just append it to either of the above. Or should I keep them in the rtp file?
  3. Having added my improper dihedrals to the rtp file, where do I add the associated values? Directly into the rtp (i.e. adding [ dihedral section ]?) or into the ffbonded.itp?

Thanks!

Best,
Jacek

Ok, so I figured out my issue with the improper dihedrals.

However, I’m still not sure I’m understanding correctly how to deal with my proper dihedrals. Using antechamber I got dihedrals of type 3 (Ryckaert-Bellemans), but it seems that in ffbonded.itp or aminoacids.rtp only type 9 dihedrals (multiple proper dihedrals) are used. I suppose I have to translate my dihedrals to type 9 (and use the #define_torsion… to use multiples) or is there another way to deal with that?

I’d appreciate any advice.

Best,
Jacek

You shouldn’t be parametrizing your species using antechamber if you’re using the CHARMM force field. The conventions used by AMBER and CHARMM are quite different, and indeed you can’t directly use any R-B dihedrals. You need to use proper dihedrals; the only somewhat unique thing about CHARMM is that it allows more than one term per torsion, unlike some other force fields, hence the difference in function type. I have no idea what #define_torsion is, sorry.

Hi Justin,

thanks for your response.
I should have mentioned that I am trying to do this for AMBER. Sorry for the missing info.

Ok, then I will try to add some lines to acpype.py to obtaining proper dihedrals - if I’m not mistaken, antechamber provides proper dihedrals which are then translated by acpype.

Thanks for the help, I think I’m almost there.

Best,
Jacek

BTW: #define_torsion_… lines can be found in amber99SBildn for specific amino acid residues, which (I think) overwrite the defaults for certain atomtypes if they don’t match those defaults.

OK, then ignore what I said above about CHARMM. You were posting on a thread about a CHARMM topology so I assumed it was a related issue. Better to start a new thread if it’s a new/unrelated problem.

Ok, will do so next time. Thx!