Inserting a modified residue into a protein/DNA chain using pdb2gmx

GROMACS version: 2021.5
GROMACS modification: No
Hi all,

I am attempting to covalently attach a custom small molecule to a DNA strand I want to simulate in a water box. I will be using the CHARMM36 forcefield, so I parameterised the small molecule using CGENFF, and generated the usual .itp and .prm files. In prior ligand-protein simulations, I would simply add in this new molecule into my topology and make sure the itp/prm files are included, but this time I want this molecule to be covalently bound to my DNA chain.

I am a bit confused on how to get this to happen - should I simply add in the small molecule in my topology as usual? If I do this, will the covalent bond with the DNA molecule be observed? A similar question seemed to imply that this would not work unless using the intermolecular_interactions field, but then the guidance document says this should not be used for covalent binding (Covalent bond between protein and ligand (cobalamin)).

The other option would be to add this small molecule to my forcefield prior to pdb2gmx. If I were to follow down this route, how would I transfer my itp file so that pdb2gmx can read it? It does not seem simple to use my itp file to edit the different rtp files in the forcefield.

Many thanks in advance!

Hi,

I have a feature for exactly this kind of use cases implemented in gromologist, documented here. If you have the topology of the DNA and the .itp of the ligand, it should be trivial to load both separately, create the bond between them (and perhaps remove superfluous atoms), and then use add_parameters_from_file() to read in the extra params. Afterwards, you can run find_missing_ff_params() to check if you have all the necessary parameters.

Let me know if that helps, or should you run into errors!

Hi Miłosz,

Thank so much for your reply. I did not know about gromologist, it seems like a great tool. I have now conducted an extensive online search, tried a great number of things and I think I have advanced a few steps, but I am still experiencing a final issue. Here’s a brief overview:

  • I initially attempted to use gromologist to make the bond between the small molecule and 5’ DNA nucleotide (a bond between the 5’ phosphate and an oxygen atom on my ligand), but the bond was being created with an extra hydrogen (the oxygen had 3 covalent bonds) so something was going wrong.
  • I went back to basics and tried to see what was going wrong. Lots of searches later, I managed to create my .rtp files by using charmm2gmx (Welcome to charmm2gmx’s documentation! — charmm2gmx 1.0.2 documentation) to generate a new forcefield library with just my ligand from my CGenFF .str file, then copying in the details into my current charmm36 (July 2021) forcefield (including the ffbonded.itp angles/dihedrals). I made sure to add an extra bond to the rtp file showing how the terminal oxygen would bind with a phosphate (P+) on a subsequent DNA molecule.
  • At this point, I realised that the previous issue with the bond creation was because I had left a terminal hydrogen on my small molecule, which wasn’t getting removed when I combined it with my DNA. After removing it and reparametrising with CGenFF, I was able to generate the correct topology using pdb2gmx (I also added my small molecule to residuetypes.dat as DNA). I guess this brings me to the same stage as above if I had used gromologist to make the bond.
  • However, I then tested the topology using grompp but this complained that the parameters for a single bond are missing, along with the associated angles and dihedrals (same result when using gromologist’s find_missing_ff_params()). I am assuming that these missing parameters are for the bond between the oxygen (on my ligand) and phosphate (on the DNA), which I did not parameterise with CGenFF.
  • I managed to find another post (where you also helped out someone else - Simulation of novel terminal DNA residues with Charmm36 - #3 by JonahC) which seems to discuss a similar issue. Your advice was to: ‘Both will complain about each bonded term that is not defined, so that you can include them by analogy and modify [ bondtypes ], [ angletypes ] etc accordingly.’
  • Would you be able to elaborate on the above? I am not sure how I could guess the dihedrals/angles for this bond, or were you implying that they should be copied from somewhere else?
  • There is also another issue - the charge of the DNA chain appears to be a non-integer. I think this is as one capping hydrogen is present (which as added by pdb2gmx via 3TER) while the other is not (as it has been replaced by the small molecule ligand). Is this a concern? If so, how would you fix this?

Many thanks in advance - I really appreciate your help here!

Matthew

P.S. Initially, I attempted to use gromologist’s tp_rtp() function to generate my rtp files but this does not seem to be defined in the package I downloaded from pip. Is this expected?

Hi Matthew,

Indeed, the idea of providing basic features is that the user should take care of the “chemical” aspects of the operations - if you want to form an adduct, you should also decide which existing atoms should be removed. The main thing done by merge_two is the creation of all required bonded entries, and the merging/renumbering of the two moleculetype sections. It’s true, going through the .rtp entry will in principle give you a very similar result.

As you note, though, parametrizing the ligand itself does not automatically take care of the connections. There is a good chance the bond (and angles, dihedrals, …) connecting the two entities will not have the parameters present in the force field, as biochemical FFs tend to only parametrize interactions that are present in the supported residues. Same with the charges - if you have two separate entities with integer charges, and then you need to remove a hydrogen or two, charges will be redistributed.

One way to work around this is to define your small molecule in such a way that it partially overlaps with the standard DNA molecule, and then use some sort of averaging for the overlapping region to make the whole thing neutral; here, all the missing bonded parameters will be provided, so it’s only a matter of transfering them. Another way is to use your chemical intuition to decide how to assign parameters by analogy, and how to redistribute the extra charge (especially if it’s small) in a reasonable way. Here again you can use non-terminal residues to see what the phosphate charges should be. I would say it’s a case-by-case question to figure out which works better, depending on how mundane the linker chemistry is.

For the practical aspects of assigning parameters by analogy, the find_missing_ff_params function has an optional fix_by_analogy argument (still missing in the documentation) that can accept a dictionary of analogous types, e.g. when passing {'c': 'CT', 'n': 'NA'} it will look for existing parameters assuming that types c and n are equivalent to types CT and NA, respectively. It might also help to check atomtypes.atp for FF-specific descriptions of which atomtype corresponds to which chemical entity.

All the best,
Milosz

Hi Miłosz,

Thanks again for your reply and help here - I’ll investigate what I can do and report back.

Re. the fix_by_analogy keyword in gromologist - does this simply print out the values that should be inserted into the forcefield’s ffbonded.itp file or does it directly reach out and edit the file itself?

Matthew

Hi Matthew,

If the parameters are found, they will be put in the topology, into the [ bonds ], [ angles ] etc. of the molecule itself.

The to_rtp() function is a bit experimental, and not yet included in the last pip release; e.g. it should include all the impropers but is not guaranteed to cover additional dihedrals. If you want to use it, install the version from git, and proceed with caution.

Best,
Miłosz

Hi Miłosz,

Just wanted to give a brief update - I eventually managed to fix most of my issues by re-dispersing the extra charge in my DNA nucleotide (following the implementation here: Choice of fluorophore affects dynamic DNA nanostructures | Nucleic Acids Research | Oxford Academic) and then added in missing angle/dihedral parameters by using CGenFF to parametrise the DNA-ligand linker and extracting the exact numbers I needed from there (I needed to tick the ‘Include parameters that are already in CGenFF’ box to get everything I needed). There were a couple others that I also had to extract from the forcefields used in: https://pubs.acs.org/doi/epdf/10.1021/acsnano.2c06936.

Thanks again for your help here and for sharing gromologist - it’s very useful for diagnosing issues with parametrisation!

Hi Matthew, happy to hear you made it work, and thanks a lot for the feedback!

Hi Matthew,
I’m trying to simulate a similar system, you said you were facing an issue with the bond creation because you left a terminal hydrogen on your small molecule which wasn’t getting removed when you combined it with DNA. You removed it and reparametrised with CGenFF and you could generate the correct topology using pdb2gmx.
But when I try to upload my mol2 file after removing the terminal hydrogen, the server gives me an error saying “resonance warning: unfulfilled valence in aromatic subgraph; skipped molecule”.
Do you have any idea how to fix it?

Hi @tm1089, I think I’d need more details to decipher what’s going on here - how did you remove the hydrogen from your molecule? What molecule is it? Are you also trying to simulate a DNA-ligand conjugated system?

Hi Mathews,
I am also trying to attach a cholesterol covalently to DNA. In your previous messages, I found that you used charmm2gmx to generate the rtp file. Could you please tell me how you did it as the input file (.in file) does not contain any information for generating the output file.

Thank you in advance

Hi @mnabanita, you’ll need to generate the charmm2gmx.in file yourself. This is how you would set it up:

toppardir: //directory containing ligand .str file
outputdir: output.ff // output forcefield folder name
bibtexfile: citations.bib // has no effect if empty file provided

[ligands] // name of .rtp file generated
x.str // path to ligand .str file here

Just fill in the blanks with your own filepaths. You’ll need to pass an empty citations.bib file even if you don’t use it.

Then finally, you run the conversion command as follows:

charmm2gmx convert charmm2gmx.in

and that should be it!

@mattaq Thank you so much for sharing the script. I have a small query regarding the .rtp file which needs to be provided in the script. I would like to know about how did you generate it for the ligand, since from CGENFF we only get the .itp and the .prm file.

Thanks in advance

That’s just a comment I added to show that the rtp file generated will be called ligands.rtp (or whatever you name that section). All the script needs for input is your ligand .str file you get from CGENFF!

@mattaq Thank you for sharing the script.
But I am facing another error
I am trying to make a gro file of ds-DNA attched with modified cholesterol. i have generated the parameters for both. Although the gro file is getting generated, it is forming long and short bonds and changing the structure. It is also saying some residues are missing but those residues are present in .pdb file.
Can anyone help me out in solving the error.
Command line:
gmx pdb2gmx -f dna_chl.gro -o dna_chl.gro -ter -missing

Thanks in advance

Hi Matthew,
Thanks for your reply, I’m trying to simulate a DNA system which (at the C5 position of T) is covalently attached to the ligand trifluorotoluene, I manually edited the pdb file of my ligand to remove the terminal hydrogen from the carbon (that covalently binds to the thymine of DNA) and tried to upload the mol2 file of my lignad on CGenFF server, but I got the error “resonance warning: unfulfilled valence in aromatic subgraph; skipped molecule”.
Please could you elaborate on how you removed the terminal hydrogen of your ligand and reparameterize it using cgenff server?
Also, how did you merge it with your DNA?
Did you make changes in your forcefield directory or did you simply add in this new molecule into topology and include the itp/prm files?

Thanks.

@mnabanita unfortunately I won’t be able to help with your system without an in-depth understanding of what’s going on. Some general advice that I can give:

  • Using -missing is probably not a good idea as there might be something else wrong that you will need to fix separately instead of using that option.
  • A possible way of understanding what’s going wrong would be first to try and prepare a .gro file of the dsDNA alone, then move on to the dsDNA cholesterol after everything is resolved first.
  • For chain terminators, the normal 5’/3’ DNA ends should be terminated with 5TER/3TER as appropriate while the end with cholesterol should be set to none.
  • Non-integer charges will need to be distributed manually as described earlier in this thread.
  • Long/broken bonds in a visualisation package such as PyMOL do not necessarily match with what the system actually looks like (especially with PDB files), you might need to look into the actual file directly or transform the file in some way.

@tm1089:

  • I modified my DNA using PyMOL i.e. I selected specific atoms, selected the blue Action (A) button and clicked remove atoms. After you’re done, you can then select your entire molecule and save as a new .mol2 file (export molecule). Perhaps this takes care of properly removing all bonds and other metadata in the file and you could have missed these extra steps? I actually did not have to follow this atom deletion process in the end (see below).
  • I merged the molecule with my DNA in a similar manner using PyMOL. However, to join two molecules together you’ll need to first load both molecules into PyMOL, export them together as one PDB, then re-load them in and PyMOL will treat them as a single object you can manipulate and bind together. You could of course use other tools such as pdb4amber too.
  • If you are simulating a DNA chain bound to a new molecule, I do not think you can simply include the itp/prm files and run. Instead, I followed the (quite involved) process I was detailing in this thread, i.e.:
    • First run your ligand in CGENFF as is, with no DNA connection.
    • Use charmm2gmx (instructions detailed above) to convert the .str file into rtp/itp files and add the new parameters to your forcefield.
    • Attach the ligand to your DNA chain and fully validate the naming of your atoms, and that all necessary atoms are in place.
    • Add your new ligand name to a residuetypes.dat file, classify it as DNA.
    • At this point you might be lucky and all your parameters will be in place already. If so, simply run pdb2gmx (make sure to terminate the ligand with none and the other ends of the DNA chain as 5TER/3TER) and re-disperse charge if necessary.
    • If there still are some parameters missing (GROMACS might only warn you about these later on when you first try to simulate rather than at the pdb2gmx stage), you will need to find out how to add these to your forcefield manually. As I had detailed earlier on, I managed to do this by A) re-running the ligand attach to a single nucleotide in CGENFF, rerunning charmm2gmx and extracting the exact new parameters I needed and B) by finding the parameters I needed from the literature.

Hope the above helps! I know it’s all a bit confusing and convoluted, but I haven’t been able to find an easier way to achieve a fully-functioning DNA-ligand covalent link in GROMACS…

I am facing another new error of O3’ not included in the input. Although in the .rtp file, O3’ bond with C41 is present. O3’ of Thymine is making a covalent bond with C41 of 3CHL.

gmx pdb2gmx -f A00.pdb -o new1.gro -ter -ignh
:-) GROMACS - gmx pdb2gmx, 2023.3 (-:

Executable: /usr/local/gromacs/bin/gmx
Data prefix: /usr/local/gromacs
Working dir: /home/nabanita/projects/DNA_chol/10feb
Command line:
gmx pdb2gmx -f A00.pdb -o new1.gro -ter -ignh

Select the Force Field:

From current directory:

1: CHARMM all-atom force field

2: CHARMM all-atom force field

From ‘/usr/local/gromacs/share/gromacs/top’:

3: AMBER03 protein, nucleic AMBER94 (Duan et al., J. Comp. Chem. 24, 1999-2012, 2003)

4: AMBER94 force field (Cornell et al., JACS 117, 5179-5197, 1995)

5: AMBER96 protein, nucleic AMBER94 (Kollman et al., Acc. Chem. Res. 29, 461-469, 1996)

6: AMBER99 protein, nucleic AMBER94 (Wang et al., J. Comp. Chem. 21, 1049-1074, 2000)

7: AMBER99SB protein, nucleic AMBER94 (Hornak et al., Proteins 65, 712-725, 2006)

8: AMBER99SB-ILDN protein, nucleic AMBER94 (Lindorff-Larsen et al., Proteins 78, 1950-58, 2010)

9: AMBERGS force field (Garcia & Sanbonmatsu, PNAS 99, 2782-2787, 2002)

10: CHARMM27 all-atom force field (CHARM22 plus CMAP for proteins)

11: GROMOS96 43a1 force field

12: GROMOS96 43a2 force field (improved alkane dihedrals)

13: GROMOS96 45a3 force field (Schuler JCC 2001 22 1205)

14: GROMOS96 53a5 force field (JCC 2004 vol 25 pag 1656)

15: GROMOS96 53a6 force field (JCC 2004 vol 25 pag 1656)

16: GROMOS96 54a7 force field (Eur. Biophys. J. (2011), 40, 843-856, DOI: 10.1007/s00249-011-0700-9)

17: OPLS-AA/L all-atom force field (2001 aminoacid dihedrals)
1

Using the Charmm36-jul2022 force field in directory ./charmm36-jul2022.ff
Opening force field file ./charmm36-jul2022.ff/watermodels.dat

Select the Water Model:

1: TIP3P CHARMM-modified TIP3P water model (recommended over original TIP3P)

2: TIP3P_ORIGINAL Original TIP3P water model

3: SPC SPC water model

4: SPCE SPC/E water model

5: TIP5P TIP5P water model

6: TIP4P TIP4P water model

7: TIP4PEW TIP4P/Ew water model

8: None
1

going to rename ./charmm36-jul2022.ff/aminoacids.r2b
Opening force field file ./charmm36-jul2022.ff/aminoacids.r2b

going to rename ./charmm36-jul2022.ff/carb.r2b
Opening force field file ./charmm36-jul2022.ff/carb.r2b

going to rename ./charmm36-jul2022.ff/cgenff.r2b
Opening force field file ./charmm36-jul2022.ff/cgenff.r2b

going to rename ./charmm36-jul2022.ff/ethers.r2b
Opening force field file ./charmm36-jul2022.ff/ethers.r2b

going to rename ./charmm36-jul2022.ff/lipid.r2b
Opening force field file ./charmm36-jul2022.ff/lipid.r2b

going to rename ./charmm36-jul2022.ff/metals.r2b
Opening force field file ./charmm36-jul2022.ff/metals.r2b

going to rename ./charmm36-jul2022.ff/na.r2b
Opening force field file ./charmm36-jul2022.ff/na.r2b

going to rename ./charmm36-jul2022.ff/silicates.r2b
Opening force field file ./charmm36-jul2022.ff/silicates.r2b

going to rename ./charmm36-jul2022.ff/solvent.r2b
Opening force field file ./charmm36-jul2022.ff/solvent.r2b
Warning: Residue ‘ADE’ already present with type ‘DNA’ in database, ignoring new type ‘RNA’.
Warning: Residue ‘ADE’ already present with type ‘DNA’ in database, ignoring new type ‘RNA’.
Warning: Residue ‘ADE’ already present with type ‘DNA’ in database, ignoring new type ‘RNA’.
Warning: Residue ‘ADE’ already present with type ‘DNA’ in database, ignoring new type ‘RNA’.
Warning: Residue ‘CYT’ already present with type ‘DNA’ in database, ignoring new type ‘RNA’.
Warning: Residue ‘CYT’ already present with type ‘DNA’ in database, ignoring new type ‘RNA’.
Warning: Residue ‘CYT’ already present with type ‘DNA’ in database, ignoring new type ‘RNA’.
Warning: Residue ‘CYT’ already present with type ‘DNA’ in database, ignoring new type ‘RNA’.
Warning: Residue ‘GUA’ already present with type ‘DNA’ in database, ignoring new type ‘RNA’.
Warning: Residue ‘GUA’ already present with type ‘DNA’ in database, ignoring new type ‘RNA’.
Warning: Residue ‘GUA’ already present with type ‘DNA’ in database, ignoring new type ‘RNA’.
Warning: Residue ‘GUA’ already present with type ‘DNA’ in database, ignoring new type ‘RNA’.
Reading A00.pdb…
Read ‘’, 291 atoms

Analyzing pdb file
Splitting chemical chains based on TER records or chain id changing.

There are 1 chains and 0 blocks of water and 13 residues with 291 atoms

chain #res #atoms

1 ‘D’ 13 291

All occupancies are one
All occupancies are one
Opening force field file ./charmm36-jul2022.ff/atomtypes.atp

Reading residue database… (Charmm36-jul2022)
Opening force field file ./charmm36-jul2022.ff/aminoacids.rtp
Opening force field file ./charmm36-jul2022.ff/carb.rtp
Opening force field file ./charmm36-jul2022.ff/cgenff.rtp
Opening force field file ./charmm36-jul2022.ff/ethers.rtp
Opening force field file ./charmm36-jul2022.ff/lipid.rtp
Opening force field file ./charmm36-jul2022.ff/metals.rtp
Opening force field file ./charmm36-jul2022.ff/na.rtp
Opening force field file ./charmm36-jul2022.ff/silicates.rtp
Opening force field file ./charmm36-jul2022.ff/solvent.rtp
Opening force field file ./charmm36-jul2022.ff/3chl.hdb
Opening force field file ./charmm36-jul2022.ff/aminoacids.hdb
Opening force field file ./charmm36-jul2022.ff/carb.hdb
Opening force field file ./charmm36-jul2022.ff/cgenff.hdb
Opening force field file ./charmm36-jul2022.ff/ethers.hdb
Opening force field file ./charmm36-jul2022.ff/lipid.hdb
Opening force field file ./charmm36-jul2022.ff/metals.hdb
Opening force field file ./charmm36-jul2022.ff/na.hdb
Opening force field file ./charmm36-jul2022.ff/silicates.hdb
Opening force field file ./charmm36-jul2022.ff/solvent.hdb
Opening force field file ./charmm36-jul2022.ff/3chl.n.tdb
Opening force field file ./charmm36-jul2022.ff/aminoacids.n.tdb
Opening force field file ./charmm36-jul2022.ff/carb.n.tdb
Opening force field file ./charmm36-jul2022.ff/cgenff.n.tdb
Opening force field file ./charmm36-jul2022.ff/ethers.n.tdb
Opening force field file ./charmm36-jul2022.ff/lipid.n.tdb
Opening force field file ./charmm36-jul2022.ff/metals.n.tdb
Opening force field file ./charmm36-jul2022.ff/na.n.tdb
Opening force field file ./charmm36-jul2022.ff/silicates.n.tdb
Opening force field file ./charmm36-jul2022.ff/solvent.n.tdb
Opening force field file ./charmm36-jul2022.ff/3chl.c.tdb
Opening force field file ./charmm36-jul2022.ff/aminoacids.c.tdb
Opening force field file ./charmm36-jul2022.ff/carb.c.tdb
Opening force field file ./charmm36-jul2022.ff/cgenff.c.tdb
Opening force field file ./charmm36-jul2022.ff/ethers.c.tdb
Opening force field file ./charmm36-jul2022.ff/lipid.c.tdb
Opening force field file ./charmm36-jul2022.ff/metals.c.tdb
Opening force field file ./charmm36-jul2022.ff/na.c.tdb
Opening force field file ./charmm36-jul2022.ff/silicates.c.tdb
Opening force field file ./charmm36-jul2022.ff/solvent.c.tdb

Back Off! I just backed up topol.top to ./#topol.top.47#

Processing chain 1 ‘D’ (291 atoms, 13 residues)

Identified residue DA51 as a starting terminus.

Identified residue 3CHL13 as a ending terminus.
8 out of 8 lines of specbond.dat converted successfully
Select start terminus type for DA5-1
0: 3TER
1: NH3+
2: NH2
3: HYD1
4: MET1
5: 5TER
6: 5MET
7: 5PHO
8: 5POM
9: None
5
Start terminus DA5-1: 5TER
Select end terminus type for 3CHL-13
0: COO-
1: COOH
2: CT2
3: CT1
4: HYD2
5: MET2
6: 3TER
7: None
6
End terminus 3CHL-13: 3TER
Opening force field file ./charmm36-jul2022.ff/aminoacids.arn

Checking for duplicate atoms…

Generating any missing hydrogen atoms and/or adding termini.


Program: gmx pdb2gmx, version 2023.3
Source file: src/gromacs/gmxpreprocess/pgutil.cpp (line 154)

Fatal error:
Residue 13 named 3CHL of a molecule in the input file was mapped
to an entry in the topology database, but the atom O3 used in
that entry is not found in the input file. Perhaps your atom
and/or residue naming needs to be fixed.

Thank you so much for your reply, Matthew! I’ll try it out and let you know how it goes.