All lipids defined as part of one molecule in topology

GROMACS version: 2023
GROMACS modification: No

After running pdb2gmx on a pdb input file containing a lipid bilayer with waters above and below, I get a topology where all lipid molecules are treated as amino acid residues in a polypeptide chain – part of a single molecule. So, instead of 128 separate lipid molecules I get one molecule with 128 residues. Am I doing something wrong, or is this really how lipids are treated in GROMACS topologies? Naively, I expected treatment similar to solvent, with multiple individual molecules belonging to one chain.

I would really appreciate any suggestions and/or an explanation of how lipids (and lipid bilayers specifically) should be treated in a topology. If anyone has an example of a correct topology containing lipids, I would really appreciate it if you could share that as well. Thank you!

After investigating further, I’m bumping this with more specific questions.

In every tutorial I could find (KALP15 using GROMOS FF, Zoe Cournia’s tutorial using GROMOS FF, this one using OPLS FF, and this other one using AMBER FF), the topology file is prepared by hand for a structure file obtained elsewhere, and using .itp file(s) defining lipid topologies, also obtained elsewhere. Presumably, care was taken that the order of lipid molecule atoms in the structure file matches the order of atoms in the definition of the lipid’s topology in the .itp file. I don’t see any mechanism for checking that this is the case in any of the tutorials.

I am trying to use the charmm36 port, but it’s not clear to me how to prepare the topology file by hand. @jalemkul: For one, I don’t see a .itp file for lipids in the charmm36 port, only .hdb, .r2b, and .rtp files. Of those, .hdb and .rtp files contain entries for the lipids in my system (DMPC, CHL1). Is a .itp file supposed to be prepared from these files (and maybe from some parameter file(s) in toppar) automatically, by pdb2gmx or by another tool? Is there any documentation for this process?

Also, if I do succeed in preparing (or obtaining) definitions of lipid topologies, how can I be sure that atom matching between the input structure file and the topology definitions was done correctly? For amino acids and water, pdb2gmx produces warnings or errors if this matching cannot be done. But what about the case where I don’t use pdb2gmx, since it does not appear to identify lipids correctly?

Another bump, with an even more specific question.

@jalemkul (sorry to bombard you with questions Justin, and thanks so much for all your past help!), I learned here that pdb2gmx can be used to convert a coordinate file for a single molecule (a pdb coordinate file in my case) into a .gro coordinate file and (more importantly for me right now) a topology file for that molecule. It looks like pdb2gmx correctly recognizes the molecules as CHL1 and DMPC. I’m guessing I was correct before in guessing that pdb2gmx builds a topology out of the parameters for these molecules contained in .hdb and .rtf files, maybe together with atomtypes.atp and some other files?

In any case, now that I have system topology files form CHL1 and DMPC, I need to figure out how to convert these into include topology files. Is it as simple as removing the [ system ] and [ molecules ] sections each topology file? Equally importantly, how can I check that, once I assemble a full system topology, that topology is correct? Without, that is, running an actual simulation and seeing if I observe unexpected behavior.

I would really appreciate any input!

Not an expert response here, but if you’re already working with charmm36 you could consider using the CHARMM-GUI membrane builder to assemble the bilayer coordinate and topology files. They have an impressive number of lipids parameterised, and exporting the final system with a GROMACS equilibration option will assemble the topology for you.

Depending on the version of charmm36 you’re using, you may want to cross-check between your local charmm36 forcefield and the version that comes with the CHARMM-GUI tarball. If you’re using small molecules parameterised by CGENFF, this becomes vital because the CHARMM-GUI version of charmm36 doesn’t include a number of the atom types that newer versions of CGENFF use.

This is the expected behavior of pdb2gmx. It makes no functional difference for the simulation whether you treat the lipids like a “solvent” and #include an .itp file or have a very verbose topology. grompp treats the information the same way, and an #include statement just gets expanded N times based on the contents of [molecules].

Thank you both @tyd and @jalemkul for responding!

@tyd: unfortunately the CHARMM-GUI license does not allow me to use it in my actual work. It’s a shame, because it’s a great resource! However, have been (and hope to continue) using it on toy systems in order to “reverse engineer” what happens under the hood. That way I can apply the same “by hand” using the most recent port of charmm36 to GROMACS and the gmx toolset.

@jalemkul: Ah, I didn’t realize that grompp does this! Since I didn’t get any errors when I ran pdb2gmx (e.g. with matching atoms in the input pdb to atoms in the known topologies of CHL1 and DMPC), can I infer that pdb2gmx correctly identified those molecules, and that grompp will then build the correct atom-level description for the system?

More broadly, is there any mechanism, besides pdb2gmx for validating that a proper topology has been built for a system? If topology is prepared by means other than pdb2gmx, can grompp be counted on to detect the same problems pdb2gmx can identify, such as inability to map an atom in the input structure to an atom in the topology for the corresponding molecule, given that the molecule itself has been identified? Or, for that matter, the presence of molecules in the structure file that aren’t accounted for in the topology?

If pdb2gmx completes without error, that means the contents of the provided coordinate file sensibly matched everything the force field expects to define whatever residue(s) it finds. Any further validation requires comparison of energies (and/or forces) against a known standard. As it so happens, I just reviewed the proofs for our article describing the software used to make the latest port and I can tell you that the CHARMM36 port we produce robustly reproduces energies and forces from CHARMM (the software, which is the standard reference) and MD simulation outcomes across different systems.

In that case, would you advise against bypassing pdb2gmx, unless I am confident that the topology description in .inp files I include (specifically lipid topology) matches what is provided in the coordinate file, including the order in which component atoms are listed?

I don’t really understand the question. There are several ways to do equivalent things. pdb2gmx is the easiest, even if the topology looks a little different (less compact) than other approaches. It’s probably the one least prone to headaches and troubleshooting rather than trying to make lots of disparate things from different sources match.

Got it, thanks!

The reason I asked in the first place is because the membrane simulation tutorials I linked above bypass pdb2gmx for the membrane bilayer. It seems like a safe enough approach when you are confident that the topology you assemble correctly represents the system in the coordinate file(s) you provide (as appears to be the case in the tutorials), but is likely to be, as you say, prone to headaches otherwise.

For reference, the article is now online: https://pubs.acs.org/doi/10.1021/acs.jcim.3c00860