Running pdb2gmx on large water/membrane system

GROMACS version: 2023
GROMACS modification: Yes/No

I am preparing for simulation a large system (453696 atoms) consisting of a lipid bilayer and water on both sides. The intention is to pre-equilibrate this solvated bilayer for subsequent insertion of multiple membrane proteins. To assemble this system, I started with a pre-equilibrated lipid bilayer + water system from SLipids (in .gro format), generated “symmetry mates” (under PBC) in pymol, and cut off the extra lipid bilayers above and below the one in the middle. What remains are 9 images of the lipid bilayer in a 3x3 arrangement in the X-Y plane with a double-thick layer of water on either side.

I have several questions about what happens when I run pdb2gmx on this pdb file.

  1. In the topology pdb2gmx produces, lipid molecules within one chain are treated as aa residues in a polypeptide chain – as part of a single molecule. The resulting list of molecules in the topology file looks like this, where each “Other” is a chain containing 128 lipid molecules (mixture of CHL1 and DMPC):
[ molecules ]
; Compound        #mols
Other               1
Other2              1
Other3              1
Other4              1
Other5              1
Other6              1
Other7              1
Other8              1
Other9              1
SOL             12165
SOL             12172
SOL             12130
SOL             12157
SOL             12166
SOL             12095
SOL             12163
SOL             12168
SOL             12136

With some testing I found that topologies built from the original .gro files from SLipids also have lipids in the same chain as part of one molecule. Is this how individual lipid molecules should be represented in a topology, or there a way to get pdb2gmx to treat lipid molecules the way it treats solvent molecules – as multiple molecules belonging to one chain? Or is pdb2gmx not the right tool for building a topology for this kind of system?

  1. How does pdb2gmx match water molecules to entries in solvent.rtp, and what is the CHARMM convention for naming atoms in waters? When I previously prepared a system with retained crystallographic waters, where only oxygens were present as HETATM entries with residue name ‘HOH’ and atom name ‘O’, pdb2gmx renamed these atoms to ‘OW’, like so: Renaming atom 'O' in residue 17 HOH to 'OW'. I then had to add an ‘HOH’ entry to solvent.rtp to get pdb2gmx to assign these atoms correctly, like so:
[ HOH ]
; tip3p water model
  [ atoms ]
       OW       OT -0.8340   1
      HW1       HT  0.4170   1
      HW2       HT  0.4170   1
  [ bonds ]
       OW   HW1
       OW   HW2
      HW1   HW2
  [ angles ]
      HW1    OW   HW2

In this system, which was converted to pdb format from .gro, waters are named according to the CHARMM convention: reside name ‘SOL’ and atom names matching the [ TIP3 ] entry in solvent.rtp. However, pdb2gmx still matches ‘SOL’ residues to [ HOH ] entries, but then is not able to match ‘OH2’ atoms to ‘OW’ atoms in [ HOH ]. Is there any residue name for waters that would get pdb2gmx to match the residue to the [ TIP3 ] entry? Or do the atoms need to be renamed according to [ HOH ] names (‘OW’, ‘HW1’, ‘HW2’)? The latter is the solution I’m using now.

  1. The fact that there are more than 99,999 ATOM entries does not appear to be an issue in itself, because the output .gro file has the same number of atoms as the input pdb. However, when pymol wrote out the pdb, it used the number 99999 for every ATOM entry past 99,999. How does GROMACS order atoms with the same number? Does it simply count up in the order in which ATOM entries appear in the input file?