Confusing Density from NPT run

GROMACS version:2023 (gromacs-amd-gfx90a/2023)
GROMACS modification: Yes/No

Hi all,
I am running MD on a protein-ligand complex.
My protein is huge with 6 chains and 277.25 kDa.
My forcefield parameterisation was using AmberForce field FF19SB / ambertools. I have decided to use OPC water model to solvate my protein-ligand complex
I decided to follow several protocols about my protein (from literature) and decided to do
EM (steep descent) → restrained NPT at 20K → restrained NPT at 300K → Unrestrained NPT at 300K → MD [for restraints, i just standardise to 1000 kj/(molnm)
(i might improve on this workflow especially with reducing force constants but i have a more important question below).

i decided to start low temperature and increase to high temperature because initially, i tried NVT->NPT (restrained) at 300K and got densities at ~300kg/m^3 which is far from experimental water or OPC in this case. Since I believe my protein looks stable, i can skip the NVT equilibration and carry out NPT at lower temperatures in hopes that my system would contract and then reach desired density. Not sure if this is the right protocol (EM → Restrained NVT lower temp → Restrained NVT desired temp → Unrestrained NVT desired temp) . Any advice would be appreciated.

For all restrained NPT runs, density is kept around 300kg/m^3 and finally at the unrestrained run, density shot to about 1300 kg/m^3.
This is far from the water’s density but i would say relatively close to average density reported for proteins. Now i wonder if the density actually reflected the large protein’s density instead of the water or if that is incorrect.
I wonder if 1) OPC water model topology was incorrect?
2) Did I not use enough water molcules to solvate the system? (but the water molecules covers the entire protein well and no vacuum bubbles during NPT)

Side note: When removing PBC and visualisating the trajectory, the ligand seems stable even in unrestrained runs and seems to be fine so im believe my system is stable but the density is throwing me a little of a little

Any advice/help is appreciated. Thanks all

OPC water model topology in case anyone’s wants to see it
found in gromacs_ff/amber14sb_parmbsc1_opc_lmi.ff/opc.itp at master · intbio/gromacs_ff · GitHub
[ moleculetype ]
; molname nrexcl ; OPC model
WAT 2

[ atoms ]
; id at type res nr res name at name cg nr charge mass
1 O 1 SOL O 1 0 16.00000
2 H 1 SOL H1 1 0.67914 1.00800
3 H 1 SOL H2 1 0.67914 1.00800
4 EPW 1 SOL EPW 1 -1.35828 0.00000

#ifndef FLEXIBLE

[ settles ]
; i funct doh dhh
1 1 0.08724 0.13712

#else

[ bonds ]
; i j funct length force.c.
1 2 1 0.08724 502416.0 0.08724 502416.0
1 3 1 0.08724 502416.0 0.08724 502416.0

[ angles ]
; i j k funct angle force.c.
2 1 3 1 103.6 628.02 103.6 628.02

#endif

[ virtual_sites3 ]
; Vsite from funct a b
4 1 2 3 1 0.147722363 0.147722363

[ exclusions ]
1 2 3 4
2 1 3 4
3 1 2 4
4 1 2 3
; The position of the virtual site is computed as follows:
;
; O
;
; V
;
; H H
;
; Ewald OPC:
; const = distance (OV) / [ cos (angle(VOH)) * distance (OH) ]
; 0.01594 nm / [ cos (51.8 deg) * 0.0872433 nm ]
; then a = b = 0.5 * const = 0.147722363
;
; Vsite pos x4 = x1 + a*(x2-x1) + b*(x3-x1)

npt.mdp :
title = Protein-ligand complex NPT equilibration
define = -DPOSRES ; position restrain the protein and ligand
; Run parameters
integrator = md ; leap-frog integrator
nsteps = 500000 ; 2 * 500000 = 1000 ps = 1ns
dt = 0.002 ; 2 fs
; Output control
nstenergy = 5000 ; save energies every 10.0 ps
nstlog = 5000 ; update log file every 10.0 ps
nstxout-compressed = 5000 ; save coordinates every 10.0 ps
; Bond parameters
continuation = no ; first NPT run
constraint_algorithm = lincs ; holonomic constraints
constraints = h-bonds ; bonds to H are constrained
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 4 ; also related to accuracy
; Neighbor searching and vdW
cutoff-scheme = Verlet
ns_type = grid ; search neighboring grid cells
nstlist = 20 ; largely irrelevant with Verlet
rlist = 1.0
vdwtype = Cut-off
vdw-modifier = Potential-shift
rvdw = 1.0 ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
rcoulomb = 1.0
pme_order = 4 ; cubic interpolation
fourierspacing = 0.125 ; grid spacing for FFT
; Temperature coupling
tcoupl = V-rescale ; modified Berendsen thermostat
tc-grps = protein_grp non_protein_grp ; two coupling groups - more accurate
tau_t = 0.5 0.5 ; time constant, in ps
ref_t = 50 50 ; reference temperature, one for each group, in K
; Pressure coupling
pcoupl = C-rescale ; pressure coupling is on for NPT
pcoupltype = isotropic ; uniform scaling of box vectors
tau_p = 0.5 ; time constant, in ps
ref_p = 1.0 ; reference pressure, in bar
compressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1
refcoord_scaling = com
; Periodic boundary conditions
pbc = xyz ; 3-D PBC
; Dispersion correction used with OPC water model
DispCorr = EnerPres
; Velocity generation
gen_vel = yes ; assign velocities from Maxwell distribution
gen_temp = 50 ; temperature for Maxwell distribution
gen_seed = -1 ; generate a random seed

npt2:
title = Protein-ligand complex NPT equilibration
define = -DPOSRES ; position restrain the protein and ligand
; Run parameters
integrator = md ; leap-frog integrator
nsteps = 500000 ; 2 * 500000 = 1000 ps = 1ns
dt = 0.002 ; 2 fs
; Output control
nstenergy = 5000 ; save energies every 10.0 ps
nstlog = 5000 ; update log file every 10.0 ps
nstxout-compressed = 5000 ; save coordinates every 10.0 ps
; Bond parameters
continuation = yes ; Continue from NPT
constraint_algorithm = lincs ; holonomic constraints
constraints = h-bonds ; bonds to H are constrained
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 4 ; also related to accuracy
; Neighbor searching and vdW
cutoff-scheme = Verlet
ns_type = grid ; search neighboring grid cells
nstlist = 20 ; largely irrelevant with Verlet
rlist = 1.0
vdwtype = Cut-off
vdw-modifier = Potential-shift
rvdw = 1.0 ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
rcoulomb = 1.0
pme_order = 4 ; cubic interpolation
fourierspacing = 0.125 ; grid spacing for FFT
; Temperature coupling
tcoupl = V-rescale ; modified Berendsen thermostat
tc-grps = protein_grp non_protein_grp ; two coupling groups - more accurate
tau_t = 1 1 ; time constant, in ps
ref_t = 150 150 ; reference temperature, one for each group, in K
; Pressure coupling
pcoupl = C-rescale ; pressure coupling is on for NPT
pcoupltype = isotropic ; uniform scaling of box vectors
tau_p = 1.0 ; time constant, in ps
ref_p = 1.0 ; reference pressure, in bar
compressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1
refcoord_scaling = com
; Periodic boundary conditions
pbc = xyz ; 3-D PBC
; Dispersion correction used with OPC water model
DispCorr = EnerPres
; Velocity generation
gen_vel = no ; velocity generation off

npt3:
title = Protein-ligand complex NPT equilibration
define = -DPOSRES ; position restrain the protein and ligand
; Run parameters
integrator = md ; leap-frog integrator
nsteps = 500000 ; 2 * 500000 = 1000 ps = 1ns
dt = 0.002 ; 2 fs
; Output control
nstenergy = 5000 ; save energies every 10.0 ps
nstlog = 5000 ; update log file every 10.0 ps
nstxout-compressed = 5000 ; save coordinates every 10.0 ps
; Bond parameters
continuation = yes ; continuing from previous NPT
constraint_algorithm = lincs ; holonomic constraints
constraints = h-bonds ; bonds to H are constrained
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 4 ; also related to accuracy
; Neighbor searching and vdW
cutoff-scheme = Verlet
ns_type = grid ; search neighboring grid cells
nstlist = 20 ; largely irrelevant with Verlet
rlist = 1.0
vdwtype = Cut-off
vdw-modifier = Potential-shift
rvdw = 1.0 ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
rcoulomb = 1.0
pme_order = 4 ; cubic interpolation
fourierspacing = 0.125 ; grid spacing for FFT
; Temperature coupling
tcoupl = V-rescale ; modified Berendsen thermostat
tc-grps = protein_grp non_protein_grp ; two coupling groups - more accurate
tau_t = 1 1 ; time constant, in ps
ref_t = 300 300 ; reference temperature, one for each group, in K
; Pressure coupling
pcoupl = C-rescale ; pressure coupling is on for NPT
pcoupltype = isotropic ; uniform scaling of box vectors
tau_p = 2.0 ; time constant, in ps
ref_p = 1.0 ; reference pressure, in bar
compressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1
refcoord_scaling = com
; Periodic boundary conditions
pbc = xyz ; 3-D PBC
; Dispersion correction used with OPC water model
DispCorr = EnerPres
; Velocity generation
gen_vel = no ; velocity generation off

How did you add water to the system? A condensed-phase system with an initial density of 300 kg/m³ signals a lot of vacuum. In VMD, you can visualize changes in box size if you type “pbc box” in the TCL console.

Two considerations,

  1. I solvated using AmberTools tleap but found that the water molecules did not seem to enter the chains to make H-bonding network in my initial EM → NVT → NPT → MD
  2. So, i solvated using AmberTools tleap + added the crystallography water molecules

pbc box using the original trajectory without centering or after centering? I dont remember seeing any vacuum gaps in any of my runs but ill have a look again

AmberTools has a few options to add solvent, it might be that something failed there. Check how many OPC waters you’ve got in your box, and what the box size is.

Centering shouldn’t change anything if you have a vacuum there, so either way

Just to add more details

this was how i solvated water molecules in the tleap
solvateoct mol OPCBOX 10.0
addions2 mol Na+ 173
addions2 mol Cl- 76

i added 2393 crystallography WAT and
this was tleap output which added 43884 water molecules
Scaling up box by a factor of 1.210620 to meet diagonal cut criterion
Solute vdw bounding box: 95.644 92.735 131.303
Total bounding box for atom centers: 155.515 155.515 155.515
(box expansion for ‘iso’ is 72.5%)
Solvent unit box: 18.865 18.478 19.006
Volume: 1917070.582 A^3 (oct)
Total mass 1108755.980 amu, Density 0.960 g/cc
Added 43884 residues.

The eventual topology seems to have the correct total number of WAT molecules WAT 46277
43884 + 2393 = 46277

if the solvation by AmberTools is causing issues maybe ill just solvate with gromacs then (?)

So now the question is, how did you start with a density of 0.96 in Amber and got 0.3 in Gromacs? Perhaps your box dimensions changed, or were converted incorrectly, or you converted a file that wasn’t the final one.

This are the commands i ran, based on what you are saying, im getting the idea that i may have messed the very first step which is to define the box?
#Define box
echo 0 | gmx_mpi editconf -f topol.gro -o newbox.gro -bt octahedron -d 1.2

#EM
gmx_mpi grompp -f em.mdp -c newbox.gro -p topol.top -o em.tpr -v

#Make index
gmx_mpi make_ndx -f em.gro -o index.ndx

##4 phases of NPT to slowly get desired density##
gmx_mpi grompp -f npt.mdp -c em.gro -t em.trr -r em.gro -p topol.top -n index.ndx -o npt.tpr
gmx_mpi grompp -f npt2.mdp -c npt.gro -t npt.cpt -r npt.gro -p topol.top -n index.ndx -o npt2.tpr
gmx_mpi grompp -f npt3.mdp -c npt2.gro -t npt2.cpt -r npt2.gro -p topol.top -n index.ndx -o npt3.tpr
gmx_mpi grompp -f npt4.mdp -c npt3.gro -t npt3.cpt -r npt3.gro -p topol.top -n index.ndx -o npt4.tpr
gmx_mpi grompp -f md_10ns.mdp -c npt4.gro -t npt4.cpt -p topol.top -n index.ndx -o md_10ns.tpr

I believe that was the mistake
echo 0 | gmx_mpi editconf -f topol.gro -o newbox.gro -bt octahedron -d 1.2

i shouldnt have done that since my gro was already a octahedron…

i proceeded to do
gmx_mpi grompp -f em.mdp -c topol.gro -p topol.top -o em.tpr -v
but got insane steric clashes
Energy minimization has stopped because the force on at least one atom is not
finite. This usually means atoms are overlapping. Modify the input
coordinates to remove atom overlap or use soft-core potentials with the free
energy code to avoid infinite forces.
You could also be lucky that switching to double precision is sufficient to
obtain finite forces.

Steepest Descents converged to machine precision in 23 steps,
but did not reach the requested Fmax < 1000.
Potential Energy = 3.7172511e+21
Maximum force = inf on atom 66275
Norm of force = inf

the atom 66275 points to two different water molecules
(i have 44k water molecules so there are repeated numbers after 99999 atom number)
both of the water molecules (which are outside of protein chains, surrounding the protein) do not seem to be clashing with other molecules in the gro file so how shall i proceed from here…?
i could go back to my initial topology without crystallography water molecules (which does energy minimise without clashes interestingly) but the water molecules do not seem to enter into the protein chains so a little concerning?

i just tried positional restraining the water molecules but same error seems to happen.
Any thoughts?

Oh yes, that solves it. You added another 1.2 nm margin to your original box (-d). Even if your molecules were not broken across PBC before that, it will make for a strangely behaving system.

I remember there’s a tiny difference in the octahedron definition between Gromacs and Amber - some last decimal places in the box angle value - so that if you just transfer them naively, things will not fit exactly. So as you pointed out, if you have the possibility to adjust the box and solvate in Gromacs again, it tends to be the safer option.