NVT Simulation Stuck at Step 100

GROMACS version: 2023.4 (Local)
GROMACS modification: Yes/No

Hey there!

I have been stuck on this issue for some time now, and no amount of user discussions seem to address my issue. I am trying to simulate an atomistic ion channel, and yet the NVT step (first step I am running) appears stuck on step zero and no attempted solution resolves the issue. It works fine on local, but the second I try to run it on significant GPU or cluster it gets stuck. Note there is a pressure scaling issue it warns about. That is a nonissue as that disappears in around 50 simulation steps in local hardware. Here are the details regarding submitted commands, NVT file commands, and log files:

Simulation Size: 980000 molecules.
Hardware: NVIDIA A100 GPU
NVT commands:
gmx_mpi grompp -f NVT.mdp -c EM.gro -p 8HKQAtom00N00K00Z.top -o NVT.tpr -maxwarn 4
gmx_mpi mdrun -ntomp 8 -v -deffnm NVT
NVT.mdp file details:
integrator = md
dt = 0.002
nsteps = 100000000
nstxout-compressed = 10000
compressed-x-precision = 10000
nstlog = 10000
nstenergy = 10000
nstcalcenergy = 500
cutoff-scheme = verlet
nstlist = 1
ns_type = grid
pbc = xyz
verlet-buffer-tolerance = 0.005
coulombtype = reaction-field
rcoulomb = 1.2
epsilon_r = 15
epsilon_rf = 0
vdw_type = cutoff
vdw-modifier = Potential-shift-verlet
rvdw = 1.2
tcoupl = v-rescale
tc-grps = Protein Non-Protein
tau-t = 1.0 1.0
ref-t = 310 310
gen-temp = 310
Pcoupl = C-rescale
Pcoupltype = Isotropic
ref-p = 1.0 1.0
tau-p = 5.0
compressibility = 3e-4 3e-4
gen-vel = no
constraints = h-bonds
constraint_algorithm = Lincs
lincs_order = 4
lincs_warnangle = 90
NVT.log file:
There are: 948976 Atoms
Updating coordinates and applying constraints on the GPU.
Constraining the starting coordinates (step 0)
Constraining the coordinates at t0-dt (step 0)
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
0: rest
RMS relative constraint deviation after constraining: 2.78e-06
Initial temperature: 3.12762e-06 K
Started mdrun on rank 0 Thu Jul 11 11:01:38 2024
Step Time
0 0.00000
Step 0 Warning: pressure scaling more than 1%, mu: 1.09091 1.09091 1.09091
Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
4.33573e+04 2.37301e+05 4.67317e+05 3.27539e+03 -6.58257e+03
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Potential
1.13006e+05 3.08084e+04 1.05483e+06 -9.56765e+05 9.86545e+05
Kinetic En. Total Energy Conserved En. Temperature Pressure (bar)
3.65085e+03 9.90196e+05 4.59899e+06 4.36281e-01 2.17355e+04
Constr. rmsd
2.77985e-06
Slurm file:
Reading file NVT.tpr, VERSION 2024.2 (single precision)
1 GPU selected for this run.
Mapping of GPU IDs to the 1 GPU task in the 1 rank on this node:
PP:0
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
PP task will update and constrain coordinates on the GPU
Using 1 MPI process
Non-default thread affinity set, disabling internal thread affinity
Using 8 OpenMP threads
WARNING: This run will generate roughly 45347 Mb of data
starting mdrun ‘Title’
100000000 steps, 200000.0 ps.
Step 0 Warning: pressure scaling more than 1%, mu: 1.09091 1.09091 1.09091
^Mstep 0^Mstep 100, will finish Fri Dec 27 19:34:19 2024^Mstep 200, will finish Tue Dec 24 08:57:18 2024^Mstep 300, will finish Mon Dec 23 17:08:24 2024^Mstep 400, will finish Mon Dec 23 04:02:50 2024^Mstep 500, will finish Mon Dec 23 00:05:26 2024^Mstep 600, will finish Sun Dec 22 20:13:45 2024^Mstep 700, will finish Sun Dec 22 23:06:03 2024^Mstep 800, will finish Sun Dec 22 21:45:25 2024^Mstep 900, will finish Sun Dec 22 20:18:56 2024^Mstep 1000, will finish Sun Dec 22 22:47:05 2024^Mstep 1100, will finish Mon Dec 23 01:33:16 2024^Mstep 1200, will finish Mon Dec 23 02:20:51 2024^Mstep 1300, will finish Mon Dec 23 07:51:27 2024^Mstep 1400, will finish Mon Dec 23 08:27:20 2024^Mstep 1500, will finish Mon Dec 23 09:41:09 2024^Mstep 1600, will finish Mon Dec 23 14:06:54 2024^Mstep 1700, will finish Mon Dec 23 15:24:17 2024^Mstep 1800, will finish Mon Dec 23 16:26:06 2024^Mstep 1900, will finish Mon Dec 23 19:46:15 2024^Mstep 2000, will finish Mon Dec 23 22:14:16 2024^Mstep 2100, will finish Mon Dec 23 23:32:51 2024^Mstep 2200, will finish Tue Dec 24 03:15:20 2024^Mstep 2300, will finish Tue Dec 24 04:53:59 2024^Mstep 2400, will finish Tue Dec 24 05:54:57 2024^Mstep 2500, will finish Tue Dec 24 07:48:14 2024^Mstep 2600, will finish Tue Dec 24 09:30:01 2024^Mstep 2700, will finish Tue Dec 24 11:22:09 2024^Mstep 2800, will finish Tue Dec 24 14:08:10 2024^Mstep 2900, will finish Tue Dec 24 16:25:38 2024^Mstep 3000, will finish Tue Dec 24 18:33:43 2024^Mstep 3100, will finish Tue Dec 24 20:39:32 2024^Mstep 3200, will finish Tue Dec 24 22:46:45 2024^Mstep 3300, will finish Wed Dec 25 01:11:41 2024^Mstep 3400, will finish Wed Dec 25 03:06:53 2024^Mstep 3500, will finish Wed Dec 25 05:05:51 2024^Mstep 3600, will finish Wed Dec 25 07:47:20 2024^Mstep 3700, will finish Wed Dec 25 10:17:53 2024^Mstep 3800, will finish Wed Dec 25 12:27:23 2024^Mstep 3900, will finish Wed Dec 25 15:14:03 2024^Mstep 4000, will finish Wed Dec 25 18:13:58 2024^Mstep 4100, will finish Wed Dec 25 20:30:35 2024^Mstep 4200, will finish Wed Dec@@@

Solutions I have tried thus far:
Change between srun gmx_mpi and gmx_mpi
Change between GROMACS versions 2023.1 and 2024.1
Change pressure coupling
Change pressure coupling type
Change constraints
Change checkpoints and frequency

Dear @zamydm

Why are you using the -maxwarn 4 flag? What are the warning you get that you are silencing?

The .mdp seems confusing. You are defining the pcoupltype as isotropic but you give two values as ref-p and compressibility, while only one is required. Then you set gen-vel=no, so you do not request the generation of velocities (which, if this is the first NVT, is arguably wrong) but set gen-temp=310 which is not used as you are requesting that no velocities are generated. The constaint algorithm is LINCS, although maybe GROMACS is not sensitive to this.

Then there are other values that are a bit confusing, like

compressed-x-precision = 10000

do you really need this level of precision? While not the (implicit automatic) value of 1000?

lincs_warnangle = 90

Why are you modifying this from the standard value of 30? This may hide a lot of problems.

nstlist = 1

Why are you forcing the calculations of lists every step? This will make your simulations much slower.

coulombtype = reaction-field
rcoulomb = 1.2
epsilon_r = 15
epsilon_rf = 0

Why these type of electrostatics? What force field are you using? Similarly, why this values of compressibility?

compressibility = 3e-4 3e-4

Also from your description of the system you are simulating an atomistic ion channel, which supposedly is embedded in a lipid bilayer? In that case, also the pressure coupling needs to be changed to semiisotropic, which is much more reasonable for bilayer systems.

My main take anyway is remove the -maxwarn 4 in your compilation step. It is extremely uncommon that you have to maxwarn something to force the compilation, and in such cases you really have to have a strong reason for it. Check what GROMACS is complaining about in the compilation step.

Maxwarn -4 flag is to suppress the pressure warning as like I said the warning is harmless and disappears after first few steps.

I apologize for errors in my NVT file. I am new to running simulations and tried to make my own for the first time using advice from my colleagues. Addressing your concerns:
Compressed-x-precision is a typo, I thought I listed it as 1000.
lincs_warnangle = 90 is correct as the default 30 causes just about every simulation in existence to crash. I talked with some peers and they highly recommended using 90 as 30 can be extremely problematic in most scenarios. This also makes sense since most lipids naturally flex a lot and the choice of 30 is very restrictive.
nstlist I must have misunderstood the documentation. I will correct this.
Electrostatics and compressibility I left as default from a colleague. I can look into this further.

I will say thank you though!

Alright, does this look better?

integrator = md
dt = 0.002
nsteps = 100000000

;-----------------------------
; Output control
;-----------------------------

nstxout-compressed = 10000
compressed-x-precision = 1000
nstlog = 5000
nstenergy = 5000
nstcalcenergy = 500

cutoff-scheme = verlet
nstlist = 20
ns_type = grid
pbc = xyz
verlet-buffer-tolerance = 0.005

coulombtype = reaction-field
rcoulomb = 1.2
epsilon_r = 15
epsilon_rf = 0
vdw_type = Cut-off
vdw-modifier = Potential-shift-verlet
rvdw = 1.2

tcoupl = v-rescale
tc-grps = Protein Non-Protein
tau-t = 1.0 1.0
ref-t = 310 310
gen-temp = 310
Pcoupl = C-rescale
Pcoupltype = Semiisotropic
ref-p = 1.0 1.0
tau-p = 1.0
compressibility = 4.5e-5 4.5e-5
gen-vel = yes
constraints = h-bonds
constraint_algorithm = Lincs
lincs_order = 4
lincs_warnangle = 30

Dear @zamydm,

I won’t comment on the electrostatics part (coulombtype, rcoulombetc.) as those are force field dependent, but the rest seems for sure more consistent!

Also, if you are starting from post energy minimization (since you are generating the velocities), it may be worth having a short NVT step in which you start heating up the system and then change to NpT to relax the box side lengths.

The unfortunate part is that this is supposed to be NVT…

How do I properly distinguish between NVT and NPT files?

Edit: Nevermind, found it on my own. Needed to turn off Pcoupl

As you said, NVE has tcoupl and pcoupl set to no, NVT has temperature coupled and only pcoupl set to no, while NPT has both thermostat and barostat coupling. :)

Thanks for the help, you are awesome! I hope you have a wonderful day!