GROMACS version: 2022.5
GROMACS modification: No
Hi,
I was running a 100 ns membrane+protein+ligand simulation on an HPC cluster using SLURM. The job ran out of wall time and was killed at around 92 ns.
I tried to restart the simulation to complete the remaining 8 ns using the following command:
gmx_mpi mdrun -s md.tpr -cpi md_prev.cpt -v -append -deffnm md
The simulation then finished successfully. However, when I checked the md.log file, I saw that some time steps are duplicated in it. For instance:
DD step 46263999 load imb.: force 79.1% pme mesh/force 1.054
Step Time
46264000 92528.00000
Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
5.34296e+04 2.43940e+05 1.84763e+05 4.48997e+03 -1.78181e+03
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
3.82478e+04 -2.64975e+04 3.25754e+05 -5.84548e+06 2.03223e+04
Potential Kinetic En. Total Energy Conserved En. Temperature
-5.00282e+06 1.12946e+06 -3.87336e+06 7.79489e+06 3.03070e+02
Pressure (bar) Constr. rmsd
-8.74943e+01 4.64061e-06
Writing checkpoint, step 46264500 at Wed May 29 17:44:03 2024
4 2.46595e+05 1.85823e+05 4.56233e+03 -1.95436e+03
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
3.87148e+04 -2.75305e+04 3.24986e+05 -5.84069e+06 1.62945e+04
Potential Kinetic En. Total Energy Conserved En. Temperature
-5.00036e+06 1.13140e+06 -3.86896e+06 7.79472e+06 3.03591e+02
Pressure (bar) Constr. rmsd
1.82992e+01 4.65514e-06
DD step 46278999 load imb.: force 4.4% pme mesh/force 1.029
Step Time
46279000 92558.00000
Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
5.35400e+04 2.45639e+05 1.85842e+05 4.61183e+03 -1.92825e+03
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
3.85054e+04 -2.72041e+04 3.24764e+05 -5.84297e+06 1.61553e+04
Potential Kinetic En. Total Energy Conserved En. Temperature
-5.00305e+06 1.13006e+06 -3.87299e+06 7.79500e+06 3.03230e+02
Pressure (bar) Constr. rmsd
2.64827e+01 4.68069e-06
From the above I can see that the log jumps from step 46264000 (time 92528.00000) to step 46279000 (time 92558.00000).
Interestingly, from step 46279000 (time 92558.00000) the simulation keeps running until:
DD step 46332999 load imb.: force 3.0% pme mesh/force 1.026
Step Time
46333000 92666.00000
Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
5.37965e+04 2.45166e+05 1.86201e+05 4.61486e+03 -2.18204e+03
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
3.82651e+04 -2.70649e+04 3.25583e+05 -5.84266e+06 1.62281e+04
Potential Kinetic En. Total Energy Conserved En. Temperature
-5.00205e+06 1.13090e+06 -3.87115e+06 7.80841e+06 3.03456e+02
Pressure (bar) Constr. rmsd
-9.84917e+00 4.64059e-06
It then jumps back to an earlier time step, just after step 46264000 (time 92528.00000):
DD step 46264999 load imb.: force 92.0% pme mesh/force 1.020
Step Time
46265000 92530.00000
Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
5.31246e+04 2.45763e+05 1.85537e+05 4.52282e+03 -1.92235e+03
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
3.83409e+04 -2.73014e+04 3.27261e+05 -5.85147e+06 2.04276e+04
Potential Kinetic En. Total Energy Conserved En. Temperature
-5.00572e+06 1.12950e+06 -3.87622e+06 7.79479e+06 3.03080e+02
Pressure (bar) Constr. rmsd
7.18260e+01 4.63354e-06
From there it continues all the way to the end, repeating some of the time steps.
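To locate every such jump rather than spotting them by eye, the log can be scanned for step numbers that go backwards. Below is a minimal sketch, assuming each "Step Time" header in md.log is followed directly by a line holding the step and time values, as in the excerpts above. The function names `read_steps` and `backward_jumps` are made up for this illustration and are not part of GROMACS.

```python
def read_steps(lines):
    """Collect step numbers from the value line that follows each 'Step Time' header."""
    steps = []
    after_header = False
    for line in lines:
        fields = line.split()
        # The value line has exactly two fields: an integer step and a time.
        if after_header and len(fields) == 2 and fields[0].isdigit():
            steps.append(int(fields[0]))
        after_header = fields == ["Step", "Time"]
    return steps


def backward_jumps(steps):
    """Return (previous, current) pairs where the step number does not increase."""
    return [(prev, cur) for prev, cur in zip(steps, steps[1:]) if cur <= prev]


# Step/time records copied from the log excerpts in this post.
sample = [
    "Step Time",
    "46264000 92528.00000",
    "Energies (kJ/mol)",
    "Step Time",
    "46279000 92558.00000",
    "Step Time",
    "46333000 92666.00000",
    "Step Time",
    "46265000 92530.00000",
]

print(backward_jumps(read_steps(sample)))

# For the real file (assumed to be named md.log in the current directory):
# with open("md.log") as handle:
#     print(backward_jumps(read_steps(handle)))
```

Each reported pair marks a place where the log returns to an earlier step, which is where a repeated portion begins.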
So the md.log file shows a repetition of some portion of the trajectory. I went one step further and ran gmx check to see whether the trajectory file shows the same thing. The output is as follows:
Command line:
gmx check -f md.trr
Checking file md.trr
trr version: GMX_trn_file (single precision)
Reading frame 0 time 0.000
Atoms 426608
Last frame 1000 time 100000.000
Item #frames Timestep (ps)
Step 1001 100
Time 1001 100
Lambda 1001 100
Coords 1001 100
Velocities 1001 100
Forces 1001 100
Box 1001 100
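As a sanity check on these numbers, here is a minimal sketch of the arithmetic, with all values copied from the outputs in this post and hypothetical helper names. It tests whether the reported frame count and frame spacing exactly span the reported time range, and what integration time step the log entries imply. An exact match would be consistent with evenly spaced frames in md.trr, though on its own it does not prove that no frame is duplicated.

```python
import math


def evenly_spaced(n_frames, dt_ps, first_ps, last_ps):
    """True if n_frames frames spaced dt_ps apart exactly span first_ps..last_ps."""
    return math.isclose((n_frames - 1) * dt_ps, last_ps - first_ps)


def implied_dt(step, time_ps):
    """Integration time step (ps) implied by one 'Step Time' record of the log."""
    return time_ps / step


# gmx check: 1001 frames, 100 ps apart, from time 0.000 to 100000.000 ps.
print(evenly_spaced(1001, 100.0, 0.0, 100000.0))

# md.log: step 46264000 at time 92528.00000 ps.
print(implied_dt(46264000, 92528.0))
```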
I think I might have duplicated frames in my trajectory, but after looking at the output from gmx check I am a bit confused. How can I get rid of the duplicated frames, and why did this happen? I am not an expert in this field, so I would be really grateful if someone could explain what is going on.
Many thanks in advance.