Restarting the simulation

GROMACS version: 2022.5
GROMACS modification: No
Hi,

I was running a 100 ns membrane+protein+ligand simulation on an HPC cluster using SLURM. The job ran out of wall time and was killed at around 92 ns.

I tried to restart the simulation to complete the remaining 8 ns using the following command:

gmx_mpi mdrun -s md.tpr -cpi md_prev.cpt -v -append -deffnm md
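One way to avoid having the job killed mid-run is mdrun's -maxh option, which makes the run write a final checkpoint and stop on its own shortly before the given number of hours. This is only a minimal sketch: the function name continue_run, the GMX variable and the 23.5 hour value are illustrative assumptions, and it defaults to md.cpt (the latest checkpoint written with -deffnm md) rather than the older md_prev.cpt used in the command above.

```shell
#!/bin/sh
# Binary to call; gmx_mpi matches the original post. Any MPI launcher
# (srun, mpirun) that the cluster needs is left out of this sketch.
GMX=${GMX:-gmx_mpi}

# continue_run MAXH [CHECKPOINT]
# Continue the run from a checkpoint and ask mdrun to stop by itself
# after roughly 0.99 * MAXH hours, before the scheduler kills the job.
# Appending to the existing output files is mdrun's default behaviour.
continue_run() {
    maxh=$1
    cpt=${2:-md.cpt}   # assumption: latest checkpoint written with -deffnm md
    "$GMX" mdrun -s md.tpr -cpi "$cpt" -deffnm md -maxh "$maxh"
}
```

For a 24 hour SLURM allocation, the batch script might call continue_run 23.5, and the same call can be resubmitted until the run reaches 100 ns.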

The simulation then finished successfully. However, when I checked the md.log file, I saw the following:

DD step 46263999 load imb.: force 79.1% pme mesh/force 1.054
Step Time
46264000 92528.00000

Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
5.34296e+04 2.43940e+05 1.84763e+05 4.48997e+03 -1.78181e+03
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
3.82478e+04 -2.64975e+04 3.25754e+05 -5.84548e+06 2.03223e+04
Potential Kinetic En. Total Energy Conserved En. Temperature
-5.00282e+06 1.12946e+06 -3.87336e+06 7.79489e+06 3.03070e+02
Pressure (bar) Constr. rmsd
-8.74943e+01 4.64061e-06

Writing checkpoint, step 46264500 at Wed May 29 17:44:03 2024

4 2.46595e+05 1.85823e+05 4.56233e+03 -1.95436e+03
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
3.87148e+04 -2.75305e+04 3.24986e+05 -5.84069e+06 1.62945e+04
Potential Kinetic En. Total Energy Conserved En. Temperature
-5.00036e+06 1.13140e+06 -3.86896e+06 7.79472e+06 3.03591e+02
Pressure (bar) Constr. rmsd
1.82992e+01 4.65514e-06

DD step 46278999 load imb.: force 4.4% pme mesh/force 1.029
Step Time
46279000 92558.00000

Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
5.35400e+04 2.45639e+05 1.85842e+05 4.61183e+03 -1.92825e+03
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
3.85054e+04 -2.72041e+04 3.24764e+05 -5.84297e+06 1.61553e+04
Potential Kinetic En. Total Energy Conserved En. Temperature
-5.00305e+06 1.13006e+06 -3.87299e+06 7.79500e+06 3.03230e+02
Pressure (bar) Constr. rmsd
2.64827e+01 4.68069e-06

From the above, I can see that the output jumped from step 46264000 (time 92528.00000) to step 46279000 (time 92558.00000). It then keeps writing frames until

DD step 46332999 load imb.: force 3.0% pme mesh/force 1.026
Step Time
46333000 92666.00000

Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
5.37965e+04 2.45166e+05 1.86201e+05 4.61486e+03 -2.18204e+03
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
3.82651e+04 -2.70649e+04 3.25583e+05 -5.84266e+06 1.62281e+04
Potential Kinetic En. Total Energy Conserved En. Temperature
-5.00205e+06 1.13090e+06 -3.87115e+06 7.80841e+06 3.03456e+02
Pressure (bar) Constr. rmsd
-9.84917e+00 4.64059e-06

Then it jumps back to the following step and time:

DD step 46264999 load imb.: force 92.0% pme mesh/force 1.020
Step Time
46265000 92530.00000

Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
5.31246e+04 2.45763e+05 1.85537e+05 4.52282e+03 -1.92235e+03
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
3.83409e+04 -2.73014e+04 3.27261e+05 -5.85147e+06 2.04276e+04
Potential Kinetic En. Total Energy Conserved En. Temperature
-5.00572e+06 1.12950e+06 -3.87622e+06 7.79479e+06 3.03080e+02
Pressure (bar) Constr. rmsd
7.18260e+01 4.63354e-06

To check this, I ran gmx check on the md.trr file and got the following output:

Command line:
gmx check -f md.trr

Checking file md.trr
trr version: GMX_trn_file (single precision)
Reading frame 0 time 0.000

Atoms 426608

Last frame 1000 time 100000.000

Item        #frames Timestep (ps)
Step          1001    100
Time          1001    100
Lambda        1001    100
Coords        1001    100
Velocities    1001    100
Forces        1001    100
Box           1001    100

I think I might have duplicated frames in my trajectory, but after looking at the output from gmx check I am a bit confused. How can I get rid of the duplicated frames, and why did this happen?

Also, after the simulation ended I could not find an md.xtc file; it appears it was never created. How can I proceed with the analysis without an md.xtc file? Can someone help me with the specific command to generate the xtc file?

Many thanks in advance

I need some help. I will be grateful if any expert can clear my doubt.

Thanks and regards

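If you set neither nstxout nor nstxout-compressed in the .mdp file, no trajectory file is written, which makes further analysis difficult. nstxout controls the .trr output and nstxout-compressed controls the .xtc output, so a run with only nstxout set produces a .trr file but no .xtc file.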
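To see which trajectory outputs a run was set up to write, one can look for these two parameters in the .mdp file. The following is a minimal sketch, assuming the input file is called md.mdp; both that file name and the function name show_traj_output are hypothetical. A value of 0, or a missing line, means the corresponding file is not written.

```shell
# show_traj_output MDP_FILE
# Print the trajectory output intervals found in an .mdp file.
# GROMACS accepts both "-" and "_" in parameter names, so match either.
show_traj_output() {
    grep -E '^[[:space:]]*nstxout([-_]compressed)?[[:space:]]*=' "$1"
}
```

Calling show_traj_output md.mdp would list the nstxout line (interval for md.trr) and the nstxout-compressed line (interval for md.xtc), if they are present.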

Hi,

I generated the md.xtc file with the following command:

gmx trjconv -f md.trr -s md.tpr -o md.xtc

I have the md.xtc file now, but I am still not clear about the discrepancy in the frames. Can someone guide me on that, please?

Thanks

From gmx check it doesn’t seem like there are any duplicate frames.
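This can be checked with simple arithmetic: gmx check reports frames every 100 ps from 0 to 100000 ps, so a trajectory without duplicates or gaps should hold 100000 / 100 + 1 = 1001 frames, which matches the 1001 frames reported. A small sketch of that calculation (the function name expected_frames is illustrative):

```shell
# expected_frames LAST_TIME_PS INTERVAL_PS
# Number of frames from t = 0 to LAST_TIME_PS inclusive at a fixed interval.
expected_frames() {
    echo $(( $1 / $2 + 1 ))
}

expected_frames 100000 100   # prints 1001
```

If the trajectory contained repeated frames, the frame count from gmx check would be larger than this number.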

Many thanks for replying. I have a small follow-up question though. Why do the steps and times in my md.log jump ahead and then jump back? Could you please explain this? Many thanks for your help.

I don’t know. It almost looks like there are two simulations in parallel writing to the same log file, but there should be checks against that - if the file system is working correctly.

It’s very difficult (for me) to diagnose your problem afterwards. As long as the trajectory files and energy files are OK it shouldn’t be a problem. If it keeps happening it would be interesting to try to understand what the problem is.
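If it does happen again, a quick way to find exactly where the log goes out of order is to scan md.log for energy blocks whose step number does not increase. The following is only a sketch: it assumes each "Step Time" header line is immediately followed by a line with the step and time values, as in the excerpts above, and the function name find_step_jumps is made up for illustration. It reports backward jumps only, not forward gaps.

```shell
# find_step_jumps LOG_FILE
# Report every place in a GROMACS log where the step counter fails to increase.
find_step_jumps() {
    awk '
        # A "Step Time" header announces that the next line holds the values.
        $1 == "Step" && $2 == "Time" { expect = 1; next }
        expect {
            expect = 0
            step = $1 + 0
            if (seen && step <= prev)
                printf "line %d: step %d follows step %d\n", NR, step, prev
            prev = step
            seen = 1
        }
    ' "$1"
}
```

Running find_step_jumps md.log prints one line per backward jump, giving the line number in the log where it occurs.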

Many thanks for sharing your views on this.

best regards