It happened that one of my GROMACS jobs was not finalised properly. As a result, I have the log, gro, edr, cpt, and tpr files corresponding to the end of the simulation (1.5 ms), but the xtc file only covers up to an intermediate time point (1 ms), from which the MD job had been extended. Apparently, the last 0.5 ms of the system trajectory is lost, and I need to recover it. Also, surprisingly, the prev_cpt file seems to correspond to the 1 ms time point.
I have two questions:
what is the reliable way to restart the job from that intermediate time point? Should I use prev_cpt as the restart point?
what will GROMACS do with the existing log, gro, and edr files: will they be overwritten? I'm especially curious about the log file: will GROMACS append the new entries to the end of the file, or will it overwrite the entries between 1 ms and 1.5 ms?
I think the first thing that I would try is to restart from the old checkpoint file, as described in the manual. Otherwise, given that you will already have to rerun a third of your simulation, I think your safest bet is to just start from scratch, which is annoying but might save you some headache. If you really don't want to do that, what you could do is extract the last frame of your simulation, e.g. with gmx trjconv -dump, generate a new tpr with that as the starting point, run for your desired length, and then concatenate the two trajectories. However, according to the same manual page as above, that's not recommended.
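For illustration, that workaround could be sketched roughly like this (filenames such as topol.tpr, md.mdp, and traj.xtc are placeholders; the -dump time is given in ps, so pick the last time actually present in your xtc). Note that an xtc frame has no velocities, which is part of why the manual discourages this route:

```
# Extract the last intact frame from the truncated trajectory
gmx trjconv -s topol.tpr -f traj.xtc -dump <last-time-in-ps> -o last_frame.gro

# Build a new tpr starting from that frame
gmx grompp -f md.mdp -c last_frame.gro -p topol.top -o continuation.tpr

# Run the missing stretch
gmx mdrun -s continuation.tpr -deffnm continuation

# Concatenate the two trajectory pieces
gmx trjcat -f traj.xtc continuation.xtc -o full.xtc
```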
Thanks for the GROMACS manual link and for the suggestions! Appreciated!
A few questions, if you don't mind:
do I understand correctly that, if I choose to restart from the cpt file, I don't need to do anything else before running, e.g. create a gro file with the current atom coordinates, create a new tpr file, etc.?
following the link to the GROMACS manual that you shared: is it correct that, since the GROMACS job itself finished correctly, restarting from the previous cpt will append the newly generated entries to the end of the log file?
I have a third question, if you don't mind: suppose I want to extend that failed simulation to a longer total time. Then, naturally, I have to change the mdp file, specifically the nsteps line. Given that I want to restart from a cpt file that supposedly stores the current state of the MD job, do I need to dump the last atom coordinates into a gro file and create a new tpr, or is the cpt alone enough?
One thing first: definitely make a backup of your data, since something clearly went wrong with your files before. That said:
Correct, you shouldn’t run grompp again when starting from a restart file. E.g., just run gmx mdrun -cpi state_prev (or whatever your file is called, plus any flags you were using previously)
It will append to the existing files from the checkpoint onwards (anything after the checkpoint is overwritten), unless you specify -noappend
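For reference, a minimal restart might look like this (assuming the default state_prev.cpt / topol.tpr names; substitute your own):

```
# Continue the run, appending to the existing log/edr/xtc files
gmx mdrun -s topol.tpr -cpi state_prev.cpt

# Or leave the old files untouched and write new .partXXXX files instead
gmx mdrun -s topol.tpr -cpi state_prev.cpt -noappend
```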
You can use gmx convert-tpr -extend
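As a sketch (the -extend value is in ps; extending by 500 ns here is just an example, and filenames are placeholders):

```
# Add 500,000 ps (500 ns) to the step count stored in the tpr
gmx convert-tpr -s topol.tpr -extend 500000 -o topol_ext.tpr

# Continue from the checkpoint with the extended tpr
gmx mdrun -s topol_ext.tpr -cpi state_prev.cpt
```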
All of this should be in the linked manual page, though. But thinking about it again, are you really sure the trajectory is corrupted? It seems strange to me that the .xtc file should be the only one that didn’t get written properly, while the others were. That shouldn’t happen under normal circumstances. Are you sure you weren’t moving files around while the simulation was running or something?
gmx convert-tpr -extend is a recipe that doesn't require modifying the mdp file; however, if the mdp file is modified, then running gmx grompp to create a new tpr is necessary. Is that correct?
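i.e., if I understand the workflow correctly, something like this (filenames are placeholders; grompp's -t option reads the full-precision state, including velocities, from a checkpoint):

```
# Regenerate the tpr with the new mdp, taking the state from the checkpoint
gmx grompp -f new.mdp -c topol.tpr -t state_prev.cpt -p topol.top -o new.tpr

# This starts a formally new run from that state
gmx mdrun -s new.tpr -deffnm continued
```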
I obviously value the manuals and read them carefully. However, IMHO the most reliable way is hands-on experience. Therefore, when users share their real working recipes, I tend to trust them :)
Regarding your question about the trajectory file: it's not corrupted, it's just missing the data from the 1 ms to 1.5 ms time range. Our technical setup is as follows: the files necessary for an MD job are copied from the work folder to a temporary folder, where the GROMACS job is executed, and afterwards all the files are copied back to the work folder. Naturally, in case of, e.g., a power outage, the backwards copying is interrupted, and the heaviest files are at the greatest risk, e.g. xtc trajectories that can weigh dozens of GBs. In the case of my problematic job, everything else was copied back normally, the xtc copying apparently got interrupted, and prev_cpt was after the xtc in the copying queue, so it was not copied at all. So, yes, the trouble happened exactly because of files being moved around, but not by me.
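For what it's worth, a copy-back step that verifies the transfer might catch a truncated xtc immediately instead of much later; a rough sketch, with placeholder paths:

```
#!/bin/bash
set -euo pipefail

TMP_DIR=/scratch/job_tmp     # placeholder: temporary run folder
WORK_DIR=/home/user/work     # placeholder: work folder

# Copy back with checksum verification; rsync can resume a partial transfer
rsync -av --checksum "$TMP_DIR"/ "$WORK_DIR"/

# Explicitly verify the heavy trajectory file
(cd "$TMP_DIR"  && sha256sum traj.xtc) > /tmp/traj.sha
(cd "$WORK_DIR" && sha256sum -c /tmp/traj.sha)
```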