How can I extend an mdrun simulation when there is no .trr file?

GROMACS version: 2018.2-intel-mpi

Hi there,
I am running a large system.
I have browsed the community posts and benefited from many of the errors and steps discussed there. I have also referred to the documentation on extending a long simulation.
However, I keep hitting a bottleneck when extending a simulation for which no .trr file is available.
I would appreciate some guidance on solving this issue.

I have the MD run files, but no step.trr or step.gro file.
The production step contains step.tpr, .xtc, .edr, .mdp, .log and .cpt files.
I renamed step.tpr to step_prev.tpr.

  1. When I extended the run using the following commands, mdrun failed.

gmx convert-tpr -s step_prev.tpr -extend 100000 -o step.tpr
gmx mdrun -s step.tpr -cpi step.cpt -deffnm step

I got this classic error, where continuing an extended simulation fails to find the output files:

Output file appending has been requested,
but some output files listed in the checkpoint file step.cpt
are not present or not named as the output files by the current program:
Expect output files present:

step.log
step.edr
step.trr

  2. Next, I tried explicitly defining those files (-e step.edr, -g step.log, -x step.xtc) in the PBS script as follows:

gmx_mpi mdrun -ntomp 1 -npme 12 -deffnm step -cpi step.cpt -s step.tpr -e step.edr -g step.log -x step.xtc

Same error again.
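
A quick pre-flight check can show which of the files named in the error are actually absent before mdrun is asked to append. This is only a sketch: `missing_outputs` is a hypothetical helper, not a GROMACS tool, and the extension list (log, edr, trr) is copied from the error message above.

```shell
# Sketch only: print every expected output file that does not exist.
# Usage: missing_outputs DEFFNM EXT [EXT ...]
missing_outputs() {
    deffnm=$1
    shift
    for ext in "$@"; do
        # Echo the file name only when the file is absent.
        [ -e "$deffnm.$ext" ] || echo "$deffnm.$ext"
    done
}

# The three files the error message lists as expected.
missing_outputs step log edr trr
```

If anything is printed, appending to the old output is likely to fail in the same way, and a continuation that writes new files may be the safer route.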

a) How can this be resolved when there’s no .trr file?
b) If a .trr file is created from the .xtc (last frame .gro), can it be used as the .trr to extend this simulation (which is more than 100 ns)?
c) Justin has stated in some answers that one should do a “clean” run using existing files. (“I also STRONGLY recommend the use of the -deffnm option with mdrun, so you get clean, explicitly named output, rather than potentially relying on a host of default files. For example:…” https://www.researchgate.net/post/How-does-one-extend-the-protein-simulation-in-Gromacs-and-check-the-total-time-of-the-MD)

How can I do a clean extension run (or any extension run) without a .trr? Can it be done without a corresponding .trr?

Thank you!
Hi,

I cannot reproduce this issue with GROMACS 2022. It is possible that a bug regarding the expected files when restarting was fixed in a version released after 2018.2.

Changing versions mid-project is undesirable, so you will probably have to do your continuation with -noappend and concatenate the output files afterward.
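
As a rough sketch of that workflow: continue with -noappend, then join the parts with gmx trjcat and gmx eneconv. The `run` wrapper, the `GMX` and `DEFFNM` variables and the `_all` output names are placeholders chosen for illustration, not taken from the thread. By default the commands are only printed; set RUN=1 to execute them.

```shell
# Sketch only: continuation with -noappend followed by concatenation.
GMX=gmx          # placeholder; e.g. gmx_mpi on an MPI build
DEFFNM=step      # file prefix used in this thread

# Print the command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

continue_noappend() {
    # New output is written as <deffnm>.partNNNN.<ext> instead of being appended.
    run "$GMX" mdrun -s "$DEFFNM.tpr" -cpi "$DEFFNM.cpt" -deffnm "$DEFFNM" -noappend
}

concatenate_parts() {
    # Join the original and the part files into single trajectory and energy files.
    run "$GMX" trjcat -f "$DEFFNM".xtc "$DEFFNM".part*.xtc -o "${DEFFNM}_all.xtc"
    run "$GMX" eneconv -f "$DEFFNM".edr "$DEFFNM".part*.edr -o "${DEFFNM}_all.edr"
}

continue_noappend
concatenate_parts
```

Writing the joined files under new names keeps the original parts untouched, so the concatenation can be repeated if a part turns out to be incomplete.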

@ebriand appreciate your response.

I have an update. I did exactly what Justin and you suggested, using the following command with -noappend added at the end, on 2018.2, and it ran without any drama. I could see in the background that the .log file was growing in size and content, and after a while a new .trr (with the suffix step.part005.trr) started being generated with size > 0.

gmx_mpi mdrun -npme 2 -deffnm step -cpi step.cpt -noappend

But my question remains: why might an extension (a smooth continuation, or any continuation for that matter) fail in the absence of a .trr file?

This is not the normal behaviour - if the original output is not .trr, one should not be needed when restarting.

However, since I cannot reproduce this behaviour with either Gromacs 2022 or Gromacs 2018.2, and that particular version is out of support, I cannot help you further on the why.

I see. After running the above at a very slow 28 ps/hour (a ridiculous Open MPI threading issue), I copied the folder and started a separate run to continue from this one, this time without -noappend, since there is now a .trr and -deffnm can find all the files correctly. I also changed the Open MPI threading numbers by setting the values manually and reducing any oversubscribed resources (I am not sure this was the cause, but GROMACS had issued this warning before):

Using 48 MPI processes
Using 8 OpenMP threads per MPI process

On rank 0: oversubscribing the available 48 logical CPU cores per node with 64 threads.
This will cause considerable performance loss.
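
The 64 in this warning is consistent with the 48 ranks being spread evenly over 6 nodes: 48 / 6 = 8 ranks per node, times 8 OpenMP threads, gives 64 threads on 48 cores. The node count comes from the hardware report further down the thread, and the even spread is an assumption. A small sketch of that arithmetic, with hypothetical helper names:

```shell
# Sketch only: threads placed on each node for a given layout.
# Usage: threads_per_node TOTAL_RANKS NODES OMP_THREADS
threads_per_node() {
    ranks_per_node=$(( $1 / $2 ))   # assumes ranks divide evenly over nodes
    echo $(( ranks_per_node * $3 ))
}

# Usage: check_oversubscription TOTAL_RANKS NODES OMP_THREADS CORES_PER_NODE
check_oversubscription() {
    used=$(threads_per_node "$1" "$2" "$3")
    if [ "$used" -gt "$4" ]; then
        echo "oversubscribed: $used threads on $4 cores"
    else
        echo "ok: $used threads on $4 cores"
    fi
}

check_oversubscription 48 6 8 48   # the layout in the warning
check_oversubscription 72 6 4 48   # a hypothetical layout that fits exactly
```

The second call is only an example of a combination whose product matches the core count; it is not a tuned recommendation for this system.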

Next, I updated the .pbs script file based on this suggestion: https://www.researchgate.net/post/how_to_monitor_the_progress_of_a_simulation_in_gromacs

gmx_mpi mdrun -ntomp 4 -npme 6 -v -deffnm step.part0001 -cpi step.part0001.cpt

I didn’t check the mdrun section of the GROMACS manual to see what -v does. It turns out this run is not writing to the .log, .edr or .trr at all.

What is -v? What does it really do? What might be causing mdrun to not update the .log, .edr and .trr files?

It just prints a progress indicator to the terminal. It’s not necessary and probably is slowing things down, but it will not prevent output to any file.
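
Progress can also be checked without -v by reading the last step recorded in the log file. The sketch below assumes the log contains "Step Time" header lines, each followed by a line holding the step number and the time in ps. `last_step` is a hypothetical helper, and the log layout should be checked against an actual file before relying on it.

```shell
# Sketch only: print the last "step time" pair found in an mdrun log.
# Usage: last_step LOGFILE
last_step() {
    awk '
        grab { step = $1; t = $2; grab = 0 }            # line after the header
        $1 == "Step" && $2 == "Time" { grab = 1 }       # header line
        END { if (step != "") print step, t }
    ' "$1"
}

# Example (not run here): last_step step.log
```

Calling it twice a few minutes apart shows whether the step counter is advancing at all.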

Thanks for your help. Could you tell me what the optimum computing resources would be for this run?

I have tried various combinations; none so far seems to speed things up to a reasonable level.
Vendor: Intel
Brand: Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
Family: 6 Model: 85 Stepping: 4

run x + 1:

A very slow run at the moment: 28 ps/hour

Running on 6 nodes with total 288 cores, 288 logical cores
Cores per node: 48
Logical cores per node: 48
nodes 6: cpus 8: memory 8gb
gmx_mpi mdrun -ntomp 4 -npme 6 -v

run x + 2:
No outputs are being written (I am guessing this is running at 0.0001 ps/hour…)

nodes 6: cpus 4: memory 8gb
gmx_mpi mdrun -ntomp 4 -npme 6 -v

run x + 3:

nodes 6 cpus 8 memory 4gb
gmx_mpi mdrun -ntomp 1 -npme 20 -deffnm step.part0004 -cpi step.cpt -noappend

error: “Environment variable OMP_NUM_THREADS (8) and the number of threads requested
on the command line (1) have different values. Either omit one, or set them
both to the same value.”

Why does the cpus value (apparently passed on as OMP_NUM_THREADS) conflict with -ntomp?
Is this a cluster bug, or is my script incorrect?

Any direction on this please?
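
The error text itself names two ways out: omit one of the two settings, or set both to the same value. A minimal sketch of the second option for a job script follows. It assumes the scheduler exports OMP_NUM_THREADS from the cpus request, which the thread does not confirm. `threads_consistent` and `NTOMP` are hypothetical names.

```shell
# Sketch only: make OMP_NUM_THREADS agree with the -ntomp value before mdrun.
# Usage: threads_consistent NTOMP
# Succeeds if OMP_NUM_THREADS is unset/empty or equal to NTOMP.
threads_consistent() {
    [ -z "${OMP_NUM_THREADS:-}" ] || [ "$OMP_NUM_THREADS" = "$1" ]
}

NTOMP=1   # the value passed to -ntomp in run x + 3

if ! threads_consistent "$NTOMP"; then
    echo "OMP_NUM_THREADS=$OMP_NUM_THREADS differs from -ntomp $NTOMP; aligning" >&2
    export OMP_NUM_THREADS="$NTOMP"
fi

# mdrun would then be started with: -ntomp "$NTOMP"
```

Placing the check right before the mdrun line keeps the two values in one spot, so changing NTOMP later does not reintroduce the mismatch.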