GROMACS version: 2019.3
GROMACS modification: Yes/No
I am running REMD on our cluster and continue the jobs, with appending, from the checkpoint (.cpt) files. However, the cluster sometimes crashes, so jobs are terminated at random points. As a result, the checkpoints of the individual replicas are no longer at the same step, and the restart fails:
Initializing Replica Exchange
Repl There are 16 replicas:
Multi-checking the number of atoms ... OK
Multi-checking the integrator ... OK
Multi-checking init_step+nsteps ... OK
Multi-checking first exchange step: init_step/-replex ...
first exchange step: init_step/-replex is not equal for all subsystems
subsystem 0: 75849
subsystem 1: 75455
subsystem 2: 75849
subsystem 3: 75849
subsystem 4: 75849
subsystem 5: 75849
subsystem 6: 75849
subsystem 7: 75455
subsystem 8: 75849
subsystem 9: 75849
subsystem 10: 75849
subsystem 11: 75849
subsystem 12: 75849
subsystem 13: 75849
subsystem 14: 75849
subsystem 15: 75849
-------------------------------------------------------
Program: mdrun_mpi, version 2019.3
Source file: src/gromacs/gmxlib/network.cpp (line 745)
MPI rank: 15 (out of 16)
Fatal error:
The 16 subsystems are not compatible
Is there a way to tell REMD to continue from the “common” step, i.e. the last one that all replicas have reached? In my case, can the run be continued from 75455?
Running REMD requires huge resources, and cluster crashes are unavoidable, so I would like to avoid starting over.
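
The lowest value that all replicas have reached, and the replicas that are behind, can be read off the output above with a short script. This is only a minimal sketch, assuming the mdrun output has been saved as text. The names `parse_subsystems` and `summarize` are made up for illustration, and the script only reads the log; it does not touch any checkpoint or trajectory files.

```python
import re

# mdrun output pasted from the failed restart (values are init_step/-replex
# as reported by mdrun, not raw MD step numbers).
LOG = """\
subsystem 0: 75849
subsystem 1: 75455
subsystem 2: 75849
subsystem 3: 75849
subsystem 4: 75849
subsystem 5: 75849
subsystem 6: 75849
subsystem 7: 75455
subsystem 8: 75849
subsystem 9: 75849
subsystem 10: 75849
subsystem 11: 75849
subsystem 12: 75849
subsystem 13: 75849
subsystem 14: 75849
subsystem 15: 75849
"""


def parse_subsystems(text):
    """Return {replica index: reported value} from 'subsystem N: V' lines."""
    pattern = re.compile(r"^\s*subsystem\s+(\d+):\s+(\d+)\s*$", re.MULTILINE)
    return {int(idx): int(val) for idx, val in pattern.findall(text)}


def summarize(values):
    """Return (common value, replicas at that value, replicas ahead of it)."""
    common = min(values.values())  # last point reached by every replica
    lagging = sorted(i for i, v in values.items() if v == common)
    ahead = sorted(i for i, v in values.items() if v > common)
    return common, lagging, ahead


values = parse_subsystems(LOG)
common, lagging, ahead = summarize(values)
print(f"common value: {common}")
print(f"replicas at the common value: {lagging}")
print(f"replicas ahead of it: {ahead}")
```

For the output above, replicas 1 and 7 are the ones that stopped early, and the other 14 replicas are ahead of them.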