GROMACS version: 2020.3
GROMACS modification: No
Dear All:
I have a strange problem when I continue a simulation from a checkpoint.
I am using GPU acceleration to run atomistic simulations (membrane bilayers with several kinds of lipids).
I originally ran this simulation on 1 GPU card. Afterwards, I compiled GROMACS (the same version, 2020.3) so that it can run simulations on several GPU cards within 1 node. I then tried to continue the simulation from the checkpoint on 4 GPU cards using the following command:
gmx mdrun -v -deffnm myfile -cpi myfile.cpt -append no -npme 1 -ntmpi 4 -ntomp 4 -pme gpu -nb gpu -bonded gpu -nstlist 200
and I got the following error:
Program: gmx mdrun, version 2020.3-plumed-2.7.4-dev-20220218-660d9bc-dirty-unknown
Source file: src/gromacs/domdec/domdec_topology.cpp (line 421)
MPI rank: 0 (out of 4)
Fatal error:
5035 of the 741069 bonded interactions could not be calculated because some
atoms involved moved further apart than the multi-body cut-off distance
(1.81175 nm) or the two-body cut-off distance (1.81175 nm), see option -rdd,
for pairs and tabulated bonds also see option -ddcheck
I am familiar with this error, so I tried setting -rdd 1.4 and varying -ntomp and -nt, but this time nothing helped.
The strange part is that continuing from the checkpoint on 1 GPU runs smoothly. Starting the simulation from the beginning (without the checkpoint) on 4 GPUs also runs smoothly and gives the speed-up I expected. The error only appears when I try to continue from the checkpoint on 4 GPUs.
If I extract the last snapshot from the trajectory file and re-generate a tpr file from it, the simulation runs smoothly on either 1 GPU or 4 GPUs (I think this further rules out a problem with the system itself).
This occurs for all 4 systems I tried, so it seems unlikely to be a problem with my system setup.
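For reference, the workaround that does run (last frame, new tpr, no checkpoint) corresponds roughly to the sketch below. The file names (myfile.xtc, md.mdp, topol.top, last.gro, restart.tpr) and the frame time are placeholders rather than my actual inputs, and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only. All file names and the frame time below are placeholders
# (assumptions), not the actual inputs of my runs.
# RUN defaults to "echo", so the commands are printed rather than executed;
# call the script with RUN= (empty) to really run them.
RUN="${RUN-echo}"

restart_from_last_frame() {
    last_time_ps="$1"   # time (ps) of the last frame in the trajectory

    # 1) Write the frame closest to that time as a structure file.
    #    When really executed, trjconv prompts for the output group.
    $RUN gmx trjconv -s myfile.tpr -f myfile.xtc -dump "$last_time_ps" -o last.gro

    # 2) Build a fresh run input file from that structure (no checkpoint used).
    $RUN gmx grompp -f md.mdp -c last.gro -p topol.top -o restart.tpr

    # 3) Run on 4 GPUs with the same mdrun options as in the failing command,
    #    minus -cpi and -append.
    $RUN gmx mdrun -v -deffnm restart -npme 1 -ntmpi 4 -ntomp 4 -pme gpu -nb gpu -bonded gpu -nstlist 200
}

restart_from_last_frame 100000   # 100000 ps is a made-up example time
```

The only difference from the failing run is that the state comes from a new tpr file instead of myfile.cpt, which is why I suspect the checkpoint restart rather than the system.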
Thanks for reading this message. I look forward to any suggestions and really appreciate any help you can provide.
With my best regards,
Ruo-Xu