Simulation freezes in the middle of the run

GROMACS version: 2024.1
GROMACS modification: No

Hi all,

I am running a simulation that hangs in the middle of gmx mdrun while the CPU continues to run.

For the first 10 hours the simulation ran fine and trajectory frames kept being written. Then, at a random point, it stopped writing any new trajectory frames or other files. When I restart it, it resumes from the checkpoint, and the trajectory files are not corrupted. I have checked that there is plenty of RAM and disk space, so I do not think it is a storage problem.

This has happened a couple of times, and the exact same input files run perfectly on another HPC system (the simulation was tested on the new 96-core server). What is the best way to debug this, and has anyone had a similar experience?

Best regards,

Ben

Hi, were you able to resolve your issue? I have the same problem.

  • Simulations stop writing to disk after a certain amount of time (the same time for a given simulation, but it varies between systems), while gmx is still running.
  • If I kill the job and restart from a .cpt file, it continues fine, so the system had not blown up. It then runs for a certain amount of time again before hanging and no longer writing to disk.
  • This seems to happen with one version of GROMACS (2024) on one cluster, but the same jobs run fine with version 2018.3 on a different machine.

Thanks,
Mala
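
Since in both reports the process keeps running while the output files stop growing, one way to pin down when a stall begins is to watch the modification times of the output files. The following is only a minimal sketch under some assumptions: the function names `stalled_files` and `watch` are made up for illustration, the file names in the usage note are placeholders for whatever your run writes, and the age threshold has to be longer than the interval at which your run normally writes output.

```python
import os
import time


def stalled_files(paths, max_age_s, now=None):
    """Return the paths whose last modification is older than max_age_s seconds.

    Hypothetical helper for illustration; not part of GROMACS.
    """
    now = time.time() if now is None else now
    stalled = []
    for path in paths:
        try:
            age = now - os.path.getmtime(path)
        except OSError:
            # A missing or unreadable file is reported too: nothing is being written.
            stalled.append(path)
            continue
        if age > max_age_s:
            stalled.append(path)
    return stalled


def watch(paths, max_age_s, poll_s=300):
    """Poll until at least one file looks stalled, then return the stalled paths."""
    while True:
        late = stalled_files(paths, max_age_s)
        if late:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "no recent writes to:", ", ".join(late))
            return late
        time.sleep(poll_s)
```

For example, `watch(["md.log", "traj_comp.xtc"], max_age_s=1800)` would return once either file has gone 30 minutes without a write (substitute your own file names and threshold). The timestamp it prints can then be compared with the last step recorded in the log and with the cluster's system logs for that moment, which may help narrow down whether the hang is tied to the file system, the node, or the run itself.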