Slurm Job terminates and restarts while writing *.gro files at the end of mdrun

GROMACS version:2020.4
GROMACS modification: Yes/No
Good day,
My slurm job tends to terminate and restart while mdrun is trying to write the *.gro file at the end, in HPC cluster. I could see from the *.err file that everything is fine until wait time is 0 s and my session hangs up and after sometime the job restarts.

I think it's an issue with mpirun having communication problems, but I'm not sure.

What am I missing, and how can I correct it?

My mpirun command for 5 nodes × 28 cores:

mpirun -np 140 gmx_mpi mdrun -deffnm nvt_g_dm_ -v -s nvt_g_dm_.tpr

Thank you.

Make sure mdrun completes well before SLURM wants the nodes back: if you requested 2 hours, run for e.g. 115 minutes to leave enough time for any reasonable file system to write out the final .gro file. Otherwise, you should still be able to extract the final coordinates from the trajectory.
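One way to build in that margin is mdrun's `-maxh` option, which makes mdrun stop gracefully (writing the checkpoint and final output) once about 99% of the given wall time has elapsed. A minimal batch-script sketch; the `#SBATCH` values mirror the 5 × 28 layout above, but the exact limits and partition are illustrative assumptions for your cluster:

```shell
#!/bin/bash
#SBATCH --nodes=5
#SBATCH --ntasks-per-node=28
#SBATCH --time=02:00:00

# -maxh 1.9 tells mdrun to finish shortly before the 2 h SLURM limit,
# leaving time for the file system to write the final .gro and .cpt files
# instead of being killed mid-write.
mpirun -np 140 gmx_mpi mdrun -deffnm nvt_g_dm_ -v -s nvt_g_dm_.tpr -maxh 1.9
```

If the job is killed anyway, restarting with `-cpi` from the last checkpoint avoids losing the completed steps.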
