Running on Open MPI

GROMACS version: 2020
GROMACS modification: No

Dear GROMACS users,

I have encountered this problem with GROMACS 2020 but not with GROMACS 2018. I submitted a job to my university's HPC cluster, but once the allotted wall time ran out, the job kept running as a ghost process on the nodes (the simulation wasn't finished when the time expired). Is there some initial environment setup that could cause this? I have talked to the administrator of our HPC cluster, and they have no idea why GROMACS 2020 kept running even after the time ran out on the cluster.

Best,

Ben

Hi Ben,

This isn’t really a GROMACS question, I’m afraid. GROMACS doesn’t do anything specific with particular versions or distributions of MPI. We simply call functions in the MPI standard, which don’t depend on the distribution.

When your job runs out of time on a node, it is up to the queuing system to decide what to do with it, such as killing it. GROMACS is quite well-behaved and will finish up cleanly within a few seconds if you just send it a SIGHUP signal (most queuing systems send SIGHUP, i.e. signal 1, first and then wait about 30 seconds before sending SIGTERM).
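This is not GROMACS code, just a minimal Python sketch of the general pattern: a long-running loop that catches SIGHUP/SIGTERM, finishes its current step, saves its state, and exits cleanly rather than dying mid-step.

```python
import signal
import sys
import time

# Illustration only: a long-running loop that, in the same spirit as mdrun,
# reacts to SIGHUP/SIGTERM by finishing the current step and exiting cleanly.
stop_requested = False

def request_stop(signum, frame):
    global stop_requested
    stop_requested = True
    print(f"Received signal {signum}, will stop after the current step", flush=True)

signal.signal(signal.SIGHUP, request_stop)
signal.signal(signal.SIGTERM, request_stop)

for step in range(1_000_000):
    time.sleep(0.1)  # stand-in for one simulation step
    if stop_requested:
        print(f"Writing checkpoint at step {step} and exiting cleanly")
        sys.exit(0)
```

The key point is that the process gets the signal at all; if the scheduler never delivers one to the right process, no amount of careful handling helps.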

My guess is that this happens because your queuing system only kills the main process it started, such as "mpirun", but does not touch the other processes a user might have started on the node (it never sends signals to them). Exactly how mpirun launches the GROMACS processes is up to the MPI distribution, and not something we control.
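To illustrate the difference, here is a small Python sketch (nothing GROMACS- or scheduler-specific, and the "sleep" child just stands in for a process launched under mpirun): signalling a single pid leaves its children running, while signalling the whole process group reaches everything started under it.

```python
import os
import signal
import subprocess

# The child stands in for something launched in its own session,
# the way a batch job step typically runs.
child = subprocess.Popen(["sleep", "300"], start_new_session=True)

# A scheduler that only does the equivalent of
#     os.kill(child.pid, signal.SIGTERM)
# terminates that one process but leaves anything it has spawned running
# as stray processes. Targeting the whole process group instead reaches
# every process started under it:
os.killpg(os.getpgid(child.pid), signal.SIGTERM)
child.wait()
```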

In general, however, such a setup will eventually leave a large number of stray processes on all nodes. Most clusters are instead set up to kill all processes belonging to the user in question when their time is up.

Cheers,

Erk