Resuming a run with -cpi fails when launched with mpiexec.hydra -np $SLURM_NPROCS

GROMACS version: 2020.4

Hi all, I’m stuck trying to continue a production run from a checkpoint. When I launch gmx_mpi mdrun directly, the run resumes correctly, but when I launch it through mpiexec.hydra -np $SLURM_NPROCS (to run in parallel under SLURM), the resume fails. I suspect an MPI launch or process-binding issue, but I’m not sure what I’m doing wrong.

This command, launched directly, works and continues from the checkpoint:

gmx_mpi mdrun -v -notunepme -dlb yes -resethway -s martini_v2.x_new-rf-prod.tpr -cpi martini_v2.x_new-rf-prod_2593ns.cpt -deffnm martini_v2.x_new-rf-prod -noappend

However, when I launch through mpiexec.hydra to get parallel performance, the resume fails.

This is the command that I want to use:

time mpiexec.hydra -np $SLURM_NPROCS gmx_mpi mdrun -v -notunepme -dlb yes -resethway -s martini_v2.x_new-rf-prod.tpr -cpi martini_v2.x_new-rf-prod_2593ns.cpt
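One thing worth checking: the mpiexec.hydra command above is not just the working command with a launcher prefixed. It also drops -deffnm and -noappend. Without them, mdrun falls back to its default output names (such as md.log) and to appending, which may not match the files recorded in the checkpoint. Below is a minimal sketch of a SLURM batch script that keeps every mdrun option from the working command and changes only the launcher. The #SBATCH values and the DRY_RUN variable are hypothetical additions; the file names come from this post.

```shell
#!/usr/bin/env bash
# Hypothetical SLURM settings; adjust to your cluster and allocation.
#SBATCH --job-name=martini-resume
#SBATCH --ntasks=32
set -euo pipefail

# File names taken from the original post.
TPR=martini_v2.x_new-rf-prod.tpr
CPT=martini_v2.x_new-rf-prod_2593ns.cpt
DEFFNM=martini_v2.x_new-rf-prod

# Number of ranks: SLURM_NPROCS inside an allocation, 1 otherwise.
NP="${SLURM_NPROCS:-1}"

# Same mdrun options as the command that resumes correctly when launched
# directly (including -deffnm and -noappend); only the launcher is added.
CMD=(mpiexec.hydra -np "$NP" gmx_mpi mdrun -v -notunepme -dlb yes -resethway
     -s "$TPR" -cpi "$CPT" -deffnm "$DEFFNM" -noappend)

# Always log the exact command to the job output.
printf 'Launching:'
printf ' %s' "${CMD[@]}"
printf '\n'

# Hypothetical DRY_RUN switch: defaults to 1 (print only);
# set DRY_RUN=0 to actually launch the run.
if [[ "${DRY_RUN:-1}" == 0 ]]; then
  time "${CMD[@]}"
fi
```

For example, submitting the script as-is only logs the command so you can compare it with the working one; DRY_RUN=0 sbatch resume.sh launches it, assuming sbatch exports the submitting shell's environment (its default behavior).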

Any help, suggestions, or insights on this issue would be greatly appreciated. Thanks in advance.