GROMACS version: 2020.4
Hi all, I’m stuck trying to continue a production run from a checkpoint. When I launch mdrun directly, the run resumes correctly, but when I launch it through mpiexec.hydra -np $SLURM_NPROCS (to get parallel performance under SLURM), it does not work. I suspect an MPI launch or process-binding issue, but I’m not sure what I’m doing wrong.
This command (launched directly) works and continues from the checkpoint:
gmx_mpi mdrun -v -notunepme -dlb yes -resethway -s martini_v2.x_new-rf-prod.tpr -cpi martini_v2.x_new-rf-prod_2593ns.cpt -deffnm martini_v2.x_new-rf-prod -noappend
However, when I try to get parallel performance using mpiexec.hydra, I run into problems.
This is the command I want to use:
time mpiexec.hydra -np $SLURM_NPROCS gmx_mpi mdrun -v -notunepme -dlb yes -resethway -s martini_v2.x_new-rf-prod.tpr -cpi martini_v2.x_new-rf-prod_2593ns.cpt
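Comparing the two commands, the mpiexec.hydra line leaves out the -deffnm and -noappend options that the directly launched command has; whether that difference matters here is not established. The following is a minimal sketch (not my actual job script) that builds the launch line from one shared option string, so the mpiexec.hydra launch carries exactly the same mdrun flags as the working command. The names DEFFNM, CPT, NP, MDRUN_ARGS and build_launch are hypothetical, NP falls back to 1 when SLURM_NPROCS is unset, and the script only prints the command instead of running it:

```shell
#!/bin/sh
# Sketch only: all variable and function names here are illustrative assumptions.
DEFFNM="martini_v2.x_new-rf-prod"
CPT="${DEFFNM}_2593ns.cpt"

# Number of MPI ranks; defaults to 1 outside a SLURM allocation.
NP="${SLURM_NPROCS:-1}"

# Same mdrun options as the directly launched command that resumes correctly.
MDRUN_ARGS="mdrun -v -notunepme -dlb yes -resethway -s ${DEFFNM}.tpr -cpi ${CPT} -deffnm ${DEFFNM} -noappend"

# Print the full launch line (dry run) so it can be inspected before use.
build_launch() {
    echo "mpiexec.hydra -np ${NP} gmx_mpi ${MDRUN_ARGS}"
}

build_launch
```

Printing the line first makes it easy to diff against the command that works before submitting the job.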
Any help, suggestions, or insights on this issue would be greatly appreciated. Thanks in advance.