Efficiency of running multiple tasks on one GPU node

GROMACS version: 2022.3
GROMACS modification: No
Dear GROMACS community,
I’m trying to run multiple GROMACS tasks on one GPU node of my cluster, which has 32 cores.
However, the performance varies a lot between tasks (from 0.6 h/ns to 6 h/ns).
I compared the log files of fast and slow tasks to find out what differs. In the log file of a fast task, I see:
Running on 1 node with total 32 cores, 64 processing units, 1 compatible GPU
but in the log file of a slow task, the line reads:
Running on 1 node with total 32 cores, 32 processing units, 1 compatible GPU

It seems the performance difference is caused by the different number of processing units assigned to each task, but I’m using the same mdrun command for all tasks:
gmx mdrun -ntmpi 1 -ntomp 4 -deffnm us -v -nb gpu -bonded gpu -pme gpu
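
For comparing runs, the hardware line can be pulled from each log with a small shell helper such as the sketch below. It assumes every run sits in its own directory and writes us.log (following -deffnm us); the function name report_hw_line is made up for illustration.

```shell
# Print the first "Running on ..." line of each mdrun log passed as an argument.
# Assumption: one log per run, e.g. run1/us.log, run2/us.log, ...
report_hw_line() {
    for log in "$@"; do
        printf '%s: ' "$log"
        grep -m 1 'Running on' "$log" || echo '(no hardware line found)'
    done
}

# Usage: report_hw_line */us.log
```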

Any comments or suggestions are welcome,
many thanks.
XIA

Hi XIA,

How many of these simulations (each using 4 threads) do you run on such a node?
I would expect a proper pinning strategy to give better performance.
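
As a rough illustration of manual pinning, the sketch below prints one mdrun command per job, each with its own -pinoffset. It assumes, purely for the example, eight 4-thread jobs in hypothetical directories A to H and a pin stride of 1. The right stride and offsets depend on how many hardware threads the node exposes, so treat the numbers as placeholders. The helper pinned_cmd is invented for this example.

```shell
#!/bin/sh
# Sketch: print one pinned mdrun command per job (nothing is launched here).
# Assumptions (not from the thread): run directories A..H, 4 OpenMP threads
# per job, and STRIDE=1 for a node exposing 32 hardware threads. STRIDE=2 may
# fit a node exposing 64 hardware threads when one thread per core is wanted.
NTOMP=4
STRIDE=1

pinned_cmd() {
    # $1 = job index (0-based), $2 = run directory
    offset=$(( $1 * NTOMP * STRIDE ))
    echo "(cd $2 && gmx mdrun -ntmpi 1 -ntomp $NTOMP -pin on -pinoffset $offset -pinstride $STRIDE -deffnm us -nb gpu -bonded gpu -pme gpu) &"
}

i=0
for dir in A B C D E F G H; do
    pinned_cmd "$i" "$dir"
    i=$((i + 1))
done
```

The printed lines can be pasted into a job script, followed by `wait`, once the offsets have been checked against the node's actual layout.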

Best,
Carsten

Hi Carsten,
I’m submitting 8 simulations to each node. I’ve also noticed that GROMACS warns me in the log file:
NOTE: The number of threads is not equal to the number of (logical) cpus
and the -pin option is set to auto: will not pin threads to cpus.
This can lead to significant performance degradation.
Consider using -pin on (and -pinoffset in case you run multiple jobs).

However, I’m not sure how to apply an appropriate pinning strategy to these tasks. Could you kindly give me some suggestions or examples?

Regards,
XIA

Hi XIA,

I find it easiest to let GROMACS do the proper pinning for me by starting all simulations at once in a multidir setting. For that, you will need an MPI-enabled gmx executable. Then you put your 8 individual input .tpr files in 8 separate directories, e.g. called A B C D E F G and H. Then you run

mpirun -n 8 gmx_mpi mdrun -ntomp 4 -multidir A B C D E F G H -pin on -s topol.tpr ...

The input files must all have the same name in the subdirectories, which you can achieve with symbolic links, for example.
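
The symbolic-link step could look roughly like the sketch below. It assumes each directory already contains a us.tpr (matching the -deffnm us used earlier in the thread); that layout and the helper name link_tpr are hypothetical.

```shell
# Sketch: give every run directory a topol.tpr link pointing at its own input.
# Assumption: each of A..H holds a file named us.tpr.
link_tpr() {
    # $1 = run directory, $2 = name of the existing .tpr inside it
    ( cd "$1" && ln -sf "$2" topol.tpr )
}

for dir in A B C D E F G H; do
    if [ -f "$dir/us.tpr" ]; then
        link_tpr "$dir" us.tpr
    else
        echo "skipping $dir: no us.tpr found" >&2
    fi
done
```

After this, every directory has a topol.tpr, so the -multidir command above can be given -s topol.tpr.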

Best,
Carsten

Hi Carsten,
I followed your suggestion and submitted the tasks via the multidir setting, and their performance seems good enough now. Many thanks for your kind suggestions and precious time.

Regards,
XIA

You are welcome!

Hi XIA! Glad that your problem is solved!

Could you please share the output of gmx_mpi -version and the description of the hardware on your node? Does it use one of the recent Intel CPUs?

Hi,
My gmx_mpi version is 2022.3 and the CPU model in my node is Intel(R) Xeon(R) Gold 5218.

Regards,
XIA