Hybrid MPI and OpenMP

GROMACS version: 2020
GROMACS modification: Yes

Hi Gromacs Developers,
On an HPC, I am trying to use one full node that has 128 cores. I have a small system (444 nm³), so I cannot assign every core to its own MPI process, i.e. 128 MPI processes with 1 OpenMP thread per process. At this point, I have 2 questions:

  1. This may seem very basic, but I still want to ask: I am using mpirun -np 32, wherein I get the following configuration:
    Using 32 MPI processes
    Non-default thread affinity set, disabling internal thread affinity
    Using 4 OpenMP threads per MPI process

Does this mean I am using the full 128 cores (32*4)? I just want to make sure I am not wasting any resources on the node, since I will be charged for the entire node.

  2. Is there a better way to do this on an HPC, especially when I have a small system and have to use 128 cores?

Thank you for your help.

Kind Regards,
Akash

Hi Akash,

I have limited experience with running GROMACS on HPC, but from what I have managed to learn I am fairly sure that you would end up using 128 hardware threads and not necessarily 128 physical cores (the number of cores depends on the number of hardware threads per core, which is typically 2 when SMT/hyperthreading is enabled).

The point is: is that good or bad? A priori, I would not say that using only part of the resources on the node is a waste: if benchmarking shows that 32 MPI processes are optimal for your system, then adding more would only make you pay more in terms of core hours, wouldn’t it?
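
For example, a quick way to compare decompositions is to run a short benchmark with each setting and compare the ns/day reported at the end of the log. A rough sketch (the binary name gmx_mpi, the topol.tpr file and the step count are assumptions, adjust them to your installation and system):

    # short benchmark runs; -resethway resets the counters halfway so startup cost is excluded
    mpirun -np 32 gmx_mpi mdrun -s topol.tpr -ntomp 4 -nsteps 20000 -resethway -noconfout -g bench_32x4.log
    mpirun -np 64 gmx_mpi mdrun -s topol.tpr -ntomp 2 -nsteps 20000 -resethway -noconfout -g bench_64x2.log
    # then compare the "Performance" (ns/day) lines at the end of bench_32x4.log and bench_64x2.log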

If your compute node has 128 hardware threads, i.e. 64 physical cores with SMT enabled, you can also try to run on only the 64 physical cores with

mpirun -np 16 gmx_mpi mdrun -ntomp 4, or
mpirun -np 32 gmx_mpi mdrun -ntomp 2, or
mpirun -np 8 gmx_mpi mdrun -ntomp 8

Especially for a small system this could give you a performance benefit.
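
If you are not sure whether those 128 "cores" are physical cores or hardware threads, you can check directly on a compute node, e.g. with lscpu (a generic Linux check; the exact field names may vary):

    lscpu | grep -E 'Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)'
    # "Thread(s) per core: 2" means SMT is on, so 128 hardware threads = 64 physical cores
    # "Thread(s) per core: 1" means the 128 are real physical cores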

To make better use of the hardware in general, you can run multiple .tpr files at the same time using GROMACS’s built-in multidir functionality, as described here:

https://manual.gromacs.org/current/user-guide/mdrun-features.html
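
For example, a minimal sketch assuming you have prepared four independent runs in directories run1 … run4, each containing its own topol.tpr (the directory and file names are placeholders); the 32 ranks are then split evenly, 8 per simulation:

    mpirun -np 32 gmx_mpi mdrun -multidir run1 run2 run3 run4 -ntomp 4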

In addition to the above suggestions, note that per the log message above, mdrun detected externally set thread affinities and will honor them. However, if your job scheduler / MPI launcher did not set a correct process/thread affinity, you could end up with suboptimal performance. E.g. if you did not tell your job scheduler that each MPI task is intended to use 4 cores, you may end up with 32 MPI tasks each assigned a single core, but each of those cores will be oversubscribed running 4 threads and you’ll leave 3*32 = 96 cores idle.

Make sure your job launch is correct or use mdrun -pin on.
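
As an illustration, if your cluster uses Slurm, a job script along these lines would tell both the scheduler and GROMACS about the intended 32 x 4 layout (a sketch only; the binary name gmx_mpi and topol.tpr are placeholders for whatever your site and system use):

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=32      # 32 MPI ranks on the node
    #SBATCH --cpus-per-task=4         # reserve 4 cores for each rank

    export OMP_NUM_THREADS=4

    # pass the per-task core count to srun explicitly so the affinity it sets
    # matches the 32 x 4 layout; -pin on lets mdrun pin threads itself as a fallback
    srun --cpus-per-task=4 gmx_mpi mdrun -ntomp 4 -pin on -s topol.tpr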