GPU task assignment on a dual-socket node

GROMACS version: 2021.3
GROMACS modification: No

I’m trying to run GROMACS on a dual-socket node (24 cores, 48 threads per socket) with 4 GPUs. However, the logical core topology seems a little wonky; the even processor IDs correspond to physical ID 0, while the odds correspond to physical ID 1. I’m enabling GPU buffer ops, GPU halo exchange, and GPU PME-PP comms. The best performance I’ve been able to achieve is using the following mdrun flags:
-nb gpu -bonded gpu -pme gpu -npme 1 -ntmpi 7 -ntomp 7 -pin on -pinstride 2 -ntomp_pme 6 -nstlist 400 -gputasks 0011223

The problem is this excludes an entire socket; I cannot figure out how to assign the tasks properly so each GPU is only being used by one socket. Is there a way to properly assign tasks so I can use both sockets?
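For reference, the interleaved socket mapping can be confirmed with a generic Linux check (not GROMACS-specific):

```shell
# List each logical CPU alongside its socket ("physical id") to confirm
# that even processor IDs sit on socket 0 and odd IDs on socket 1.
grep -E '^(processor|physical id)' /proc/cpuinfo | paste - -
```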


Will Martin


Do you mean that the cores of one socket are not used? That is controlled by the thread count and affinity settings.

If you started 7 ranks (6 PP ranks × 7 threads + 1 PME rank × 6 threads = 48), you have assigned threads to all cores. However, you explicitly request a stride of 2, which will not work as you only have 48 threads in total (you’d need at least 2×48 for that to work).

That said, this may not be the most efficient assignment depending on your CPU topology. Also note that if your PME rank offloads everything to the GPU, it does not need (nor can it use right now) more than a single core.
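To make the single-core-PME suggestion concrete, here is a minimal sketch of the flag change; the PP rank and thread counts are carried over from the original command and are illustrative, not a tuned recommendation:

```
gmx mdrun -nb gpu -bonded gpu -pme gpu -npme 1 \
    -ntmpi 7 -ntomp 7 -ntomp_pme 1 \
    -pin on -gputasks 0011223
```

With full PME offload, shrinking the PME rank from 6 OpenMP threads to 1 frees the other 5 cores for pinning elsewhere.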


I understand why the way I’m assigning things doesn’t use both sockets, but I can’t figure out a way to make it use both sockets without a performance loss. If a rank’s threads use cores from both sockets it results in a performance loss; is there a way to force threads onto every other core while still using all cores? So still using a stride, but combined with an offset for the second “set” of threads? For a basic example:

-ntmpi 16 -ntomp 6 -gputasks 0000111122223333

But where the first 8 tMPI ranks only use even processor IDs and the second 8 use odd ones?

As for the PME, that’s good to know; I don’t have anything else for those other 5 cores to do in this case, but would it be better to just assign a single core to the PME rank here?
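One possible direction for the even/odd split asked about above, sketched here as an untested assumption: mdrun’s single -pinoffset/-pinstride pair cannot express an interleaved pin mask, but with a real-MPI build of GROMACS the MPI launcher can do socket-aware binding itself. Open MPI syntax is shown; the flag names are launcher-specific:

```
# Untested sketch (Open MPI): 8 ranks per socket, 6 cores each,
# with binding done by the launcher instead of mdrun's -pin options.
mpirun -np 16 --map-by ppr:8:socket:pe=6 \
    gmx_mpi mdrun -ntomp 6 -pin off -gputasks 0000111122223333
```

The launcher then keeps each rank’s threads on a single socket regardless of how the OS numbers the logical processors.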