Optimizing the number of CPU cores in a GPU node run

GROMACS version: 2023.2
GROMACS modification: No

Dear GMX users,

I’ve tested the performance of mdrun on a GPU node using different numbers of CPU cores. I’ve noticed that the simulations do not benefit from an increased number of cores (see plot below). Is this expected behavior when all bonded and non-bonded calculations are offloaded to the GPU? Is there a way to take advantage of the larger number of cores available?

Here are the commands that I’ve used:

export OMP_NUM_THREADS=40 #Vary this number

export GMX_ENABLE_DIRECT_GPU_COMM=1

export GMX_FORCE_UPDATE_DEFAULT_GPU=true

mpirun -np 1 gmx_mpi mdrun -deffnm md_0_1 -nb gpu -bonded gpu -pme gpu -resetstep 90000 -noconfout
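The scan over thread counts can be scripted rather than run by hand. The following is only a minimal sketch: the function names `ns_per_day` and `scan_threads` and the `bench_` output prefix are hypothetical, and it assumes that `md_0_1.tpr` is in the current directory and that each mdrun log ends with the usual `Performance:` line whose first number is ns/day.

```shell
# Sketch: scan OpenMP thread counts for a single-rank GPU run.
# Assumes md_0_1.tpr exists and mpirun/gmx_mpi are on PATH.

# Print the ns/day value from the "Performance:" line of an mdrun log.
ns_per_day() {
    awk '/^Performance:/ { print $2 }' "$1"
}

# Run one benchmark per thread count given as an argument and
# print "threads ns/day" for each run.
scan_threads() {
    export GMX_ENABLE_DIRECT_GPU_COMM=1
    export GMX_FORCE_UPDATE_DEFAULT_GPU=true
    for n in "$@"; do
        export OMP_NUM_THREADS="$n"
        # bench_<n>.* are illustrative output names; mdrun's terminal
        # output is captured so that only the summary reaches stdout.
        mpirun -np 1 gmx_mpi mdrun -s md_0_1.tpr -deffnm "bench_${n}" \
            -ntomp "$n" -nb gpu -bonded gpu -pme gpu \
            -resetstep 90000 -noconfout > "bench_${n}.out" 2>&1
        printf '%s %s\n' "$n" "$(ns_per_day "bench_${n}.log")"
    done
}
```

Calling, for example, `scan_threads 4 8 16 40` would run four benchmarks in sequence and print one line per thread count, which can then be plotted.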

Best regards,

Gustavo

Hi!

“Is this expected behavior when all bonded and non-bonded calculations are offloaded to the GPU?”

You are already offloading most of the work to the GPU. The CPU still has work to do, but unless your simulation includes CPU-only forces, this is the expected result.

“Is there a way to take advantage of the larger number of cores available?”

You can try -bonded cpu to move some load from the GPU back to the CPU; since you have plenty of CPU resources, that may speed things up.

Note that the best performance may well be achieved without fully loading all CPU cores. Distributing tasks between the CPU and the GPU involves granularity and overheads, so perfectly uniform utilization of all hardware resources cannot be achieved in every case.
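Because the best combination of thread count and bonded placement has to be found by measurement, a small helper can keep the comparison systematic. This is a sketch under stated assumptions: `mdrun_cmd` and `best_config` are hypothetical helper names, the `bench_` file names are illustrative, and the results file is assumed to hold one `threads bonded ns_per_day` triple per line, collected by you from the mdrun logs.

```shell
# Sketch: compare -bonded cpu and -bonded gpu across thread counts.
# File names and the results layout are assumptions for illustration.

# Print (without running it) the mdrun command for a given thread
# count ($1) and bonded placement ($2, either "cpu" or "gpu").
mdrun_cmd() {
    printf 'mpirun -np 1 gmx_mpi mdrun -s md_0_1.tpr -deffnm bench_%s_%s -ntomp %s -nb gpu -pme gpu -bonded %s -resetstep 90000 -noconfout\n' \
        "$2" "$1" "$1" "$2"
}

# From lines of "threads bonded ns_per_day" in file $1, print the
# line with the highest ns/day value.
best_config() {
    awk 'NF == 3 && (!seen || $3 + 0 > best) { best = $3 + 0; line = $0; seen = 1 }
         END { if (seen) print line }' "$1"
}
```

Printing the command first makes it easy to check each configuration before launching it, for example with `mdrun_cmd 8 cpu`. Once every run has finished, `best_config results.txt` reports the fastest measured setup, which may not be the one that uses all cores.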

Dear al42and,

Thanks for the reply.

I’ve tested -bonded cpu, but running all calculations on the GPU is far faster (more than 2x faster than with -bonded cpu). Maybe the GPU (an NVIDIA Tesla V100) is just very good at the job?

Best regards,

Gustavo