I’ve tested the performance of mdrun on a GPU node using different numbers of CPU cores, and I’ve noticed that the simulations do not benefit from an increased number of cores (see plot below). Is this expected behavior when all bonded and non-bonded calculations are offloaded to the GPU? Is there a way to take advantage of the larger number of cores available?
Is this expected behavior when all bonded and non-bonded calculations are offloaded to the GPU?
You are already offloading most of the work to the GPU. The CPU still has things to do, but unless you have some CPU-only forces in your simulations, your observations are to be expected.
Is there a way to take advantage of the larger number of cores available?
You can try using -bonded cpu to move some load back from the GPU to the CPU; since you have plenty of CPU resources, that may speed things up.
Note that it’s quite possible that the best performance is achieved without fully loading all CPU cores. There are granularity effects and overheads when distributing tasks between CPU and GPU, so we cannot achieve perfectly uniform utilization of all hardware resources in all cases.
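If it helps, here is a minimal sketch of how such a scan could be set up. It only builds and prints the gmx mdrun command lines for a few thread counts with -bonded gpu and -bonded cpu; it does not run anything. The input name topol.tpr, the thread counts, the step count, and the bench_... run names are all placeholders I made up for illustration, and the single-rank setup (-ntmpi 1) with -nb gpu -pme gpu is an assumption about your run, so adjust to match your system.

```python
import shlex
from itertools import product


def mdrun_commands(tpr="topol.tpr", core_counts=(4, 8, 16, 32),
                   bonded_modes=("gpu", "cpu"), nsteps=20000):
    """Build one `gmx mdrun` command line per (thread count, bonded target).

    All defaults are hypothetical placeholders, not values from this thread.
    """
    commands = []
    for ntomp, bonded in product(core_counts, bonded_modes):
        # Distinct output prefix so each run writes its own md.log etc.
        run_name = f"bench_ntomp{ntomp}_bonded-{bonded}"
        args = [
            "gmx", "mdrun",
            "-s", tpr,              # run input file (placeholder name)
            "-deffnm", run_name,    # default name for all output files
            "-ntmpi", "1",          # assumed: one thread-MPI rank, one GPU
            "-ntomp", str(ntomp),   # OpenMP threads, i.e. CPU cores used
            "-nb", "gpu",           # non-bonded interactions on the GPU
            "-pme", "gpu",          # PME on the GPU (assumed)
            "-bonded", bonded,      # the setting being compared
            "-nsteps", str(nsteps), # short benchmark run
            "-resethway",           # reset timers halfway to skip start-up cost
        ]
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    for command in mdrun_commands():
        print(command)
```

Comparing the ns/day reported at the end of each md.log should show where the curve flattens, which may well be below the full core count.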
I’ve tested using -bonded cpu, but performance with the GPU doing all calculations is far better (more than 2x faster than with -bonded cpu). Maybe the GPU (an NVIDIA Tesla V100) is just very good at the job?