Performance Fluctuations and Bonded Interactions on GPU in GROMACS 2024.1

GROMACS version: GROMACS 2024.1
GROMACS modification: No

Hello GROMACS community,

I am currently using GROMACS 2024.1, which includes the GPU offloading feature for tasks like non-bonded interactions and PME. However, I’ve noticed some fluctuations in ns/day during production runs, particularly related to “Wait GPU state copy” times.

From what I understand, these fluctuations might be due to CPU-GPU communication delays and the dynamic task allocation between the CPU and GPU. I suspect that offloading bonded interactions to the GPU could be contributing to these delays, as the task size for bonded interactions is relatively small and doesn’t benefit much from GPU offloading.

To address this, I’ve tried using the -bonded cpu option to force bonded interactions to be calculated on the CPU, which seemed to stabilize performance. My assumption is that, especially for larger systems, keeping bonded interactions on the CPU is more efficient since the calculation load for bonded interactions doesn’t scale significantly with system size, and moving them to the GPU only adds unnecessary waiting time.
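
For reference, the comparison described above can be set up as two short benchmark runs that differ only in where the bonded work goes. This is a minimal sketch, not a recommended setup: the input name `topol.tpr`, the output prefix `bench`, and the step count are placeholders I made up, and the script only builds and prints the command lines rather than running them. The `gmx mdrun` flags themselves (`-nb`, `-pme`, `-bonded`, `-nsteps`, `-resethway`, `-noconfout`) are standard options.

```python
import shlex
from typing import List


def mdrun_command(bonded: str, deffnm: str = "bench") -> List[str]:
    """Build a short `gmx mdrun` benchmark command with bonded work on `bonded`."""
    if bonded not in ("cpu", "gpu"):
        raise ValueError("bonded must be 'cpu' or 'gpu'")
    return [
        "gmx", "mdrun",
        "-s", "topol.tpr",                        # placeholder input file name
        "-deffnm", f"{deffnm}_bonded_{bonded}",   # separate outputs per run
        "-nb", "gpu",                             # non-bonded on GPU, as in the post
        "-pme", "gpu",                            # PME on GPU, as in the post
        "-bonded", bonded,                        # the setting being compared
        "-nsteps", "20000",                       # placeholder benchmark length
        "-resethway",                             # reset timers halfway through
        "-noconfout",                             # skip writing the final frame
    ]


# Print both variants so they can be pasted into a job script.
for placement in ("cpu", "gpu"):
    print(shlex.join(mdrun_command(placement)))
```

Running each variant a few times and comparing the "Wait GPU state copy" row and the ns/day in the resulting logs should show whether the bonded placement is really what drives the fluctuation.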

Could you confirm if my understanding is correct, or suggest other potential reasons for the fluctuations? Are there best practices for handling bonded interactions and optimizing GPU performance in larger systems?

Thank you!

Are you performing the update on the CPU or on the GPU? If the update is on the CPU, it can be better to compute the bonded interactions on the CPU as well, especially when there are only a few of them. With the update on the GPU, which I assume you don’t have, moving them to the CPU adds extra communication. This could still be beneficial if the CPU would otherwise be idle.
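
Since the best placement depends on the update setting and on how busy the CPU is, the simplest check is to run the same system with each combination of `-bonded` and `-update` and compare the reported throughput. Below is a rough sketch for pulling ns/day out of the logs; it assumes the usual `Performance:` line at the end of `md.log`, with ns/day as the first number. The function names and the log file names in the usage comment are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, Optional

# Matches the summary line at the end of md.log; the first number is ns/day.
_PERF_LINE = re.compile(r"^Performance:\s+([0-9]*\.?[0-9]+)", re.MULTILINE)


def ns_per_day(log_text: str) -> Optional[float]:
    """Return ns/day from the text of an mdrun log, or None if the run did not finish."""
    match = _PERF_LINE.search(log_text)
    return float(match.group(1)) if match else None


def collect(log_paths: Iterable[str]) -> Dict[str, Optional[float]]:
    """Map each log file path to its ns/day (None when no Performance line is found)."""
    return {
        str(path): ns_per_day(Path(path).read_text(errors="replace"))
        for path in log_paths
    }


# Example usage (file names are placeholders):
#   for name, perf in collect(["bonded_cpu.log", "bonded_gpu.log"]).items():
#       print(name, perf)
```

Repeating each setting a few times gives a sense of the run-to-run spread, which matters here because the original concern is fluctuation rather than the average ns/day.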