Optimizing CPU/GPU efficiency and performance in GROMACS simulations

  • -pme gpu -nb gpu -update gpu are the defaults when a GPU is detected and the configuration allows it. I prefer to specify things explicitly when tuning for performance. For example, if you have constraints=all-bonds, then GROMACS will exit with an error if -update gpu is requested explicitly, but will silently fall back to the CPU otherwise. You decide which behavior works best for you.
  • -bonded gpu needs to be explicit; the default is cpu even when the bonded interactions are supported on the GPU.
  • -pin on is the default when you are using all cores (ntmpi × ntomp = number of cores); otherwise GROMACS cannot safely guess which cores you want to use and will not pin threads.

GROMACS 2025 behaves the same. However, GROMACS 2023 defaults to -update cpu.
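
For illustration, a fully explicit launch that offloads everything to a single GPU might look like the command below; the thread counts are placeholders for a hypothetical 16-core machine, so adjust them to your hardware:

```bash
# Offload nonbonded, PME, bonded, and update to the GPU, and pin threads.
# -ntmpi 1 -ntomp 16 assumes a 16-core machine; match your core count.
gmx mdrun -deffnm md \
    -nb gpu -pme gpu -bonded gpu -update gpu \
    -ntmpi 1 -ntomp 16 -pin on
```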

A few cases when you might want to launch fewer threads than you have cores and definitely need pinning:

  • If you have a chiplet-based CPU, like the recent AMD Zen series, you might be better off limiting GROMACS to a single chiplet (“CCX” in AMD terminology; equivalent to “last-level cache” or “L3 cache” domains in various hardware-topology-reporting utilities). Check the CPU specs online to see the number of cores per chiplet, then set -ntomp X -pin on accordingly (see the sketch after this list).
  • Similarly, with the P/E-core distinction in recent Intel CPUs, it could be better to limit GROMACS to only the P-cores. Again, look up the number of P-cores and set -ntomp and -pin on accordingly.
  • Otherwise, for a single-socket machine, it should not matter that much.
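
As a minimal sketch, here is how one might restrict a run to a single CCX; the 8-cores-per-chiplet figure and the offset/stride values are assumptions about the logical-core numbering, so check your actual topology first (e.g., with hwloc’s lstopo):

```bash
# Assuming 8 cores per CCX and that the first CCX occupies logical cores 0-7:
gmx mdrun -deffnm md -ntmpi 1 -ntomp 8 -pin on -pinoffset 0 -pinstride 1
```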

You sure about that? The nstlist value from the MDP file is a minimum (unless set to 1, which actually enforces a list update every step – very slow). GROMACS will set nstlist to around 100 when running on a GPU. Using the -nstlist flag is the only way to set nstlist to a specific, unchangeable value.
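
For instance, to force a fixed value (200 here is purely illustrative):

```bash
# Override the MDP value and prevent any runtime adjustment of nstlist:
gmx mdrun -deffnm md -nstlist 200
```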

You can look for the “Changing nstlist” line in your md2.log to see the actual value used.
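
Something like this should find it, assuming the log file name from above:

```bash
grep "Changing nstlist" md2.log
# illustrative output: Changing nstlist from 10 to 100, rlist from 1 to 1.155
```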

GROMACS uses an adaptive, dual pair list to keep the simulation physically correct regardless of the pair-list update frequency (see verlet-buffer-tolerance).
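
The relevant .mdp options look like this; the values shown are the documented defaults, not tuning advice:

```
; Pair-list buffering with the Verlet scheme
cutoff-scheme           = Verlet
verlet-buffer-tolerance = 0.005 ; max pair-interaction error, kJ/mol/ps per atom
nstlist                 = 10    ; treated as a minimum; mdrun may increase it
```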

Larger nstlist means that the “outer” neighbor list is updated less often → the list must be larger to accommodate particles moving further between updates → the GPU has more work on each step. But it also means we do the slow list update (neighbor search) less often, which is why large values are good for performance when you have a fast GPU.

But it should never lead to any missing interactions (we had a nasty bug prior to GROMACS 2024 due to an underestimation of the pair-list range, but, well, it was a bug, and only relatively uncommon simulation setups, inhomogeneous and without PME, were affected).
