`-pme gpu -nb gpu -update gpu` are the defaults when a GPU is detected and the config allows it. I prefer to specify things when tuning for performance. For example, if you have `constraints=all-bonds`, then GROMACS will throw an error if `-update gpu` is spelled explicitly, but silently fall back to the CPU otherwise. You decide which behavior works best for you. `-bonded gpu` needs to be explicit; the default is `cpu` even if the bonded forces are supported on the GPU. `-pin on` is the default when you are using all cores (ntmpi × ntomp = number of cores); otherwise GROMACS cannot safely guess which cores you want to use and will not pin threads.

GROMACS 2025 behaves the same. However, GROMACS 2023 defaults to `-update cpu`.
A few cases where you might want to launch fewer threads than you have cores and definitely need pinning:
- If you have a chiplet-based CPU, like the recent AMD Zen series, you might be better off limiting GROMACS to a single chiplet (“CCX” in AMD terminology; equivalent to “LL cache” or “L3 cache” domains in various hardware-topology-reporting utilities). Check the CPU specs online to see the number of cores per chiplet, then set `-ntomp X -pin on` (see the sketch after this list).
- Similarly, with the P/E-core distinction in recent Intel CPUs, it could be better to limit GROMACS to only the P-cores. Again, look up the number of P-cores and set `-ntomp` accordingly with `-pin on`.
- Otherwise, on a single-socket machine, it should not matter that much.
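As a concrete sketch (the 8-core CCX is a hypothetical example; check your own CPU's topology):

```bash
# Hypothetical 8-core CCX: one thread-MPI rank with 8 OpenMP threads.
# Because 1 × 8 is fewer threads than the machine has cores, pinning is
# no longer automatic, so -pin on must be given explicitly.
gmx mdrun -deffnm md -ntmpi 1 -ntomp 8 -pin on
# To target a different chiplet, shift the pinned cores with -pinoffset
# (the exact offset depends on your hardware-thread numbering).
```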
You sure about that? The `nstlist` value from the MDP file is a minimum (unless set to `1`, which actually enforces a list update every step – very slow). GROMACS will set `nstlist` to around 100 when running on a GPU. Using the `-nstlist` flag is the only way to set the `nstlist` value to a specific, unchangeable value. You can look for the “Changing nstlist” line in your `md2.log` to see the actual value used.
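For instance (the value 200 is an arbitrary illustration, not a tuned recommendation; `md2` matches the log name above):

```bash
# Fix the pair-list update interval to an exact value; mdrun will use
# exactly this instead of auto-tuning it.
gmx mdrun -deffnm md2 -nstlist 200

# When letting GROMACS auto-tune instead, check which value it picked:
grep "Changing nstlist" md2.log
```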
GROMACS uses an adaptive, dual pair list to keep the simulation physically correct regardless of the pair-list update frequency (see `verlet-buffer-tolerance`).

A larger `nstlist` means that the “outer” neighbor list is updated less often → the list must be larger to accommodate particles moving more between updates → the GPU has more work on each step. But it also means we do the slow list update (neighbor search) less often, which is why large values are good for performance when you have a fast GPU.

But it should never lead to any missing interactions (we had a nasty bug prior to GROMACS 2024 due to underestimation of the pair-list range, but, well, it was a bug, and only relatively uncommon simulation setups, inhomogeneous and without PME, were affected).
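For reference, a minimal MDP sketch of the two settings involved (the values shown are, as far as I know, the GROMACS defaults):

```
; Maximum allowed drift from missed pair interactions, per atom
; (kJ/mol/ps). GROMACS sizes the outer-list buffer from this, so a
; larger nstlist automatically gets a larger buffer.
verlet-buffer-tolerance  = 0.005
; Treated as a minimum; mdrun may raise it (to ~100 on a GPU) unless
; it is fixed with the -nstlist command-line flag.
nstlist                  = 10
```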