GROMACS 2025.0 on the DGX B200

GROMACS version: 2025.0
GROMACS modification: No

Dear GROMACS users,

I am testing GROMACS 2024.5 and 2025.0 on an NVIDIA B200 x 8 system (Ubuntu 24.04).

The simulation was tested using 8 GPUs.

System size (AMBER14SB, TIP3P)

  • Protein atoms: 21,912
  • Ligand atoms: 51
  • Water atoms: 212,046
  • Counter ions: 1

  • Total atoms: 234,010

.bashrc

export GMX_ENABLE_DIRECT_GPU_COMM=1
export GMX_HEFFTE_USE_GPU_AWARE=1
export GMX_GPU_PME_DECOMPOSITION=1
export GMX_USE_GPU_BUFFER_OPS=1
export GMX_DISABLE_GPU_TIMING=1
export GMX_FORCE_UPDATE_DEFAULT_GPU=true
export GMX_GPU_DD_COMMS=true
export GMX_GPU_PME_PP_COMMS=true

Command

  • gmx mdrun -v -deffnm md -tunepme -dlb yes -nb gpu -bonded gpu -pme gpu -npme 1 -pin on -ntmpi 8 -ntomp 16

The test results show a difference in performance depending on the version.

GROMACS 2024.5 (CUDA 12.8)

  • without exports: ~196 ns/day
  • with exports: ~392 ns/day

GROMACS 2025.0 (CUDA 12.8)

  • without exports: ~221 ns/day
  • with exports: ~221 ns/day

In GROMACS 2025.0, I expected the performance to improve after setting the exports, but it didn’t.

Any advice or insights would be greatly appreciated.

Thank you in advance.

Hi!

This is a known issue in 2025.0: “Separate PME ranks with thread-MPI and CUDA do not work with small systems”.

In our testing, only small systems are affected (not your case), but the exact cause and impact are still being investigated. So, for now, that combination (direct GPU communication with separate PME ranks under thread-MPI) is disabled.

We hope to fix the problem in 2025.1 or 2025.2. Also note that this issue affects the 2024 series as well (when you have GMX_ENABLE_DIRECT_GPU_COMM set), but it was only discovered very recently, so there are no notices or workarounds in GROMACS 2024 yet.
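
If you need the direct-communication speed-up in 2025 before the fix is out, one possible route (just a sketch, not an official recommendation, and it assumes GROMACS is built against a CUDA-aware MPI library) is to use a library-MPI (“real MPI”) build, where direct GPU communication is enabled by default in 2025. The rank count then comes from the MPI launcher instead of -ntmpi, for example:

  • mpirun -np 8 gmx_mpi mdrun -v -deffnm md -nb gpu -bonded gpu -pme gpu -npme 1 -ntomp 16 -pin on

The exact launcher invocation depends on your MPI installation.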


Going off on a tangent: the environment variables (“exports”) are used to control features that are experimental for one reason or another. They should be used with understanding, not by enabling everything in the hope of things getting faster. It’s not that we as developers want to make your life harder by making you jump through extra hoops to get the best performance. In particular, going through your list:

  • GMX_ENABLE_DIRECT_GPU_COMM=1: this one is useful for 2024, but is normally not needed in 2025: either because it’s enabled by default (with libMPI), or because it’s disabled altogether due to the aforementioned bug (with thread-MPI). This is the reason you get a speed-up with GROMACS 2024.
  • GMX_HEFFTE_USE_GPU_AWARE=1: this one only makes sense if you have a libMPI (“real MPI”) build with HeFFTe and run with several PME ranks. You do none of the three, so this setting does nothing for you.
  • GMX_GPU_PME_DECOMPOSITION=1: same as the previous one.
  • GMX_USE_GPU_BUFFER_OPS=1: this can have some effect with 2024 when you are not using GPU update; in 2025, it’s on by default. And since you are very likely using GPU update, this, again, very likely does nothing.
  • GMX_DISABLE_GPU_TIMING=1: GPU timings are only enabled by default with OpenCL, so this does nothing with CUDA.
  • GMX_FORCE_UPDATE_DEFAULT_GPU=true: this has been the default behavior since GROMACS 2023, so it does nothing.
  • GMX_GPU_DD_COMMS=true: was removed as an individual option a few releases ago; it is now controlled by GMX_ENABLE_DIRECT_GPU_COMM.
  • GMX_GPU_PME_PP_COMMS=true: same as the previous one.

Of course, there is no harm in setting flags that do nothing (other than the risk of them doing something when you don’t expect it). But why have eight lines in your script when one is enough?
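
For GROMACS 2024 on this machine, for example, the whole block above effectively reduces to a single line (and with 2025.0 and thread-MPI even this one currently has no effect, because of the bug mentioned earlier):

export GMX_ENABLE_DIRECT_GPU_COMM=1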


Thank you for your kind explanation.

I have a separate question regarding the use of force fields.

I’m wondering if there are any plans to update the force fields in GROMACS.

Would it be possible for GROMACS to include more recent force fields, such as AMBER19SB, in its official distribution?

Currently, only older force fields are available, which makes it quite challenging to work with.

Yes. There were even plans to include AMBER 19SB in GROMACS 2025, but it did not quite work out; testing took more time than planned. Having the newer AMBER force fields in GROMACS 2026 is our current target.

Wow! That’s really great news!

I wish GROMACS could also handle ligand and lipid parameterisation on its own.

Thank you so much :)

Nope-nope-nope. Forcefield parameterisation is not on the roadmap :)

What we are working on is incorporating neural-network based potentials, so one could use them to improve the accuracy of ligand modeling (as an intermediate step between classical MD and QM/MM).

We plan to make the 2025.1 release, with this issue fixed, later this week.