GROMACS 2025.0 on the DGX B200

GROMACS version: 2025.0
GROMACS modification: No

Dear GROMACS users,

I am testing GROMACS 2024.5 and 2025.0 on an NVIDIA B200 x 8 system (Ubuntu 24.04).

The simulation was tested using 8 GPUs.

System size (AMBER14SB, TIP3P)

  • Protein atoms: 21,912
  • Ligand atoms: 51
  • Water atoms: 212,046
  • Counter ions: 1

  • Total atoms: 234,010

.bashrc

export GMX_ENABLE_DIRECT_GPU_COMM=1
export GMX_HEFFTE_USE_GPU_AWARE=1
export GMX_GPU_PME_DECOMPOSITION=1
export GMX_USE_GPU_BUFFER_OPS=1
export GMX_DISABLE_GPU_TIMING=1
export GMX_FORCE_UPDATE_DEFAULT_GPU=true
export GMX_GPU_DD_COMMS=true
export GMX_GPU_PME_PP_COMMS=true

Command

  • gmx mdrun -v -deffnm md -tunepme -dlb yes -nb gpu -bonded gpu -pme gpu -npme 1 -pin on -ntmpi 8 -ntomp 16

The test results show a difference in performance depending on the version.

GROMACS 2024.5 (CUDA 12.8)

  • without exports: ~196 ns/day
  • with exports: ~392 ns/day

GROMACS 2025.0 (CUDA 12.8)

  • without exports: ~221 ns/day
  • with exports: ~221 ns/day

In GROMACS 2025.0, I expected the performance to improve after setting the exports, but it didn’t.

Any advice or insights would be greatly appreciated.

Thank you in advance.

Hi!

This is a known issue in 2025.0: “Separate PME ranks with thread-MPI and CUDA do not work with small systems”.

In our testing, only small systems are affected (not your case), but the exact cause and impact are still being investigated. So, for now, that combination (direct GPU communication with separate PME ranks under thread-MPI) is disabled.

We hope to fix the problem in 2025.1 or 2025.2. Also note that this issue affects the 2024 series as well (when you have GMX_ENABLE_DIRECT_GPU_COMM set), but it was only discovered very recently, so there are no notices or workarounds in GROMACS 2024 yet.
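
If you need the direct-communication speed-up in 2025 before the fix is out, one possible route (just a sketch, not an official recommendation, and it assumes GROMACS is built against a CUDA-aware MPI library) is to use a library-MPI (“real MPI”) build, where direct GPU communication is enabled by default in 2025. The rank count then comes from the MPI launcher instead of -ntmpi, for example:

  • mpirun -np 8 gmx_mpi mdrun -v -deffnm md -nb gpu -bonded gpu -pme gpu -npme 1 -ntomp 16 -pin on

The exact launcher invocation depends on your MPI installation.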


Going off on a tangent: the environment variables (“exports”) are used to control features that are experimental for one reason or another. They should be used with understanding, not by enabling everything in the hope of things getting faster. It’s not that we as developers want to make your life harder by making you jump through extra hoops to get the best performance. In particular, going through your list:

  • GMX_ENABLE_DIRECT_GPU_COMM=1: this one is useful for 2024, but is normally not needed in 2025: either because it’s enabled by default (with libMPI), or because it’s disabled altogether due to the aforementioned bug (with thread-MPI). This is the reason you get a speed-up with GROMACS 2024.
  • GMX_HEFFTE_USE_GPU_AWARE=1: this one only makes sense if you have a libMPI (“real MPI”) build with HeFFTe and run with several PME ranks. You do none of the three, so this setting does nothing for you.
  • GMX_GPU_PME_DECOMPOSITION=1: same as the previous one.
  • GMX_USE_GPU_BUFFER_OPS=1: this can have some effect with 2024 when you are not using GPU update; in 2025, it’s on by default. And since you are very likely using GPU update, this, again, very likely does nothing.
  • GMX_DISABLE_GPU_TIMING=1: GPU timings are only enabled by default with OpenCL, so this does nothing with CUDA.
  • GMX_FORCE_UPDATE_DEFAULT_GPU=true: this has been the default behavior since GROMACS 2023, so it does nothing.
  • GMX_GPU_DD_COMMS=true: was removed as an individual option a few releases ago; it is now controlled by GMX_ENABLE_DIRECT_GPU_COMM.
  • GMX_GPU_PME_PP_COMMS=true: same as the previous one.

Of course, there is no harm in setting flags that do nothing (other than the risk of them doing something when you don’t expect it). But why have eight lines in your script when one is enough?
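
For GROMACS 2024 on this machine, for example, the whole block above effectively reduces to a single line (and with 2025.0 and thread-MPI even this one currently has no effect, because of the bug mentioned earlier):

export GMX_ENABLE_DIRECT_GPU_COMM=1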


Thank you for your kind explanation.

I have a separate question regarding the use of force fields.

I’m wondering if there are any plans to update the force fields in GROMACS.

Would it be possible for GROMACS to include more recent force fields, such as AMBER19SB, in its official distribution?

Currently, only older force fields are available, which makes it quite challenging to work with.

Yes. There were even plans to include AMBER 19SB in GROMACS 2025, but it did not quite work out; testing took more time than planned. Having the newer AMBER force fields in GROMACS 2026 is our current target.

Wow! That’s really great news!

I wish GROMACS could also handle ligand and lipid parameterisation on its own.

Thank you so much :)

Nope-nope-nope. Forcefield parameterisation is not on the roadmap :)

What we are working on is incorporating neural-network based potentials, so one could use them to improve the accuracy of ligand modeling (as an intermediate step between classical MD and QM/MM).

We plan to make the 2025.1 release, with this issue fixed, later this week.