Unusual barostat instability when using ROCm + HeFFTe that is not present when using VkFFT

GROMACS version: 2025.4
GROMACS modification: Yes: AdaptiveCPP 25.04 w/HeFFTe MPICH 3.1.32, rocm/6.4.2 & LLVM19

Hello folks,

I have been trying to run a fairly large (12M atom) system with anisotropic pressure coupling (using Berendsen for the relaxation, for historical reasons) and v-rescale for the thermostat. When I run the system on 1 or 2 nodes using 8 ranks per node and 8 GPUs (gfx90a) with my 2025.4 build based on VkFFT (with ACPP 25.04), the system runs fine, but a bit slowly.

In an effort to scale to a few more nodes and squeeze out a bit more performance with multi-GPU PME, I made a new build of GROMACS 2025.4, this time with ROCm + HeFFTe. However, as soon as I start my simulation, I run into warnings about box skewing, followed by an eventual crash (typically after 2 or 3 steps) complaining about the non-bonded energies being infinite (they aren’t; from what I can tell, it looks like a bonded term is responsible).

Has anyone else run into barostat/energy issues with ROCm + HeFFTe? I can share the tpr and mdp files if that is helpful.

Thanks for any help or ideas on where I could start looking to hunt down what is going on.

-Mick

Hi!

Which heFFTe version are you using? Have you run its tests?

Can you try running after export GMX_HEFFTE_USE_GPU_AWARE=0? That is bad for performance, but could help with tracking down the issue.
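
In case it helps, here is a minimal sketch of that diagnostic run. Only the variable itself comes from the suggestion above; the launch line is a hypothetical example (launcher, rank counts, and the md file prefix are assumptions), so it is left commented out and should be adapted to your job script.

```shell
# Turn off GPU-aware MPI inside heFFTe for this session only.
# This is a diagnostic setting: expect slower multi-GPU PME.
export GMX_HEFFTE_USE_GPU_AWARE=0

# Confirm the value that child processes will inherit.
echo "GMX_HEFFTE_USE_GPU_AWARE=${GMX_HEFFTE_USE_GPU_AWARE}"

# Hypothetical launch line (launcher, rank counts and file prefix are
# assumptions, not taken from the original post):
# mpirun -np 16 gmx_mpi mdrun -deffnm md -pme gpu -npme 2
```

On a multi-node run, it is worth checking that your launcher actually forwards the variable to every rank; how that is done depends on the MPI launcher and scheduler.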

Just checking: for the AdaptiveCpp version, did you mean 25.02?