GROMACS 2025.3 with CUDA 13 on RTX 5070 Ti fails on simulations

GROMACS version: 2025.3
GROMACS modification: No
I am having an issue running GROMACS with an RTX 5070 Ti and CUDA 13.

GPU support: CUDA
NBNxM GPU setup: super-cluster 2x2x2 / cluster 8 (cluster-pair splitting on)
GPU FFT library: cuFFT
Multi-GPU FFT: none
CUDA compiler: /usr/local/cuda-13.0/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2025 NVIDIA Corporation;Built on Wed_Aug_20_01:58:59_PM_PDT_2025;Cuda compilation tools, release 13.0, V13.0.88;Build cuda_13.0.r13.0/compiler.36424714_0
CUDA compiler flags: -O3 -DNDEBUG
CUDA driver: 13.0
CUDA runtime: 13.0

NVIDIA GPU Status:
GPU 0: NVIDIA GeForce RTX 5070 Ti

CUDA Driver Version:
580.65.06

I compiled GROMACS with the CUDA 13 toolkit in my environment and am using that build.
Ubuntu 24.04 x64

I am running these commands:

Command line:
gmx pdb2gmx -f protein_raw.pdb -o protein.gro -p topol.top -i posre.itp -ff amber99sb-ildn -water tip3p -ignh

Command line:
gmx grompp -f minim.mdp -c solv.gro -p topol.top -o ions.tpr -maxwarn 2

Command line:
gmx genion -s ions.tpr -o solv_ions.gro -p topol.top -pname NA -nname CL -neutral -conc 0.15

Command line:
gmx mdrun -deffnm nvt -ntmpi 1 -ntomp 8 -nb gpu -pme gpu -bonded gpu -update gpu

Reading file nvt.tpr, VERSION 2025.3 (single precision)
Changing nstlist from 10 to 100, rlist from 1 to 1.166

1 GPU selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
PP:0,PME:0
PP tasks will do (non-perturbed) short-ranged and most bonded interactions on the GPU
PP task will update and constrain coordinates on the GPU
PME tasks will do all aspects on the GPU
Using 1 MPI thread
Using 8 OpenMP threads

NOTE: The number of threads is not equal to the number of (logical) cpus
and the -pin option is set to auto: will not pin threads to cpus.
This can lead to significant performance degradation.
Consider using -pin on (and -pinoffset in case you run multiple jobs).
starting mdrun ‘Protein in water’
50000 steps, 100.0 ps.


Program: gmx mdrun, version 2025.3
Source file: src/gromacs/fft/gpu_3dfft_cufft.cu (line 59)

Fatal error:
cufftPlanMany R2C plan failure (error code 5)

For more information and tips for troubleshooting, please check the GROMACS
website at Common errors when using GROMACS - GROMACS 2025.4 documentation

================================================================================

[ERROR] Simulation exited with code 1

It fails with this error on the last command.

It only succeeds when I disable the GPU for the last task, but then it takes longer than running every task on the CPU, so the GPU is useless in this case.
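
The workaround described above (retry without GPU PME when cuFFT plan creation fails) could be sketched in the wrapper script roughly as follows. This is a minimal sketch, not the actual script: the function names and the `runner` parameter are hypothetical, the two flag sets are the ones quoted in this post, and it assumes the fatal error text appears on stderr.

```python
import subprocess

# Substring of the fatal error reported by mdrun in this thread.
CUFFT_MARKER = "cufftPlanMany"

# Flag sets taken from the post: the failing full-offload run and the
# "safe" run that keeps PME on the CPU.
FULL_FLAGS = ["-nb", "gpu", "-pme", "gpu", "-bonded", "gpu", "-update", "gpu"]
SAFE_FLAGS = ["-nb", "gpu", "-pme", "cpu"]


def run_mdrun(deffnm, gpu_flags, runner=subprocess.run):
    """Run gmx mdrun with the given offload flags and capture its output."""
    cmd = ["gmx", "mdrun", "-deffnm", deffnm,
           "-ntmpi", "1", "-ntomp", "8", *gpu_flags]
    return runner(cmd, capture_output=True, text=True)


def mdrun_with_pme_fallback(deffnm, runner=subprocess.run):
    """Try full GPU offload; on a cuFFT plan failure, retry with PME on the CPU.

    Returns (mode_used, result). Assumption: the error text is on stderr.
    """
    result = run_mdrun(deffnm, FULL_FLAGS, runner)
    if result.returncode != 0 and CUFFT_MARKER in (result.stderr or ""):
        result = run_mdrun(deffnm, SAFE_FLAGS, runner)
        return "safe", result
    return "full", result
```

Calling `mdrun_with_pme_fallback("nvt")` would then report which flag set actually completed, which makes it easier to see when a run silently dropped to the slower configuration.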

CPU-only run:
Environment Setup                : 1s
GPU Configuration Check          : 1s
Ligand/Protein Extraction        : 0s
pdb2gmx (Protein Topology)       : 0s
ACPYPE (Ligand Topology)         : 5s
Topology Integration             : 0s
System Preparation (Box/Solvate) : 0s
Ion Placement                    : 0s
Energy Minimization              : 3s
NVT Equilibration                : 17s
NPT Equilibration                : 17s
Production MD                    : 3s
Metrics Analysis                 : 0s
Plotting (XVG to PNG)            : 1s
TOTAL RUNTIME                    : 48s

GPU enabled, safe mode:
Environment Setup                : 1s
GPU Configuration Check          : 14s
Ligand/Protein Extraction        : 0s
pdb2gmx (Protein Topology)       : 0s
ACPYPE (Ligand Topology)         : 13s
Topology Integration             : 0s
System Preparation (Box/Solvate) : 0s
Ion Placement                    : 0s
Energy Minimization              : 11s
NVT Equilibration                : 39s
NPT Equilibration                : 39s
Production MD                    : 12s
Metrics Analysis                 : 0s
Plotting (XVG to PNG)            : 1s
TOTAL RUNTIME                    : 2m 10s
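
To see where the extra time goes, the two timing tables can be compared stage by stage. The snippet below is only an illustrative sketch: the dictionaries are transcribed from the tables above, and the variable names are made up for this example.

```python
# Per-stage wall times in seconds, transcribed from the two runs above.
cpu_only = {
    "Environment Setup": 1, "GPU Configuration Check": 1,
    "Ligand/Protein Extraction": 0, "pdb2gmx (Protein Topology)": 0,
    "ACPYPE (Ligand Topology)": 5, "Topology Integration": 0,
    "System Preparation (Box/Solvate)": 0, "Ion Placement": 0,
    "Energy Minimization": 3, "NVT Equilibration": 17,
    "NPT Equilibration": 17, "Production MD": 3,
    "Metrics Analysis": 0, "Plotting (XVG to PNG)": 1,
}
gpu_safe = {
    "Environment Setup": 1, "GPU Configuration Check": 14,
    "Ligand/Protein Extraction": 0, "pdb2gmx (Protein Topology)": 0,
    "ACPYPE (Ligand Topology)": 13, "Topology Integration": 0,
    "System Preparation (Box/Solvate)": 0, "Ion Placement": 0,
    "Energy Minimization": 11, "NVT Equilibration": 39,
    "NPT Equilibration": 39, "Production MD": 12,
    "Metrics Analysis": 0, "Plotting (XVG to PNG)": 1,
}

# Extra seconds per stage in the GPU safe-mode run, skipping unchanged stages.
extra = {
    stage: gpu_safe[stage] - cpu_only[stage]
    for stage in cpu_only
    if gpu_safe[stage] != cpu_only[stage]
}

for stage, seconds in sorted(extra.items(), key=lambda kv: -kv[1]):
    print(f"{stage:<34} +{seconds}s")

print("total CPU-only :", sum(cpu_only.values()), "s")
print("total GPU safe :", sum(gpu_safe.values()), "s")
```

The per-stage sums reproduce the reported totals (48 s and 2 m 10 s). Note that two of the slower stages, GPU Configuration Check and ACPYPE, are not mdrun runs at all, so part of the difference may come from something other than the mdrun offload flags.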

I have configured the Python script with three modes, but all except safe mode fail, and safe mode is slower than the bare CPU run, as shown above. I am not sure what the issue is.

Safe mode configures GPU acceleration to use only the non-bonded force calculations on the GPU, while explicitly forcing PME (Particle Mesh Ewald) calculations to run on the CPU.

What Safe Mode Does:

  1. Energy Minimization & MD runs: Uses -nb gpu -pme cpu flags
    • -nb gpu → Non-bonded interactions (van der Waals, direct-space electrostatics) run on GPU
    • -pme cpu → PME electrostatics (reciprocal space) run on CPU

  2. Why it exists:
    • Avoids cuFFT library issues that can occur with CUDA 13.x and newer GPU architectures (RTX 40xx/50xx)
    • Prevents PTX JIT compilation errors related to PME calculations
    • Maximum compatibility across all CUDA versions and GPU models

  3. Performance tradeoff:
    • Slower than compat or full modes
    • But guaranteed to work without runtime errors
    • Still much faster than pure CPU mode since non-bonded calculations (usually the most expensive part) run on GPU

Comparison with other modes:

  • safe: -nb gpu -pme cpu (both EM and MD)
  • compat (default): -nb gpu -pme cpu (EM), -nb gpu -pme gpu (MD)
  • full: -nb gpu -bonded gpu -pme gpu -update gpu (everything on GPU)

The script recommends using compat mode for the Fedora build, which uses safe settings for energy minimization but enables GPU PME for production MD to balance stability and performance.
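
For reference, the mode comparison above amounts to a small lookup table from mode and stage to mdrun offload flags. The sketch below is a hypothetical reconstruction, not the script itself: `MODE_FLAGS` and `mdrun_command` are illustrative names, and using the same flag set for both stages in full mode is an assumption, since the description lists only one set.

```python
# Offload flags per mode and stage ("em" = energy minimization, "md" = MD runs),
# transcribed from the mode comparison above.
MODE_FLAGS = {
    "safe": {
        "em": ["-nb", "gpu", "-pme", "cpu"],
        "md": ["-nb", "gpu", "-pme", "cpu"],
    },
    "compat": {
        "em": ["-nb", "gpu", "-pme", "cpu"],
        "md": ["-nb", "gpu", "-pme", "gpu"],
    },
    # Assumption: the description gives a single flag set for "full",
    # so it is applied to both stages here.
    "full": {
        "em": ["-nb", "gpu", "-bonded", "gpu", "-pme", "gpu", "-update", "gpu"],
        "md": ["-nb", "gpu", "-bonded", "gpu", "-pme", "gpu", "-update", "gpu"],
    },
}


def mdrun_command(deffnm, mode, stage, ntmpi=1, ntomp=8):
    """Build the gmx mdrun argument list for a given mode and stage."""
    return [
        "gmx", "mdrun", "-deffnm", deffnm,
        "-ntmpi", str(ntmpi), "-ntomp", str(ntomp),
        *MODE_FLAGS[mode][stage],
    ]


print(" ".join(mdrun_command("nvt", "full", "md")))
print(" ".join(mdrun_command("nvt", "safe", "md")))
```

Written this way, it is easy to check that only compat and full ever request `-pme gpu`, which is the option that leads to the cuFFT plan creation that fails in the log above.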

What might be the issue?

You seem to be referring to a Python script that runs GROMACS commands in a few different “modes”, and I guess you copied part of the script's documentation, because apparently “safe mode” was added to prevent the exact error you’re seeing:

“Avoids cuFFT library issues that can occur with CUDA 13.x and newer GPU architectures (RTX 40xx/50xx)”

Cf. the error message you’re getting when not running in “safe mode”:

Fatal error:
cufftPlanMany R2C plan failure (error code 5)

So I guess whoever wrote the script was aware of the issues that you’re seeing, and they might know more. My guess is that this is not an issue on the GROMACS side, but a compatibility issue between cuFFT, CUDA and GPU versions.

Well, I prepared the script using LLM tools; I am not an expert at configuring GROMACS for simulations. You are right about the issue, but what I mostly want to know is why the GPU-enabled run takes longer than the CPU-only one.

I am trying to solve the CUDA cuFFT issue by using a different RTX GPU, but I have not succeeded yet. This time I am facing a compile issue with GROMACS.
I am trying to compile it for CUDA 12.6.