Freeing of the device buffer failed. CUDA error #700

GROMACS version: 2024.2
GROMACS modification: No
System Information:
Operating System: Linux Mint 21.3 Cinnamon (Ubuntu 22.04)
GCC default is 11.4.0, but I installed gcc and g++ version 12
CUDA Version: CUDA 12.5 Toolkit was installed per NVIDIA Instructions
GPU Driver Version: 550.67 Installed using LM21.3 Driver Manager
GPU : NVIDIA GeForce RTX 4070 Ti SUPER
CPU : 12th Gen Intel(R) Core™ i7-12700K - 12 cores / 20 threads

Installation and checks all worked fine; I used
cmake … -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DGMX_GPU=CUDA
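To confirm which toolchain the build actually picked up, the installed binary reports its build configuration directly (just a check, not a fix):

gmx --version | grep -iE 'compiler|cuda'

This prints the C/C++ compiler and the CUDA compiler/runtime versions that were compiled in.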

The following error occurs when running an NPT equilibration after successful energy minimization and NVT equilibration runs. I have searched the GROMACS run-time errors page with no help there. I have also searched this forum; there are a few similar errors, but no solutions, and mine is a bit different.


Program: gmx mdrun, version 2024.2
Source file: src/gromacs/gpu_utils/devicebuffer.cuh (line 111)
Function: freeDeviceBuffer<gmx::BasicVector*>(gmx::BasicVector**)::<lambda()>

Assertion failed:
Condition: stat == cudaSuccess
Freeing of the device buffer failed. CUDA error #700
(cudaErrorIllegalAddress): an illegal memory access was encountered.

For more information and tips for troubleshooting, please check the GROMACS
website at https://manual.gromacs.org/current/user-guide/run-time-errors.html


gmx mdrun -v -nt 8 -deffnm eq2_npt_500K

Input Details:
; Run parameters
integrator = md ; leap-frog integrator
dt = 0.002 ; time step length
nsteps = 1250000 ; number of steps
continuation = yes

; Output control
nstenergy = 500 ; steps between saving energies
nstlog = 500 ; steps between saving log

; Bond parameters
constraint_algorithm = lincs
constraints = h-bonds ; H bonds constrained

; Neighbor searching
cutoff-scheme = Verlet
rvdw = 1.2 ; short-range van der Waals cutoff (in nm)
rlist = 1.2 ; Short-range neighbor list
vdwtype = Cut-off
DispCorr = EnerPres

; Electrostatics
coulombtype = PME
rcoulomb = 1.2

; Temperature coupling
tcoupl = v-rescale ; stochastic velocity-rescaling thermostat
tc-grps = system
tau_t = 0.1 ; time constant, in ps
ref_t = 500 ; reference temperature in K

; Pressure coupling
pcoupl = C-rescale
pcoupltype = isotropic
tau-p = 5.0
ref-p = 1.0
compressibility = 4.5e-5

; Periodic boundary conditions
pbc = xyz ; 3-D PBC

Note: -nt 8 yielded far better performance than 20 (auto), so apparently over-parallelization was a significant performance killer.
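Related to that, the NVT log further down notes that threads are not being pinned with -pin auto. Repeating the 8-thread runs with explicit pinning is what that note itself suggests (worth trying for performance, though it is unlikely to be related to the crash):

gmx mdrun -v -nt 8 -pin on -deffnm eq2_npt_500K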

I have successfully run a simulation sequence using a box of 1000 methanol molecules with

energy minimization > NVT 325K equilibration > NPT 325K equilibration > NPT 315K equilibration > 298K equilibration > Density at 298K > Production Run at 298K

with zero errors, using the same input steps as above. My collaborator ran the identical simulation with GROMACS 2024.2, using the identical input set for all temperatures, with zero errors. He uses an RTX 3060 GPU and suggests I have a CUDA/GPU issue. Neither of us can track down the error, since there are no other warnings or errors before it crashes. The time and frequency of the crash can vary greatly: sometimes I can get through four temperature equilibrations and the density run, and then it crashes in production. I don't really want to step backwards and buy an RTX 3060 GPU.

It was also suggested that I install the lowest CUDA version that would work with my RTX 4070 Ti SUPER, but that doesn't really make sense to me, since it is a very new card and the most up-to-date versions should be used (I would think).
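One way to narrow this down without changing hardware is to move the GPU offload back to the CPU one task at a time and see whether the crash disappears, and to run a short stretch under the CUDA toolkit's memory checker. This is only a debugging sketch; the -nsteps value is an arbitrary short length, and compute-sanitizer makes the run very slow:

# keep nonbonded/PME on the GPU but do coordinate update and constraints on the CPU
gmx mdrun -v -nt 8 -update cpu -deffnm eq2_npt_500K

# run a short segment under the CUDA memory checker to locate the illegal access
compute-sanitizer --tool memcheck gmx mdrun -nt 8 -nsteps 50000 -deffnm eq2_npt_500K

If the run only crashes with the GPU update enabled, that points at the update/constraints offload rather than at the card itself.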

I would greatly appreciate any insights. Please help!!

Additional Thought/Question:

  1. By not specifying the gcc and g++ version (12.3.0) or the path to the CUDA Toolkit in cmake, did GROMACS fall back to the default gcc 11.4.0? I only have one Toolkit version installed, so that part seems OK. (An example configure line pinning these down is below.)
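For reference, the compiler and toolkit can be specified explicitly at configure time, so cmake cannot silently fall back to the default gcc. The package names and toolkit path below are assumptions for a stock Ubuntu 22.04 / CUDA 12.5 layout and may need adjusting:

cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DGMX_GPU=CUDA \
  -DCMAKE_C_COMPILER=gcc-12 -DCMAKE_CXX_COMPILER=g++-12 \
  -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.5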

Extra Information - successful NVT 500K run just before NPT run that crashed

gmx mdrun -v -nt 8 -deffnm eq1_nvt_500K

                  :-) GROMACS - gmx mdrun, 2024.2 (-:

Executable: /usr/local/gromacs/bin/gmx
Data prefix: /usr/local/gromacs
Working dir: /home/mph/des/ChCl_etgly_343K/eq1_nvt_500K/ini
Command line:
gmx mdrun -v -nt 8 -deffnm eq1_nvt_500K

Reading file eq1_nvt_500K.tpr, VERSION 2024.2 (single precision)
Changing nstlist from 10 to 80, rlist from 1.2 to 1.395

1 GPU selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
PP:0,PME:0
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
PP task will update and constrain coordinates on the GPU
PME tasks will do all aspects on the GPU
Using 1 MPI thread
Using 8 OpenMP threads

NOTE: The number of threads is not equal to the number of (logical) cpus
and the -pin option is set to auto: will not pin threads to cpus.
This can lead to significant performance degradation.
Consider using -pin on (and -pinoffset in case you run multiple jobs).
starting mdrun ‘simbox’
250000 steps, 500.0 ps.
step 37520: timed with pme grid 48 48 48, coulomb cutoff 1.200: 38.4 M-cycles
step 37680: timed with pme grid 42 42 42, coulomb cutoff 1.310: 39.1 M-cycles
step 37840: timed with pme grid 36 36 36, coulomb cutoff 1.528: 40.2 M-cycles
step 38000: timed with pme grid 32 32 32, coulomb cutoff 1.719: 46.8 M-cycles
step 38160: timed with pme grid 36 36 36, coulomb cutoff 1.528: 40.2 M-cycles
step 38320: timed with pme grid 40 40 40, coulomb cutoff 1.375: 39.3 M-cycles
step 38480: timed with pme grid 42 42 42, coulomb cutoff 1.310: 39.0 M-cycles
step 38640: timed with pme grid 44 44 44, coulomb cutoff 1.250: 39.1 M-cycles
step 38800: timed with pme grid 48 48 48, coulomb cutoff 1.200: 38.0 M-cycles
optimal pme grid 48 48 48, coulomb cutoff 1.200
step 249900, remaining wall clock time: 0 s
Writing final coordinates.
step 250000, remaining wall clock time: 0 s
               Core t (s)   Wall t (s)        (%)
       Time:      279.789       34.975      800.0
                 (ns/day)    (hour/ns)
Performance:     1235.164        0.019

GROMACS reminds you: “Does All This Money Really Have To Go To Charity ?” (Rick)

Any update on this issue?

You can reduce the dt value to 0.001. I just solved my problem by reducing dt from 0.002 to 0.001.
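If the same total simulated time is wanted at the smaller time step, nsteps has to be doubled along with halving dt. A sketch of the two lines that would change in the .mdp above:

dt     = 0.001   ; halved time step
nsteps = 2500000 ; doubled so the total length stays at 2500 ps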