GPU likely crashing out mid-simulation, causing long run times

GROMACS version: 2023.3 with CUDA 12.3
GROMACS modification: No

Hi all,

Whenever I run a production simulation of a bilayer built with CHARMM-GUI, everything runs efficiently for the first ~10 hours of what is projected to be a 30-hour run. Then, at about the 10-hour mark, the estimated remaining time starts ticking up by roughly 1 second every 1.5 seconds, so the simulation would then take a week or more to complete.

I have had a similar issue before, though with a simpler ~7-hour simulation. The cause was the GPU crashing out mid-simulation, and the fix was to add the -update gpu and -nstlist 400 arguments to mdrun (e.g., gmx mdrun -s test.tpr -v -x test.xtc -c test.gro -nb gpu -bonded gpu -pme gpu -update gpu -nstlist 400). However, the same arguments are not fixing my current simulation, perhaps because this run is much longer and the problem only kicks in around hour 10.

Does anyone have advice on how to keep the GPU from crashing out mid-simulation? My hardware is an RTX 4090 (24 GB), an Intel i9-13900K, and 64 GB of RAM.
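
In case it helps with diagnosis, this is the kind of GPU logging I could run alongside the job to check whether the card throttles or drops off around the 10-hour mark (just a sketch; the 30-second interval and the log file name are arbitrary choices on my part):

nvidia-smi --query-gpu=timestamp,temperature.gpu,clocks.sm,power.draw,utilization.gpu --format=csv -l 30 > gpu_log.csv &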

In case it is of value, here are the .mdp file and the terminal command I am running.

integrator = md
dt = 0.004
nsteps = 250000000
nstxout-compressed = 25000
nstxout = 0
nstvout = 0
nstfout = 0
nstcalcenergy = 100
nstenergy = 1000
nstlog = 1000
;
cutoff-scheme = Verlet
nstlist = 400
rlist = 1.2
vdwtype = Cut-off
vdw-modifier = Force-switch
rvdw_switch = 1.0
rvdw = 1.2
coulombtype = PME
rcoulomb = 1.2
;
tcoupl = v-rescale
tc_grps = MEMB SOLV
tau_t = 1.0 1.0
ref_t = 303.15 303.15
;
pcoupl = C-rescale
pcoupltype = semiisotropic
tau_p = 5.0
compressibility = 4.5e-5 4.5e-5
ref_p = 1.0 1.0
;
constraints = h-bonds
constraint_algorithm = LINCS
continuation = yes
;
nstcomm = 100
comm_mode = linear
comm_grps = MEMB SOLV

gmx mdrun -v -deffnm ${istep} -nb gpu -bonded gpu -pme gpu -update gpu -nstlist 400
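
For completeness, if I do end up having to kill the run once it slows down, my plan is to resume from the checkpoint with the standard -cpi flag, roughly like this (a sketch; with -deffnm the checkpoint is written as ${istep}.cpt, and mdrun appends to the existing output files on restart):

gmx mdrun -v -deffnm ${istep} -cpi ${istep}.cpt -nb gpu -bonded gpu -pme gpu -update gpu -nstlist 400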

Thanks in advance!