Production MD stops at step 23829100

GROMACS version: 2025.3
GROMACS modification: Yes/No
Hi,

I am running a 100 ns production MD. After about 6 hours, the run stops.

This is the error I get:

1 GPU selected for this run.

Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node: PP:0,PME:0

PP tasks will do (non-perturbed) short-ranged interactions on the GPU

PP task will update and constrain coordinates on the GPU

PME tasks will do all aspects on the GPU

Using 1 MPI thread

Using 16 OpenMP threads

WARNING: This run will generate roughly 3918 Mb of data

starting mdrun ‘Protein in water’

50000000 steps, 100000.0 ps.

step 5800: timed with pme grid 72 72 72, coulomb cutoff 1.200: 396.1 M-cycles

step 6000: timed with pme grid 60 60 60, coulomb cutoff 1.333: 464.0 M-cycles

step 6200: timed with pme grid 64 64 64, coulomb cutoff 1.250: 418.2 M-cycles

step 6400: timed with pme grid 72 72 72, coulomb cutoff 1.200: 393.8 M-cycles

optimal pme grid 72 72 72, coulomb cutoff 1.200

step 23829100, will finish Mon Nov 10 13:04:23 2025

/home/polarbear/gromacs/src/gromacs/ewald/pme_gpu_calculate_splines.cuh:128: void assertIsFinite(T) [with T = float3]: block: [914,0,0], thread: [2,0,10] Assertion isfinite(static_cast<float>(arg.x)) failed.

/home/polarbear/gromacs/src/gromacs/ewald/pme_gpu_calculate_splines.cuh:128: void assertIsFinite(T) [with T = float3]: block: [914,0,0], thread: [3,0,10] Assertion isfinite(static_cast<float>(arg.x)) failed.

/home/polarbear/gromacs/src/gromacs/ewald/pme_gpu_calculate_splines.cuh:128: void assertIsFinite(T) [with T = float3]: block: [914,0,0], thread: [0,0,11] Assertion isfinite(static_cast<float>(arg.x)) failed.

terminate called after throwing an instance of ‘gmx::InternalError’

what(): Freeing of the device buffer failed. CUDA error #710 (cudaErrorAssert): device-side assert triggered.

Aborted (core dumped)

I would appreciate any guidance.

You have a coordinate that is infinite or not a number. I don’t know if this is due to an instability in your system or due to a bug in GROMACS. Do the energies and box volume look normal just before the crash?
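One way to check this could be to export the relevant terms to an XVG file (for example with `gmx energy -f ener.edr -o energy.xvg`, where the file names are only placeholders) and scan the last frames before the crash. The sketch below is not part of GROMACS; the function names `parse_xvg`, `first_nonfinite`, and `tail_deviation` are made up for illustration, and it assumes the usual XVG layout with time in the first column and one selected term per following column.

```python
import math
import statistics


def parse_xvg(text):
    """Return numeric rows from XVG text, skipping '#' comments and '@' directives."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        rows.append([float(field) for field in line.split()])
    return rows


def first_nonfinite(rows):
    """Return (time, column index) of the first NaN/inf value, or None if all are finite."""
    for row in rows:
        for col, value in enumerate(row[1:], start=1):
            if not math.isfinite(value):
                return row[0], col
    return None


def tail_deviation(rows, col, tail=5):
    """Distance, in standard deviations of the earlier frames, between the mean
    of the last `tail` frames and the mean of the earlier frames.

    Returns 0.0 when there are too few frames to compare.
    """
    values = [row[col] for row in rows if math.isfinite(row[col])]
    head, last = values[:-tail], values[-tail:]
    if len(head) < 2 or not last:
        return 0.0
    spread = statistics.stdev(head)
    difference = abs(statistics.mean(last) - statistics.mean(head))
    if spread == 0.0:
        return 0.0 if difference == 0.0 else math.inf
    return difference / spread
```

To use it, read the XVG file into a string, pass it to `parse_xvg`, and then call `first_nonfinite(rows)` and `tail_deviation(rows, col)` for each column of interest (total energy, pressure, volume, and so on). A large deviation in the last few frames would point towards an instability in the system rather than a sporadic fault, although the threshold for "large" is a judgment call and depends on how often energies were written.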

Thank you.

I repeated the production MD, and this time the run finished successfully. I checked various parameters before the production run, and everything looked OK.

Also, I did not find anything unusual before the crash.

Hmm, I hope things work now, but it’s not nice to have an unexplained issue.