Fatal error: Unexpected cudaStreamQuery failure: an illegal memory access was encountered

GROMACS version: 2019.4

I ran into this fatal error when continuing a simulation with -cpi -append. After running for about 200 ps, the simulation stopped, and the error recurred every time I tried to restart. Could you please help me figure out what is causing the problem?

Thanks!

The following is the information from the log …:

Restarting from checkpoint, appending to previous log file.

                  :-) GROMACS - gmx mdrun, 2019.4 (-:

Executable: /install/gromacs-2019.4/bin/gmx
Data prefix: /install/gromacs-2019.4
Working dir: /home/work/gpu/hIAPP_POPG_enlarge_dimer_highc/conf5_ff/duplicatey/npt_duplicate1
Process ID: 20605
Command line:
gmx mdrun -s npt_1500-2000ns.tpr -deffnm npt -v -cpi npt.cpt -append -maxh 12

GROMACS version: 2019.4
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: AVX2_256
FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /usr/bin/cc GNU 4.8.5
C compiler flags: -mavx2 -mfma -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler: /usr/bin/c++ GNU 4.8.5
C++ compiler flags: -mavx2 -mfma -std=c++11 -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
CUDA compiler: /usr/local/cuda-10.0/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2018 NVIDIA Corporation;Built on Sat_Aug_25_21:08:01_CDT_2018;Cuda compilation tools, release 10.0, V10.0.130
CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;;; ;-mavx2;-mfma;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver: 10.0
CUDA runtime: 10.0

Changing nstlist from 10 to 100, rlist from 1.6 to 1.698

Using 1 MPI thread
Using 12 OpenMP threads

1 GPU selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
PP:0,PME:0
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
PME tasks will do all aspects on the GPU
Pinning threads with an auto-selected logical core stride of 1
System total charge: 0.000
Will do PME sum in reciprocal space for electrostatic interactions.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Using a Gaussian width (1/beta) of 0.51226 nm for Ewald
Potential shift: LJ r^-12: 0.000e+00 r^-6: 0.000e+00, Ewald -6.250e-06
Initialized non-bonded Ewald correction tables, spacing: 1.18e-03 size: 1357

Long Range LJ corr.: 3.5365e-04
Generated table with 1349 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 1349 data points for LJ6Switch.
Tabscale = 500 points/nm
Generated table with 1349 data points for LJ12Switch.
Tabscale = 500 points/nm
Generated table with 1349 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1349 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1349 data points for 1-4 LJ12.
Tabscale = 500 points/nm

Using GPU 8x8 nonbonded short-range kernels

Using a dual 8x4 pair-list setup updated with dynamic, rolling pruning:
outer list: updated every 100 steps, buffer 0.098 nm, rlist 1.698 nm
inner list: updated every 18 steps, buffer 0.002 nm, rlist 1.602 nm
At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list would be:
outer list: updated every 100 steps, buffer 0.259 nm, rlist 1.859 nm
inner list: updated every 18 steps, buffer 0.077 nm, rlist 1.677 nm

Initializing LINear Constraint Solver
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
LINCS: A Linear Constraint Solver for molecular simulations
J. Comp. Chem. 18 (1997) pp. 1463-1472
-------- -------- --- Thank You --- -------- --------

The number of constraints is 16202

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------

Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
0: System
There are: 59316 Atoms

Started mdrun on rank 0 Wed Aug 5 11:25:36 2020

step 989737700: timed with pme grid 32 60 96, coulomb cutoff 1.600: 1252.9 M-cycles
step 989737900: timed with pme grid 28 52 84, coulomb cutoff 1.763: 1434.2 M-cycles
step 989738100: timed with pme grid 28 52 96, coulomb cutoff 1.720: 1371.4 M-cycles
step 989738300: timed with pme grid 28 56 96, coulomb cutoff 1.600: 1204.0 M-cycles
step 989738500: timed with pme grid 32 60 96, coulomb cutoff 1.600: 1202.8 M-cycles
step 989738700: timed with pme grid 28 56 96, coulomb cutoff 1.600: 1198.1 M-cycles

Program: gmx mdrun, version 2019.4
Source file: src/gromacs/gpu_utils/cudautils.cuh (line 251)

Fatal error:
Unexpected cudaStreamQuery failure: an illegal memory access was encountered

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors