GROMACS version:2023.3
GROMACS modification: No
I'm getting a CUDA error with the standard GROMACS 2023.3 release on my laptop, which has
an NVIDIA RTX 4060 GPU and a 13th-generation Core i7 CPU (running 20 OpenMP threads).
CUDA version: 12.3; NVIDIA driver version: 545.84; OS: Ubuntu 22.4.02 (installed on WSL2, Windows 11).
The error message is:
Program: gmx mdrun, version 2023.3
Source file: src/gromacs/gpu_utils/device_stream.cu (line 100)
Function: DeviceStream::synchronize() const::<lambda()>
Assertion failed:
Condition: stat == cudaSuccess
cudaStreamSynchronize failed. CUDA error #700 (cudaErrorIllegalAddress): an
illegal memory access was encountered.
I am attaching the .tpr file of my simulation to help the developer team reproduce the error and find its root cause.
tpr file link: https://filetransfer.io/data-package/IjCaW1po#link
PS. To give some perspective: I am trying to run a fairly long AWH simulation. I get this error every few hours (roughly every 1 ns of simulation) and have to restart the job with the -cpi option, but the same thing keeps happening over and over, which makes the whole process painstaking and bumpy.
@pszilard: Could you kindly look into this issue and offer any insight? I can see that the same error was raised on the GROMACS GitLab page as issue #4841, which in that case was resolved by further equilibration before the production MD run. In my case, though, the problem seems more systematic: the same error comes up once every few hours (ca. 1 ns of simulation), even though during an AWH simulation the system moves through different states as the algorithm spans the reaction-coordinate interval defined in the mdp file. It therefore seems unlikely that the system's initial condition is the root cause of the error.
My job submission command line is:
gmx mdrun -deffnm awh -px pullx.xvg -nb gpu -pme gpu -update gpu
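As a stopgap while the root cause is unknown, the manual restarts with -cpi could be automated with a small wrapper. This is only a sketch: `run_with_restarts` is a hypothetical helper of my own (not part of GROMACS), and it assumes that mdrun exits with a non-zero status when the assertion fails and that the checkpoint written under -deffnm awh is named awh.cpt.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GROMACS): rerun a command until it exits
# successfully or the attempt limit is reached.
# Usage: run_with_restarts MAX_ATTEMPTS COMMAND [ARGS...]
run_with_restarts() {
    local max_attempts=$1
    shift
    local attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Run the command exactly as given; a zero exit status means it finished.
        if "$@"; then
            echo "finished on attempt $attempt"
            return 0
        fi
        echo "attempt $attempt failed; retrying" >&2
        attempt=$((attempt + 1))
    done
    echo "giving up after $max_attempts attempts" >&2
    return 1
}

# Example call (commented out so that sourcing this file does not start a run).
# Assumption: -cpi awh.cpt makes each retry continue from the last checkpoint.
# run_with_restarts 50 gmx mdrun -deffnm awh -cpi awh.cpt -px pullx.xvg -nb gpu -pme gpu -update gpu
```

This only hides the symptom. Each crash still loses the steps since the last checkpoint, so it does not replace finding the actual cause of the illegal memory access.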
Kind regards,
roozi