CUDA Assertion failed

GROMACS version: 2023.2
GROMACS modification: No
I have been running a long simulation and recently ran into a problem. The run started off fine, but once it reached the 40-50 ns range I began getting this error every few hours:

-------------------------------------------------------
Program:     gmx mdrun, version 2023
Source file: src/gromacs/gpu_utils/device_stream.cu (line 100)
Function:    DeviceStream::synchronize() const::<lambda()>

Assertion failed:
Condition: stat == cudaSuccess
cudaStreamSynchronize failed. CUDA error #999 (cudaErrorUnknown): unknown
error.

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

But I don’t know why I’m getting it. Before, I could leave my computer on overnight and the simulation would run just fine. Now it seems like I get this error every few hours, and the only way to fix it is to either close my laptop for some time or restart it completely. Any thoughts?

Did you find a solution for this?

I’m encountering the same issue: everything’s fine, but then the error suddenly pops up halfway through the run.

I did and I didn’t. What I mean is that I never figured out exactly why this was happening; one day it just stopped. I don’t really remember how I fixed it because it was a while ago, and since then I’ve upgraded from my laptop to a PC, where I’ve never had this problem. If I had to guess: I switched from GROMACS 2023 to GROMACS 2023.2 and updated CUDA to 12.2, and for the time I was still running simulations on my laptop this seemed to solve the problem. So my guess is that the error is caused by some kind of compatibility issue between particular versions of GROMACS and CUDA. But don’t quote me on that, it’s just an educated guess.

P.S. If you’re wondering, I have GROMACS 2023.3 and CUDA 12.3 on my PC.
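
For anyone who wants to compare their own setup against the versions mentioned in this thread, here is a minimal Python sketch that pulls the version-related lines out of `gmx --version`. It assumes `gmx` is on your PATH and that the output uses `label: value` lines whose labels start with "GROMACS version" or "CUDA"; the function names are made up for illustration, and the labels may differ between builds.

```python
import shutil
import subprocess


def version_lines(text, prefixes=("GROMACS version", "CUDA")):
    """Return {label: value} for 'label: value' lines whose label starts with a prefix."""
    found = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        label = label.strip()
        # Keep only lines that contain a colon and a label we care about.
        if sep and label.startswith(prefixes):
            found[label] = value.strip()
    return found


def gmx_version_report():
    """Run `gmx --version` if gmx is on PATH; return {} when it is not installed."""
    if shutil.which("gmx") is None:
        return {}
    result = subprocess.run(["gmx", "--version"], capture_output=True, text=True)
    # Scan both streams, since the assumption about where the info is printed may not hold.
    return version_lines(result.stdout + "\n" + result.stderr)
```

Calling `print(gmx_version_report())` on each machine gives a quick side-by-side of the GROMACS and CUDA versions in use. This only helps check the compatibility guess above; it does not confirm it.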

Thanks for the quick reply and additional info!

I’m getting the error when running on a restart. I saw someone suggest running on CPU only to expose the underlying problem. I tried that, and it now gives me:

Program:     gmx mdrun, version 2023.3-plumed_2.10.0_dev
MPI rank:    11 (out of 15)

Unknown exception:
(exception type: N4PLMD6Plumed14ExceptionErrorE)

(tools/IFile.cpp:221) PLMD::IFile& PLMD::IFile::scanField()
+++ assertion failed: fields[i].read
field cv0 was not read: all the fields need to be read otherwise you could
miss important infos

I’m not too sure what’s going on here. I don’t know if something is wrong with my input parameters that keeps the cv0 field from being read properly on restart. I’m still quite new to PLUMED.
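
One thing that might help narrow it down is checking which fields the file PLUMED reads on restart actually declares. PLUMED data files start with a `#! FIELDS ...` header, so a small Python sketch like the one below can list the column names. The function name and the sample header are hypothetical, and I am only assuming the file in question is something like a HILLS file.

```python
def plumed_fields(lines):
    """Return the column names from the first '#! FIELDS' header line, or [] if none."""
    for line in lines:
        tokens = line.split()
        if tokens[:2] == ["#!", "FIELDS"]:
            return tokens[2:]
    return []


# Hypothetical header, only to show the call; use the lines of your own file instead.
sample = [
    "#! FIELDS time cv0 sigma_cv0 height biasf",
    "#! SET multivariate false",
]
fields = plumed_fields(sample)
print(fields)
```

To use it on a real file, pass the open file object, e.g. `plumed_fields(open("HILLS"))`, where the file name is again an assumption. If the listed fields do not match what the restarted input expects (for example, a CV label that was renamed), that mismatch would be a reasonable first place to look.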

Anyways, thanks for your help and I’ll try and see what can be done :,)