Error with gmx mdrun -deffnm md_0_10 -v -nb gpu

GROMACS version: 2022.3
GROMACS modification: Yes/No
What is the problem with this simulation?


gmx mdrun -deffnm md_0_10 -v -nb gpu
:-) GROMACS - gmx mdrun, 2022.3 (-:

Executable: /usr/local/gromacs/bin/gmx
Data prefix: /usr/local/gromacs
Working dir: /home/bioinfo/gromacs-2022.3/build/protein
Command line:
gmx mdrun -deffnm md_0_10 -v -nb gpu

Back Off! I just backed up md_0_10.log to ./#md_0_10.log.4#
Reading file md_0_10.tpr, VERSION 2020.1-Ubuntu-2020.1-1 (single precision)
Note: file tpx version 119, software tpx version 127
Changing nstlist from 20 to 100, rlist from 1.223 to 1.344

1 GPU selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
PP:0,PME:0
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
PP task will update and constrain coordinates on the CPU
PME tasks will do all aspects on the GPU
Using 1 MPI thread
Using 6 OpenMP threads

Back Off! I just backed up md_0_10.xtc to ./#md_0_10.xtc.1#

Back Off! I just backed up md_0_10.edr to ./#md_0_10.edr.1#

WARNING: This run will generate roughly 8520 Mb of data

starting mdrun 'Protein in water'
50000000 steps, 100000.0 ps.
step 600: timed with pme grid 96 96 96, coulomb cutoff 1.200: 4489.2 M-cycles
step 800: timed with pme grid 80 80 80, coulomb cutoff 1.268: 4482.3 M-cycles
step 1000: timed with pme grid 72 72 72, coulomb cutoff 1.409: 4867.3 M-cycles
step 1200: timed with pme grid 64 64 64, coulomb cutoff 1.585: 4928.9 M-cycles
step 1200: the maximum allowed grid scaling limits the PME load balancing to a coulomb cut-off of 1.691
step 1400: timed with pme grid 60 60 60, coulomb cutoff 1.691: 5332.0 M-cycles
step 1600: timed with pme grid 64 64 64, coulomb cutoff 1.585: 5125.6 M-cycles
step 1800: timed with pme grid 72 72 72, coulomb cutoff 1.409: 4688.1 M-cycles
step 2000: timed with pme grid 80 80 80, coulomb cutoff 1.268: 4570.8 M-cycles
step 2200: timed with pme grid 84 84 84, coulomb cutoff 1.208: 4497.2 M-cycles
step 2400: timed with pme grid 96 96 96, coulomb cutoff 1.200: 4458.9 M-cycles
step 2600: timed with pme grid 64 64 64, coulomb cutoff 1.585: 4898.4 M-cycles
step 2800: timed with pme grid 72 72 72, coulomb cutoff 1.409: 4908.0 M-cycles
step 3000: timed with pme grid 80 80 80, coulomb cutoff 1.268: 4886.0 M-cycles
step 3200: timed with pme grid 84 84 84, coulomb cutoff 1.208: 4523.2 M-cycles
step 3400: timed with pme grid 96 96 96, coulomb cutoff 1.200: 4674.6 M-cycles
optimal pme grid 96 96 96, coulomb cutoff 1.200
step 3574800, will finish Wed Sep 21 21:56:17 2022
WARNING: GPU kernel (PME gather) failed to launch. An unhandled error from a previous CUDA operation was detected. CUDA error #999 (cudaErrorUnknown): unknown error.


Program: gmx mdrun, version 2022.3
Source file: src/gromacs/gpu_utils/devicebuffer.cuh (line 197)
Function: copyFromDeviceBuffer(ValueType*, ValueType**, size_t, size_t, const DeviceStream&, GpuApiCallBehavior, CommandEvent*) [with ValueType = gmx::BasicVector; DeviceBuffer = gmx::BasicVector*; size_t = long unsigned int; CommandEvent = void]::<lambda()>

Assertion failed:
Condition: stat == cudaSuccess
Asynchronous D2H copy failed. CUDA error #999 (cudaErrorUnknown): unknown
error.

For more information and tips for troubleshooting, please check the GROMACS website (the Common Errors page of the documentation at https://www.gromacs.org).

Perhaps your GPU is running out of memory during the PME load balancing? What hardware are you running on?
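If you are not sure, one way to check the GPU model and whether its memory fills up during the run (assuming the NVIDIA driver tools are installed) is:

nvidia-smi
nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv -l 5

The second command prints the memory usage every 5 seconds, which you can watch while mdrun is running in another terminal.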

To try to diagnose the concrete error, run the following:
/PATH/TO/CUDA/compute-sanitizer gmx mdrun -deffnm md_0_10 -v -nb gpu -nsteps 10000
What does this report?

You can try passing -notunepme, which will disable the load balancing (and potentially cause a small performance loss), but it might confirm whether the issue is related to PME tuning.
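For example, keeping the same file names as your run:

gmx mdrun -deffnm md_0_10 -v -nb gpu -notunepme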



Can you tell me what the problem is? My working directory and files are not in the compute-sanitizer folder.

compute-sanitizer is part of the CUDA installation, so you need to replace /PATH/TO/CUDA/ with the path to your CUDA toolkit, e.g. /usr/local/cuda/bin/.

I believe @pszilard meant to use the compute-sanitizer program inside the folder of the same name:

$ ls -l /usr/local/cuda/compute-sanitizer/compute-sanitizer 
-rwxr-xr-x. 1 root root 7.2M May 19 23:24 /usr/local/cuda/compute-sanitizer/compute-sanitizer

(Edit: the one in the bin folder works as well, it’s just a script that launches this program).

You should use this program as a launcher for GROMACS, just as you'd sometimes use mpirun for an MPI-based GROMACS build.

Giacomo

Does that mean I should put my working directory and files into bin and then run the simulation? Is that right?

No, I mean that you should use the same command you used to run GROMACS before, but put compute-sanitizer in front of the gmx mdrun command.
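In other words, assuming your CUDA toolkit is installed in /usr/local/cuda (adjust the path if yours differs), the full command would look like:

/usr/local/cuda/bin/compute-sanitizer gmx mdrun -deffnm md_0_10 -v -nb gpu -nsteps 10000

Run it from your usual working directory, the one that contains md_0_10.tpr; nothing has to be copied into the CUDA bin folder.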