Cannot figure out why gmx_mpi cannot detect the NVIDIA GPU

GROMACS version: 2024.4
GROMACS modification: No

I have been building GROMACS with CUDA 12.5, and the card is detected by nvidia-smi.

I have also checked gmx --version to confirm that the CUDA driver and runtime are present in the build:
GROMACS version: 2024.4
Precision: mixed
Memory model: 64 bit
MPI library: MPI
MPI library version: Intel(R) MPI Library 2021.9 for Linux* OS
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 128)
GPU support: CUDA
NBNxM GPU setup: super-cluster 2x2x2 / cluster 8
SIMD instructions: AVX2_256
CPU FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128
GPU FFT library: cuFFT
Multi-GPU FFT: none
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /bin/cc GNU 11.4.1
C compiler flags: -fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -O3 -DNDEBUG
C++ compiler: /bin/c++ GNU 11.4.1
C++ compiler flags: -fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -Wno-cast-function-type-strict -fopenmp -O3 -DNDEBUG
BLAS library: External - detected on the system
LAPACK library: External - detected on the system
CUDA compiler: /usr/local/cuda-12.5/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2024 NVIDIA Corporation;Built on Thu_Jun__6_02:18:23_PDT_2024;Cuda compilation tools, release 12.5, V12.5.82;Build cuda_12.5.r12.5/compiler.34385749_0
CUDA compiler flags:-std=c++17;--generate-code=arch=compute_89,code=sm_89;-use_fast_math;-Xptxas;-warn-double-usage;-Xptxas;-Werror;-D_FORCE_INLINES;-Xcompiler;-fopenmp;-fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -Wno-cast-function-type-strict -fopenmp -O3 -DNDEBUG
CUDA driver: 12.50
CUDA runtime: 12.50

Strangely, GROMACS cannot detect the GPU, as the log file shows:
Running on 1 node with total 24 cores, 32 processing units (GPU detection failed)
Hardware detected on host gpu5 (the node of MPI rank 0):
CPU info:
Vendor: Intel
Brand: 13th Gen Intel(R) Core™ i9-13900K
Family: 6 Model: 183 Stepping: 1
Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sha sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
Hardware topology: Basic
Packages, cores, and logical processors:
[indices refer to OS logical processors]
Package 0: [ 0 1] [ 2 3] [ 4 5] [ 6 7] [ 8 9] [ 10 11] [ 12 13] [ 14 15] [ 16] [ 17] [ 18] [ 19] [ 20] [ 21] [ 22] [ 23] [ 24] [ 25] [ 26] [ 27] [ 28] [ 29] [ 30] [ 31]
CPU limit set by OS: -1 Recommended max number of threads: 32

The build uses Intel oneAPI (Intel MPI), configured with the GPU and MPI options enabled. I guess I need some help here.
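
For reference, a CUDA + MPI build of GROMACS is configured with CMake roughly as below; the oneAPI path, build directory, and install prefix are placeholders rather than the exact command used.

# -DGMX_MPI=ON builds the MPI-enabled gmx_mpi binary; -DGMX_GPU=CUDA enables CUDA GPU support
source /opt/intel/oneapi/setvars.sh   # assumed oneAPI location; provides Intel MPI
cmake .. -DGMX_MPI=ON -DGMX_GPU=CUDA \
    -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.5/bin/nvcc \
    -DCMAKE_INSTALL_PREFIX=/opt/gromacs-2024.4   # placeholder install prefix
make -j && make install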

Is this via a queuing system/workload management system, such as slurm? Have you requested any GPUs (the slurm -G option)?

There is SLURM, and we have also applied the --gres option; the main issue is that the RTX 4090 cannot be found by GROMACS.
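
A stripped-down version of the batch script looks roughly like this (partition name, counts, and file names are placeholders, not our real script):

#!/bin/bash
#SBATCH --partition=gpu       # placeholder partition name
#SBATCH --gres=gpu:1          # request one GPU on the node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24

mpirun gmx_mpi mdrun -deffnm md -nb gpu   # placeholder input name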

However, other software such as LAMMPS can use the GPU to accelerate its calculations on the same node.

For debugging purposes, I also launched the run directly with mpirun; gmx_mpi still cannot find the GPU, while forcing -nb cpu works fine.
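
Concretely, the two launches look like this (the input name is a placeholder):

mpirun -np 1 gmx_mpi mdrun -deffnm md -nb gpu   # no GPU detected, run refuses GPU non-bondeds
mpirun -np 1 gmx_mpi mdrun -deffnm md -nb cpu   # works fine, CPU only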

I’m afraid I don’t know what might be the problem then. @pszilard or @al42and, do you have any ideas?

Could you please share the md.log file from a problematic run (feel free to redact the usernames etc if you want)?

log.txt (14.7 KB)
Here is the log; how does it look?

Ok, thanks for sharing. I don't see anything wrong in the log file.

As Magnus said, such behavior (device visible with nvidia-smi, driver detected by GROMACS, but still getting “GPU detection failed” message) usually means that CUDA_VISIBLE_DEVICES is not set correctly (either by the batch system or by the MPI library, although, as far as I recall, IntelMPI does not limit the visibility of NVIDIA devices).
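
For context, CUDA_VISIBLE_DEVICES controls which devices a CUDA application can enumerate; roughly (the binary name here is just a placeholder):

CUDA_VISIBLE_DEVICES="" ./some_cuda_app          # empty value: no devices visible, detection fails
CUDA_VISIBLE_DEVICES=0 ./some_cuda_app           # only GPU 0 is visible
unset CUDA_VISIBLE_DEVICES && ./some_cuda_app    # variable unset: all GPUs visible (the default)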

Could you try adding the lines below to your batch script right before gmx_mpi mdrun is called, and share the output?

echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES-unset}"
env | grep '^I_MPI_OFFLOAD'
export I_MPI_OFFLOAD_DEVICES=all
export I_MPI_DEBUG=3

ss.txt (17.1 KB)

Please find the log after setting CUDA_VISIBLE_DEVICES to all devices, I_MPI_OFFLOAD_DEVICES=all, and I_MPI_DEBUG=3 for the run (mdrun -nb gpu).

Thanks. Quite strange.

Could you try compiling this simple device detection utility in a similar SLURM session on the GPU node and see what it prints?

$ wget https://raw.githubusercontent.com/al42and/cuda-smi/refs/heads/master/cuda-smi.cpp
$ /usr/local/cuda-12.5/bin/nvcc cuda-smi.cpp -lcudart_static -lnvidia-ml -o cuda-smi
$ ./cuda-smi
$ mpirun --perhost 1 ./cuda-smi

OK, after running cuda-smi we found that the CUDA device query was not working. We dug further into the problem with modinfo and reinstalled the CUDA RPMs.

It turned out that the chroot environment of the deployment node images could not build the nvidia_uvm and related kernel modules. We had run into this problem (and its remedy) about a month ago; it appears to be undocumented and does not seem to be discussed on the NVIDIA forums either.
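
For anyone hitting the same issue, checks along these lines expose it on the compute node (illustrative, not necessarily the exact commands we ran):

lsmod | grep -E '^nvidia'   # nvidia, nvidia_uvm, nvidia_modeset should all be loaded
modinfo nvidia_uvm          # errors out if the module was never built for the running kernel
dkms status                 # shows whether the NVIDIA modules built against this kernel (if DKMS is used)
dmesg | grep -i nvrm        # driver/kernel mismatch messages from the NVIDIA driver, if any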

Now it is functional :)
