Gromacs-2020.4 no gpu detection

GROMACS version: 2020.4

Hello,

I just compiled gromacs-2020.4 (from ftp://ftp.gromacs.org/pub/gromacs/gromacs-2020.4.tar.gz).

GROMACS was compiled on a GPU host (Tesla) with gcc/9.2.0, cmake/3.17.2, BLAS/3.8.0, gsl/2.6, boost/1.72.0 and cuda/11.1:

cmake -DGMX_GPU=ON \
                     -DGMX_X11=ON \
                     -DGMX_BUILD_OWN_FFTW=ON \
                     -DCMAKE_INSTALL_RPATH=/opt/gensoft/exe/gromacs/2020.4/lib \
                     -DCMAKE_INSTALL_PREFIX=/opt/gensoft/exe/gromacs/2020.4 \
                     /opt/gensoft/src/gromacs/gromacs-2020.4

The cmake log shows that the GPUs and CUDA were detected:

-- Looking for NVIDIA GPUs present in the system
-- Number of NVIDIA GPUs detected: 4 

and

-- Found CUDA: /opt/gensoft/exe/cuda/11.1 (found suitable version "11.1", minimum required is "9.0") 
-- Enabling native GPU acceleration

but at run time gmx does not detect the GPUs (see the CUDA driver and CUDA runtime lines at the end):

maestro-3000:~ > gmx -version -quiet
                         :-) GROMACS - gmx, 2020.4 (-:

Executable:   /pasteur/sonic/homes/edeveaud/bin/gmx
Data prefix:  /pasteur/sonic/homes/edeveaud
Working dir:  /pasteur/sonic/homes/edeveaud
Command line:
  gmx -version -quiet

GROMACS version:    2020.4
Verified release checksum is 79c2857291b034542c26e90512b92fd4b184a1c9d6fa59c55f2e24ccf14e7281
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:        CUDA
SIMD instructions:  AVX_512
FFT library:        fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
RDTSCP usage:       enabled
TNG support:        enabled
Hwloc support:      disabled
Tracing support:    disabled
C compiler:         /opt/gensoft/exe/gcc/9.2.0/bin/gcc GNU 9.2.0
C compiler flags:   -mavx512f -mfma -fexcess-precision=fast -funroll-all-loops -O3 -DNDEBUG
C++ compiler:       /opt/gensoft/exe/gcc/9.2.0/bin/g++ GNU 9.2.0
C++ compiler flags: -mavx512f -mfma -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA compiler:      /opt/gensoft/exe/cuda/11.1/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2020 NVIDIA Corporation;Built on Tue_Sep_15_19:10:02_PDT_2020;Cuda compilation tools, release 11.1, V11.1.74;Build cuda_11.1.TC455_06.29069683_0
CUDA compiler flags:-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-Wno-deprecated-gpu-targets;-gencode;arch=compute_35,code=compute_35;-gencode;arch=compute_50,code=compute_50;-gencode;arch=compute_52,code=compute_52;-gencode;arch=compute_60,code=compute_60;-gencode;arch=compute_61,code=compute_61;-gencode;arch=compute_70,code=compute_70;-gencode;arch=compute_75,code=compute_75;-gencode;arch=compute_80,code=compute_80;-use_fast_math;-D_FORCE_INLINES;-mavx512f -mfma -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA driver:        0.0
CUDA runtime:       N/A

NB: as shown by ldd, gmx and the GROMACS libraries are correctly linked against the CUDA library libcufft.so.

I guess I missed something, but what?

regards

Eric

The above suggests that either your GPU is not functioning properly or that you have a mismatched driver and/or runtime version.
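
The symptom referred to here is visible in the last two lines of the first `gmx -version` output: `CUDA driver: 0.0` and `CUDA runtime: N/A`. As a minimal sketch (the helper names `cuda_versions` and `looks_broken` are hypothetical and not part of GROMACS), the following Python snippet pulls those two fields out of saved output and flags the placeholder values:

```python
import re


def cuda_versions(version_text):
    """Extract the 'CUDA driver' and 'CUDA runtime' fields from gmx -version output."""
    fields = {}
    for line in version_text.splitlines():
        # Matches only the driver/runtime lines, not 'CUDA compiler' lines.
        m = re.match(r"\s*CUDA (driver|runtime):\s*(\S+)", line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def looks_broken(fields):
    """True when either field is missing or holds the 0.0 / N/A placeholder."""
    placeholders = {"0.0", "N/A"}
    return any(fields.get(key, "N/A") in placeholders for key in ("driver", "runtime"))


# Sample lines copied from the two builds shown in this thread.
sample_bad = "CUDA driver:        0.0\nCUDA runtime:       N/A\n"
sample_ok = "CUDA driver:        11.10\nCUDA runtime:       10.20\n"

print(looks_broken(cuda_versions(sample_bad)))  # True
print(looks_broken(cuda_versions(sample_ok)))   # False
```

In practice the text would come from something like `gmx -quiet -version` redirected to a file; the sample strings above are only there to keep the sketch self-contained.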

maestro-3000:~ > nvidia-smi
Tue Dec  8 15:12:46 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:61:00.0 Off |                    0 |
| N/A   40C    P0    55W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:62:00.0 Off |                    0 |
| N/A   38C    P0    55W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:89:00.0 Off |                    0 |
| N/A   38C    P0    56W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:8A:00.0 Off |                    0 |
| N/A   39C    P0    58W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Furthermore, I recompiled GROMACS against cuda/10.2, and now gmx does detect the GPUs???

maestro-3000:~ > gmx -quiet -version
                    :-) GROMACS - gmx, 2020.4-UNCHECKED (-:

Executable:   /pasteur/sonic/homes/edeveaud/gmx-cuda-10.2/bin/gmx
Data prefix:  /pasteur/sonic/homes/edeveaud/gmx-cuda-10.2
Working dir:  /pasteur/sonic/homes/edeveaud
Command line:
  gmx -quiet -version

GROMACS version:    2020.4-UNCHECKED
The source code this program was compiled from has not been verified because the reference checksum was missing during compilation. This means you have an incomplete GROMACS distribution, please make sure to download an intact source distribution and compile that before proceeding.
Computed checksum: NoChecksumFile
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:        CUDA
SIMD instructions:  AVX_512
FFT library:        fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
RDTSCP usage:       enabled
TNG support:        enabled
Hwloc support:      disabled
Tracing support:    disabled
C compiler:         /opt/gensoft/exe/gcc/9.2.0/scripts/cc GNU 9.2.0
C compiler flags:   -mavx512f -mfma -fexcess-precision=fast -funroll-all-loops -O3 -DNDEBUG
C++ compiler:       /opt/gensoft/exe/gcc/9.2.0/scripts/c++ GNU 9.2.0
C++ compiler flags: -mavx512f -mfma -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA compiler:      /opt/gensoft/exe/cuda/10.2/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2019 NVIDIA Corporation;Built on Wed_Oct_23_19:24:38_PDT_2019;Cuda compilation tools, release 10.2, V10.2.89
CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_35,code=compute_35;-gencode;arch=compute_50,code=compute_50;-gencode;arch=compute_52,code=compute_52;-gencode;arch=compute_60,code=compute_60;-gencode;arch=compute_61,code=compute_61;-gencode;arch=compute_70,code=compute_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;-D_FORCE_INLINES;-mavx512f -mfma -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA driver:        11.10
CUDA runtime:       10.20

The CUDA 10.2 implementation is consistent with the remaining components of your environment. Trying to install against 11.1 led to a mismatch with the driver and runtime API.
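
The working build above pairs a driver reported as 11.10 with a runtime reported as 10.20. CUDA itself encodes each version as a single integer, 1000 × major + 10 × minor, so 11.1 is 11010 and 10.2 is 10020. A rough rule of thumb, stated here as an assumption rather than a full account of NVIDIA's compatibility policy, is that the driver's version should be at least as new as the runtime's, and that a value of 0 means something is not set up properly. A small Python sketch (function names are hypothetical):

```python
def decode(version):
    """Split a CUDA version integer (1000*major + 10*minor) into (major, minor)."""
    return version // 1000, (version % 1000) // 10


def driver_supports_runtime(driver, runtime):
    """Rule-of-thumb check: both versions are non-zero and driver >= runtime.

    This is a simplification; it ignores NVIDIA's compatibility packages and
    minor-version compatibility rules.
    """
    return driver > 0 and runtime > 0 and driver >= runtime


print(decode(11010))                           # (11, 1)
print(decode(10020))                           # (10, 2)
print(driver_supports_runtime(11010, 10020))   # True
print(driver_supports_runtime(0, 11010))       # False
```

Under this rule an 11.1 driver with an 11.1 runtime would also pass, so a `0.0` / `N/A` report points at a setup that is not responding correctly rather than at the version numbers themselves.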

I believe you, but I would like to understand why.

nvidia-smi tells me that the CUDA Version is 11.1,

so why did using cuda/11.1 to build GROMACS lead to a mismatch?

I don’t know the details of your system but it appears you’re loading various modules to satisfy dependencies. If the wrong things are loaded, you’ll get mismatches. I can’t really guess beyond that.

So you are in the same position as I am:
we do not understand ;-)

It's not about guessing. It's about a problem between GROMACS and cuda/11.1.

@jalemkul is correct, GROMACS simply queries the CUDA runtime and driver versions; these are reported as 0.0 or N/A when there are issues in a CUDA setup.

If you want further confirmation please try yourself with your own code calling the cudaDriverGetVersion() and cudaRuntimeGetVersion() functions (or use the test code posted recently here: Gromacs 2020.4 compilation with GPU-support on non-GPU nodes).
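
For a quick check without compiling anything, the same two functions can also be called from Python through `ctypes`. This is a minimal sketch, assuming a `libcudart` shared library can be found on the library search path (for example after loading the cuda module); the function name `query_cuda_versions` is hypothetical, and this is not the test code from the linked thread.

```python
import ctypes
import ctypes.util


def query_cuda_versions(libname=None):
    """Call cudaDriverGetVersion and cudaRuntimeGetVersion via libcudart.

    Returns None if the library cannot be located or loaded; otherwise a dict
    holding the error code and the raw integer version from each call.
    """
    name = libname or ctypes.util.find_library("cudart")
    if name is None:
        return None
    try:
        cudart = ctypes.CDLL(name)
    except OSError:
        return None

    driver = ctypes.c_int(0)
    runtime = ctypes.c_int(0)
    # Each call returns a cudaError_t; 0 means cudaSuccess.
    driver_err = cudart.cudaDriverGetVersion(ctypes.byref(driver))
    runtime_err = cudart.cudaRuntimeGetVersion(ctypes.byref(runtime))
    return {
        "driver_err": driver_err,
        "driver": driver.value,
        "runtime_err": runtime_err,
        "runtime": runtime.value,
    }


result = query_cuda_versions()
if result is None:
    print("libcudart could not be loaded; try passing its full path as libname")
else:
    print(result)
```

Versions come back as integers of the form 1000 × major + 10 × minor, so a consistent CUDA 11.1 setup would be expected to give 11010 for both. A driver value of 0 with no error code indicates that no usable driver was found by the runtime.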

No, that is unlikely. The nvidia-smi output you showed earlier has no relevance to the problem: it only tests whether the driver works, not whether the CUDA runtime that a program is linked against is compatible with the driver API.

Please test your CUDA 11.1 runtime with other software and if that works, we can try to figure out what is wrong with your GROMACS build when using CUDA 11.1.

You were right: our CUDA installation was not working as expected.
cudaDriverGetVersion and cudaRuntimeGetVersion did not return correct values: no errors, but not the expected values.

I just updated to cuda-11.1.1_455.32.00.

I will rebuild GROMACS against this new (functioning) CUDA version and let you know about the results.

Thanks for the input you provided, which allowed us to track the problem down to our CUDA install.

regards

Eric

here it is:

maestro-3000:gromacs-2020.4/build > ~/gmx/2020.4/bin/gmx -quiet -version   
                    :-) GROMACS - gmx, 2020.4-UNCHECKED (-:

Executable:   /pasteur/sonic/homes/edeveaud/gmx/2020.4/bin/gmx
Data prefix:  /pasteur/sonic/homes/edeveaud/gmx/2020.4
Working dir:  /pasteur/sonic/homes/edeveaud/gromacs-2020.4/build
Command line:
  gmx -quiet -version

GROMACS version:    2020.4-UNCHECKED
The source code this program was compiled from has not been verified because the reference checksum was missing during compilation. This means you have an incomplete GROMACS distribution, please make sure to download an intact source distribution and compile that before proceeding.
Computed checksum: NoChecksumFile
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:        CUDA
SIMD instructions:  AVX_512
FFT library:        fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
RDTSCP usage:       enabled
TNG support:        enabled
Hwloc support:      disabled
Tracing support:    disabled
C compiler:         /opt/gensoft/exe/gcc/9.2.0/bin/gcc GNU 9.2.0
C compiler flags:   -mavx512f -mfma -fexcess-precision=fast -funroll-all-loops -O3 -DNDEBUG
C++ compiler:       /opt/gensoft/exe/gcc/9.2.0/bin/g++ GNU 9.2.0
C++ compiler flags: -mavx512f -mfma -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA compiler:      /opt/gensoft/exe/cuda/11.1/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2020 NVIDIA Corporation;Built on Mon_Oct_12_20:09:46_PDT_2020;Cuda compilation tools, release 11.1, V11.1.105;Build cuda_11.1.TC455_06.29190527_0
CUDA compiler flags:-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-Wno-deprecated-gpu-targets;-gencode;arch=compute_35,code=compute_35;-gencode;arch=compute_50,code=compute_50;-gencode;arch=compute_52,code=compute_52;-gencode;arch=compute_60,code=compute_60;-gencode;arch=compute_61,code=compute_61;-gencode;arch=compute_70,code=compute_70;-gencode;arch=compute_75,code=compute_75;-gencode;arch=compute_80,code=compute_80;-use_fast_math;-D_FORCE_INLINES;-mavx512f -mfma -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA driver:        11.10
CUDA runtime:       11.10

again thanks for your help

regards

Eric