I’m trying to compile GROMACS 2020.4 with GPU support on a cluster that has both GPU and CPU-only nodes. However, when I try to compile on the GPU nodes, the build fails due to missing libraries that I don’t have permission to install.
So I tried on the non-GPU nodes, where the libraries are available, but CMake does not detect any GPUs during configuration, and hence my compiled program (even though it passes all checks) has no GPU support, as can be seen from the md.log of a test run. Furthermore, this test run does not simply proceed without GPU support; it is stuck at the first step (md.log remains at step 0).
GROMACS version: 2020.4
Verified release checksum is 79c2857291b034542c26e90512b92fd4b184a1c9d6fa59c55f2e24ccf14e7281
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: AVX2_256
FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: hwloc-1.11.0
Tracing support: disabled
C compiler: /software/gcc/7.2.0/bin/gcc GNU 7.2.0
C compiler flags: -mavx2 -mfma -fexcess-precision=fast -funroll-all-loops -O3 -DNDEBUG
C++ compiler: /software/gcc/7.2.0/bin/g++ GNU 7.2.0
C++ compiler flags: -mavx2 -mfma -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA compiler: /software/cuda/10.0/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2018 NVIDIA Corporation;Built on Sat_Aug_25_21:08:01_CDT_2018;Cuda compilation tools, release 10.0, V10.0.130
CUDA compiler flags:-gencode;arch=compute_52,code=sm_52;-use_fast_math;;-mavx2 -mfma -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA driver: 10.10
CUDA runtime: N/A
My question now is whether there is a way to compile GROMACS with GPU support without the cards actually being present on the node I am compiling on. If so, does anyone spot what I am doing wrong with my settings?
Compiling a GPU-enabled build does not require GPU hardware to be present, so I suspect a software configuration issue here.
If library dependencies are not found at build time, you won’t be able to run binaries built elsewhere that link against those libraries, unless the dependencies are optional and you make sure not to link against them.
Your GROMACS version header shows GPU support: CUDA, which means the binary was built with CUDA support. Further down, CUDA runtime: N/A indicates that no (compatible) CUDA runtime is available, which is why no GPUs are detected. Furthermore, the “(GPU detection deactivated)” message suggests that some error condition, e.g. an incompatible CUDA runtime, disabled the GPU detection. Have you made sure that the CUDA runtime is available and functional on the host where you are trying to run?
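As a quick functional check, a minimal stand-alone CUDA program can be compiled (e.g. with nvcc check.cu -o check; the file name is just an example) and run on the node in question. This is only a sketch of such a check, not part of GROMACS:

#include <cstdio>
#include <cuda_runtime.h>

// Minimal CUDA runtime check: if this call fails or reports zero
// devices on the compute node, GROMACS GPU detection will fail there
// in the same way.
int main() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("Found %d CUDA device(s)\n", deviceCount);
    return 0;
}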
First of all, thanks for making clear that I don’t require compilation on the compute nodes! Now I can go on looking for the error elsewhere.
I learned that on our system the complete libraries are not missing on the compute nodes; only the headers are left out, to save space. So linking against these libraries should not be a problem; only the compilation must be done on the login nodes, where the headers are available.
Regarding the functionality of the CUDA runtime on the nodes, I tried the following: I compiled this code (https://github.com/chathhorn/cuda-semantics/blob/master/examples/getVersion.cu) with nvcc 10.0 on the login node and ran it on both the compute and the login nodes, each with the respective CUDA modules loaded.
On the compute node this confirms what GROMACS indicated: the runtime version is not found.
On the login node I get neither a driver nor a runtime version, since there is no GPU and hence no driver installed:
/cudaGetDriverVersion
Driver Version: 0
Runtime Version: 0
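For reference, the linked program presumably does something along the lines of this sketch (the exact source is at the URL above; the version encoding described in the comments is the documented CUDA API convention):

#include <cstdio>
#include <cuda_runtime.h>

// Sketch of a driver/runtime version query, assumed to be roughly what
// the linked getVersion.cu does. Both calls fill in an integer encoded
// as 1000*major + 10*minor (e.g. 10010 means CUDA 10.1), and
// cudaDriverGetVersion is documented to report 0 when no driver is
// installed, which matches the login-node output above.
int main() {
    int driverVersion = 0;
    int runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);
    cudaRuntimeGetVersion(&runtimeVersion);
    std::printf("Driver Version: %d\n", driverVersion);
    std::printf("Runtime Version: %d\n", runtimeVersion);
    return 0;
}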
Is this the usual way to check for the presence of cudart? If so, does that indicate that something is wrong with the installed CUDA modules?
It is more likely that an incompatible runtime is being used. What does not make sense is that your initial report already shows CUDA driver: 10.10 together with a 10.0 runtime, as indicated by Cuda compilation tools, release 10.0, V10.0.130. These should be compatible; I am not sure why they are not. The only thing I can suggest is to try CUDA 10.1 (or an earlier version such as 9.2).
Side-note: by default the GROMACS build system links statically against the CUDA runtime (see the value of the CUDA_cudart_static_LIBRARY CMake cache variable in CMakeCache.txt), so a missing cudart library cannot be the issue; in any case, that would prevent even launching gmx.