GPU support disabled when running under Slurm

GROMACS version: 2024.5
GROMACS modification: No

I am setting up a Slurm cluster running openSUSE 15.6. The test cluster has a head node and one compute node, which has 2 x NVIDIA 3080 GPUs. The head node does not have a GPU.

I have compiled GROMACS using CUDA 12.8. If I log in directly to the compute node, GROMACS runs correctly, making use of the GPU.

If I submit the job from the head node using Slurm, the job runs on the compute node but does not use the GPU. The GROMACS log file says that GPU support is disabled.

Why is this happening? What causes GPU support to be disabled?

Section of log:

GROMACS: gmx mdrun, version 2024.5
Executable: /apps/linux/gromacs/gromacs-2024.2_Intel_CUDA12.5/bin/gmx
Data prefix: /apps/linux/gromacs/gromacs-2024.2_Intel_CUDA12.5
Working dir: /home/david/pacs_test
Process ID: 21008
Command line:
gmx mdrun -deffnm t_1

GROMACS version: 2024.5
Precision: mixed
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 128)
GPU support: disabled
SIMD instructions: AVX2_256
CPU FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128
GPU FFT library: none
Multi-GPU FFT: none
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /usr/bin/gcc-12 GNU 12.3.0
C compiler flags: -fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -O3 -DNDEBUG
C++ compiler: /usr/bin/g++-12 GNU 12.3.0
C++ compiler flags: -fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -Wno-cast-function-type-strict SHELL:-fopenmp -O3 -DNDEBUG
BLAS library: External - detected on the system
LAPACK library: External - detected on the system

Hi!

“GPU support: disabled” means the gmx binary you are running was compiled without GPU support.

It looks like you are launching the wrong build of GROMACS. Adding `echo $PATH` and `module list` to your Slurm scripts and comparing the output with the output from an interactive session on the compute node would be a good starting point.
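A minimal sketch of that kind of diagnostic batch script (the `#SBATCH` options and module setup are placeholders, not taken from your cluster):

```shell
#!/bin/bash
#SBATCH --job-name=gmx-env-check
#SBATCH --nodes=1
#SBATCH --gres=gpu:1

# Print the environment the batch job actually sees, so it can be
# diffed against the same commands run in an interactive session
# on the compute node.
echo "PATH=$PATH"
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
module list 2>&1            # loaded environment modules, if any
which gmx                   # which gmx binary resolves first on PATH
gmx --version 2>&1 | grep -i 'GPU support'   # a CUDA build reports "enabled"
```

If the `PATH` or `LD_LIBRARY_PATH` lines differ between the batch job and the interactive session, that difference is almost certainly the cause.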

Thanks Andrey.

I have found the culprit. I was calling the same executable on both machines; however, LD_LIBRARY_PATH on the head node pointed to libraries from a non-GPU build, while on the compute node it pointed to libraries from the GPU-enabled build.
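For anyone hitting the same issue, one way to guard against it is to set the paths explicitly in the batch script instead of inheriting them from the submitting shell. A sketch, where `GMX_ROOT` is a placeholder for your GPU-enabled install prefix:

```shell
#!/bin/bash
#SBATCH --gres=gpu:1

# Point PATH and LD_LIBRARY_PATH at the GPU-enabled build explicitly,
# rather than inheriting whatever the head node's shell exported.
# GMX_ROOT below is hypothetical; substitute your own install prefix.
GMX_ROOT=/path/to/gromacs-gpu-build
export PATH="$GMX_ROOT/bin:$PATH"
export LD_LIBRARY_PATH="$GMX_ROOT/lib64:$LD_LIBRARY_PATH"

# ldd shows which shared libraries the binary will actually load,
# confirming the GPU build's libraries resolve first.
ldd "$GMX_ROOT/bin/gmx" | grep -i libgromacs

gmx mdrun -deffnm t_1
```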

Cheers

David
