Make check failing when GPU enabled

GROMACS version: 2021.1
GROMACS modification: No

Hi folks,
I’ve been trying to get a CUDA-enabled gmx to pass make check but it’s timing out on a lot of tests. They’re (mostly) passing when run CPU-only, however.

cmake -DCMAKE_INSTALL_PREFIX=${PWD}/…/gromacs_final -DREGRESSIONTEST_DOWNLOAD=ON -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ …/ -DGMX_GPU=CUDA -DCUDA_TOOLKIT_ROOT_DIR=/exports/applications/apps/SL7/cuda/10.1.105 -DGMX_CUDA_TARGET_SM="35;37;60;61;62"

(the targets are explicitly set as the binary could run on either K80 or Titan-X GPUs)

gmx mdrun -version

GROMACS version: 2021.1
Verified release checksum is 8c24bff5d3f78b0a9afb16e880b5667e5affe9a686d462482bac20ce975492c6
Precision: mixed
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: AVX2_256
FFT library: fftw-3.3.3-sse2
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /exports/applications/apps/community/roslin/gcc/7.3.0/bin/gcc GNU 7.3.0
C compiler flags: -mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -pthread -O3 -DNDEBUG
C++ compiler: /exports/applications/apps/community/roslin/gcc/7.3.0/bin/g++ GNU 7.3.0
C++ compiler flags: -mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -pthread -fopenmp -O3 -DNDEBUG
CUDA compiler: /exports/applications/apps/SL7/cuda/10.1.105/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2019 NVIDIA Corporation;Built on Fri_Feb__8_19:08:17_PST_2019;Cuda compilation tools, release 10.1, V10.1.105
CUDA compiler flags:-std=c++14;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_62,code=sm_62;-use_fast_math;;-mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -pthread -fopenmp -O3 -DNDEBUG

When running make check on the GPU:

The following tests FAILED:
	 12 - MdlibUnitTest (Timeout)
	 19 - DomDecMpiTests (Timeout)
	 20 - EwaldUnitTests (Timeout)
	 22 - GpuUtilsUnitTests (Timeout)
	 23 - HardwareUnitTests (Timeout)
	 53 - MdrunOutputTests (Timeout)
	 54 - MdrunModulesTests (Timeout)
	 55 - MdrunIOTests (Timeout)
	 56 - MdrunTests (Timeout)
	 57 - MdrunPmeTests (Timeout)
	 58 - MdrunNonIntegratorTests (Timeout)
	 59 - MdrunTpiTests (Timeout)
	 60 - MdrunMpiTests (Timeout)
	 61 - MdrunMpiPmeTests (Timeout)
	 62 - MdrunMpiCoordinationTestsOneRank (Timeout)
	 63 - MdrunMpiCoordinationTestsTwoRanks (Timeout)
	 64 - MdrunFEPTests (Timeout)
	 66 - GmxapiExternalInterfaceTests (Timeout)
	 67 - GmxapiInternalInterfaceTests (Timeout)
	 68 - regressiontests/complex (Timeout)
	 69 - regressiontests/freeenergy (Timeout)
	 70 - regressiontests/rotation (Timeout)
	 71 - regressiontests/essentialdynamics (Timeout)

With GMX_DISABLE_GPU_DETECTION=1 make check:

The following tests FAILED:
63 - MdrunMpiCoordinationTestsTwoRanks (Timeout)

I am running this on a shared (gridengine) HPC cluster so there will be other jobs running on the node at the same time, but I should be getting exclusive use of a core and a GPU. It almost looks as if the tests aren’t being passed onto the GPU.

Any ideas?

Thanks,
Mike

Just for clarity, I’ve made sure that CUDA_VISIBLE_DEVICES is set but as it’s a shared machine with up to 8 GPUs it won’t necessarily be 0. In the example above it was 1.
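A rough sketch of how the visible device could be checked from inside the job (this is not what was run above). The helper name `count_visible_gpus` is made up for illustration. Two things to keep in mind: CUDA renumbers the devices listed in CUDA_VISIBLE_DEVICES starting from 0 inside the process, so with CUDA_VISIBLE_DEVICES=1 the physical GPU 1 appears as device 0; and nvidia-smi typically still lists every GPU on the node, which makes it handy for spotting whether the assigned one is already busy.

```shell
#!/bin/sh
# Hypothetical helper: count the entries in a CUDA_VISIBLE_DEVICES-style list.
# An empty value prints 0 here. Note that CUDA itself treats an *unset*
# variable as "all devices visible", which this sketch does not model.
count_visible_gpus() {
    list="$1"
    if [ -z "$list" ]; then
        echo 0
        return
    fi
    # Split on commas and print the number of fields.
    echo "$list" | awk -F',' '{ print NF }'
}

echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
echo "devices listed: $(count_visible_gpus "${CUDA_VISIBLE_DEVICES:-}")"

# Show the load on the node's GPUs, but only if nvidia-smi is on the PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used --format=csv \
        || echo "nvidia-smi failed (no driver or no permission?)"
fi
```

If the utilization column for the assigned GPU is already high before the tests start, another job is probably sharing it.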

The fact that the tests are timing out suggests that either the CPU or the GPU used to execute them is busy. Make sure that you are not using resources that are already in use. Note that, unless restricted, the tests may try to use all the cores and GPUs they detect; if you set up the job to restrict the allocated resources, and assuming your scheduler is configured correctly, this should not happen.
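As a minimal sketch of restricting the job to what the scheduler granted, assuming Grid Engine exports NSLOTS for the job and that the script is run from the build directory: the helper name `threads_for_job` is hypothetical, and the two test names are just taken from the failure list above as a small subset to try first.

```shell
#!/bin/sh
# Hypothetical helper: use the slot count granted by Grid Engine (NSLOTS),
# falling back to a single thread when the variable is not set.
threads_for_job() {
    echo "${NSLOTS:-1}"
}

# Limit OpenMP to the allocated cores so the tests do not oversubscribe the node.
OMP_NUM_THREADS="$(threads_for_job)"
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"

# Re-run a couple of the timed-out tests with their output shown.
# Skipped unless ctest is available and this looks like a CMake build directory.
if command -v ctest >/dev/null 2>&1 && [ -f CTestTestfile.cmake ]; then
    ctest --output-on-failure -R 'GpuUtilsUnitTests|HardwareUnitTests' \
        || echo "some tests still failed or timed out"
fi
```

Running only a small subset with its output shown makes it easier to tell whether a test is hanging at GPU detection or is simply slow because the cores are shared.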