GROMACS version: 2021.1
GROMACS modification: No
Hi folks,
I’ve been trying to get a CUDA-enabled gmx build to pass make check, but a lot of the tests are timing out. They (mostly) pass when run CPU-only, however.
cmake -DCMAKE_INSTALL_PREFIX=${PWD}/…/gromacs_final -DREGRESSIONTEST_DOWNLOAD=ON -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ …/ -DGMX_GPU=CUDA -DCUDA_TOOLKIT_ROOT_DIR=/exports/applications/apps/SL7/cuda/10.1.105 -DGMX_CUDA_TARGET_SM="35;37;60;61;62"
(the targets are explicitly set as the binary could run on either K80 or Titan-X GPUs)
gmx mdrun -version
GROMACS version: 2021.1
Verified release checksum is 8c24bff5d3f78b0a9afb16e880b5667e5affe9a686d462482bac20ce975492c6
Precision: mixed
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: AVX2_256
FFT library: fftw-3.3.3-sse2
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /exports/applications/apps/community/roslin/gcc/7.3.0/bin/gcc GNU 7.3.0
C compiler flags: -mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -pthread -O3 -DNDEBUG
C++ compiler: /exports/applications/apps/community/roslin/gcc/7.3.0/bin/g++ GNU 7.3.0
C++ compiler flags: -mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -pthread -fopenmp -O3 -DNDEBUG
CUDA compiler: /exports/applications/apps/SL7/cuda/10.1.105/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2019 NVIDIA Corporation;Built on Fri_Feb__8_19:08:17_PST_2019;Cuda compilation tools, release 10.1, V10.1.105
CUDA compiler flags:-std=c++14;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_62,code=sm_62;-use_fast_math;;-mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -pthread -fopenmp -O3 -DNDEBUG
When running make check on the GPU:
The following tests FAILED:
12 - MdlibUnitTest (Timeout)
19 - DomDecMpiTests (Timeout)
20 - EwaldUnitTests (Timeout)
22 - GpuUtilsUnitTests (Timeout)
23 - HardwareUnitTests (Timeout)
53 - MdrunOutputTests (Timeout)
54 - MdrunModulesTests (Timeout)
55 - MdrunIOTests (Timeout)
56 - MdrunTests (Timeout)
57 - MdrunPmeTests (Timeout)
58 - MdrunNonIntegratorTests (Timeout)
59 - MdrunTpiTests (Timeout)
60 - MdrunMpiTests (Timeout)
61 - MdrunMpiPmeTests (Timeout)
62 - MdrunMpiCoordinationTestsOneRank (Timeout)
63 - MdrunMpiCoordinationTestsTwoRanks (Timeout)
64 - MdrunFEPTests (Timeout)
66 - GmxapiExternalInterfaceTests (Timeout)
67 - GmxapiInternalInterfaceTests (Timeout)
68 - regressiontests/complex (Timeout)
69 - regressiontests/freeenergy (Timeout)
70 - regressiontests/rotation (Timeout)
71 - regressiontests/essentialdynamics (Timeout)
With GMX_DISABLE_GPU_DETECTION=1 make check (CPU only):
The following tests FAILED:
63 - MdrunMpiCoordinationTestsTwoRanks (Timeout)
I am running this on a shared (Grid Engine) HPC cluster, so other jobs will be running on the node at the same time, but I should be getting exclusive use of a core and a GPU. It almost looks as if the tests aren’t being passed on to the GPU.
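A minimal sketch for checking, from inside the job, which GPUs the test processes can actually see. It assumes the scheduler communicates the GPU allocation through the CUDA_VISIBLE_DEVICES environment variable, which varies by site and is not confirmed for this cluster; the function name report_visible_gpus is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report which GPUs this job's environment exposes.
# Assumption: the allocation is passed via CUDA_VISIBLE_DEVICES.
report_visible_gpus() {
    if [ -z "${CUDA_VISIBLE_DEVICES+x}" ]; then
        # Variable not set at all: the CUDA runtime enumerates every GPU on the node.
        echo "CUDA_VISIBLE_DEVICES is unset (CUDA will see all GPUs on the node)"
    elif [ -z "${CUDA_VISIBLE_DEVICES}" ]; then
        # Variable set to an empty string: the CUDA runtime enumerates no GPUs.
        echo "CUDA_VISIBLE_DEVICES is empty (CUDA will see no GPUs)"
    else
        echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"
    fi
}

report_visible_gpus

# If nvidia-smi is on the PATH, show utilisation and any other processes
# on the node's GPUs; a busy GPU would point to the device being shared.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi || echo "nvidia-smi is present but failed to query the driver"
fi
```

Running this in the same job script, just before make check, would show whether the tests are landing on a GPU that another job is already using.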
Any ideas?
Thanks,
Mike