GROMACS version: 2024.4
GROMACS modification: No
I have access to two NVIDIA GPUs (an RTX 5070 Ti and an RTX 5080), with driver version 570.124.04 and CUDA version 12.8.
I’m trying to build GROMACS 2024.4 with CUDA GPU support on Debian Linux using the following commands:
tar xfz gromacs-2024.4.tar.gz
cd gromacs-2024.4
mkdir build
cd build
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON
cmake .. -DGMX_GPU=CUDA -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
make -j 20
make check
sudo make install
source /usr/local/gromacs/bin/GMXRC
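After sourcing GMXRC, one way to double-check what the installed binary was actually built with is to inspect the version banner; as far as I know it lists the GPU support mode and the CUDA compilation flags (the exact wording of those lines is an assumption on my part):

```shell
# Print the GROMACS version banner and keep only the GPU/CUDA-related lines;
# a CUDA-enabled build should report "GPU support: CUDA" and the compiled
# target architectures among the CUDA compilation flags.
gmx --version | grep -i -E 'gpu|cuda'
```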
During installation I get no errors, and in the build output I can see GROMACS testing different CUDA architecture numbers (e.g. 80), each reported as a success. I also see success for CUDA architecture 90, which I think should cover my cards, so it seems they get recognized.
nvcc --version works and gives me this output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Jan_15_19:20:09_PST_2025
Cuda compilation tools, release 12.8, V12.8.61
Build cuda_12.8.r12.8/compiler.35404655_0
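For reference, the compute capability of the cards can also be queried through the driver (assuming my nvidia-smi is recent enough to support the compute_cap query field):

```shell
# List each GPU's name and CUDA compute capability as reported by the driver
nvidia-smi --query-gpu=name,compute_cap --format=csv
```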
But when I run a simulation my GPUs aren’t recognized and the CPU is used instead. Specifically, I get this:
Command line:
gmx mdrun -v -deffnm nvt
WARNING: An error occurred while sanity checking device #0. An unhandled error from a previous CUDA operation was detected. CUDA error #209 (cudaErrorNoKernelImageForDevice): no kernel image is available for execution on the device.
Reading file nvt.tpr, VERSION 2024.4 (single precision)
Changing nstlist from 10 to 50, rlist from 1.2 to 1.279
Using 32 MPI threads
Using 1 OpenMP thread per tMPI thread
WARNING: This run will generate roughly 2303 Mb of data
starting mdrun ‘Mixed system with molecule A and B in water’
2500000 steps, 5000.0 ps.
step 700, will finish Mon Oct 27 15:45:02 2025^Cl 0.83 imb F 6% pme/F 0.47
Received the INT signal, stopping within 200 steps
step 850, will finish Mon Oct 27 15:54:33 2025vol 0.78 imb F 5% pme/F 0.47
Dynamic load balancing report:
DLB was turned on during the run due to measured imbalance.
Average load imbalance: 8.3%.
The balanceable part of the MD step is 86%, load imbalance is computed from this.
Part of the total run time spent waiting due to load imbalance: 7.1%.
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: Y 0 % Z 0 %
Average PME mesh/force load: 0.474
Part of the total run time spent waiting due to PP/PME imbalance: 12.3 %
NOTE: 7.1 % of the available CPU time was lost due to load imbalance
in the domain decomposition.
You can consider manually changing the decomposition (option -dd);
e.g. by using fewer domains along the box dimension in which there is
considerable inhomogeneity in the simulated system.
NOTE: 12.3 % performance was lost because the PME ranks
had less work to do than the PP ranks.
You might want to decrease the number of PME ranks
or decrease the cut-off and the grid spacing.
Core t (s) Wall t (s) (%)
Time: 195.187 6.100 3199.7
(ns/day) (hour/ns)
Performance: 24.106 0.996
Then I also tried this: cmake .. -DGMX_GPU=CUDA -DCMAKE_CUDA_ARCHITECTURES=90
But again I get the same problem during the simulation: the GPU is not recognized and GROMACS uses the CPU instead. Could you help me?
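One more thing I considered but have not yet verified: if the RTX 50-series cards report a newer compute capability than 90 (I believe Blackwell consumer cards are 12.0, but that is an assumption), the build might need that architecture requested explicitly, e.g.:

```shell
# Configure in a fresh build directory, explicitly targeting compute
# capability 12.0 (assumption: sm_120 is what these cards need and the
# installed CUDA 12.8 toolkit can generate code for it)
cmake .. -DGMX_GPU=CUDA -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
         -DCMAKE_CUDA_ARCHITECTURES=120
```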
Best regards