Optimizing GROMACS 2023.1 run for better GPU utilization

isoshi · December 8, 2024, 3:12am

GROMACS version: 2023.1
GROMACS modification: No

I’m seeking assistance to optimize my GROMACS simulation run to fully utilize my hardware resources. Below are the details of my current setup and the issue I’m encountering.

Hardware Specifications:

CPU: AMD Ryzen 9 5950X (16 cores, 32 threads)
GPU: NVIDIA RTX 4080 SUPER, 16 GB VRAM
RAM: 64 GB DDR4
Operating System: Ubuntu 22.04 on WSL (Windows Subsystem for Linux)

GROMACS Configuration:

GROMACS Version: 2023.1
Compilation Settings:

GROMACS version:    2023.1
Precision:          mixed
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 128)
GPU support:        CUDA
NB cluster size:    8
SIMD instructions:  AVX2_256
CPU FFT library:    fftw-3.3.8-sse2-avx-avx2-avx2_128
GPU FFT library:    cuFFT
Multi-GPU FFT:      none
RDTSCP usage:       enabled
TNG support:        enabled
Hwloc support:      disabled
Tracing support:    disabled
C compiler:         /usr/bin/cc GNU 11.4.0
C compiler flags:   -fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -O3 -DNDEBUG
C++ compiler:       /usr/bin/c++ GNU 11.4.0
C++ compiler flags: -fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -Wno-cast-function-type-strict -fopenmp -O3 -DNDEBUG
CUDA compiler:      /usr/local/cuda/bin/nvcc
CUDA compiler flags: [As listed above]
CUDA driver:        12.70
CUDA runtime:       12.50

Simulation Details:

System: Protein-ligand complex
Number of Atoms: ~120,000
Simulation Type: [e.g., NVT, NPT]
Simulation Parameters: in attached md.mdp

Current Command:

gmx mdrun -s md.tpr -v -pme gpu -bonded gpu -update gpu -deffnm new_run -ntomp 14 -cpi new_run.cpt

Issue Description:

GPU Utilization: GPU utilization on the RTX 4080 SUPER peaks at about 50%, regardless of how many CPU threads I assign (e.g., 4 or 16). CPU load ranges from 20% to 100% across these settings, with little difference in ns/day.
Performance Metrics: Achieving an average of 190 ns/day, which seems suboptimal given the hardware capabilities.
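To put 190 ns/day in perspective, it helps to convert it into wall-clock time per MD step. The sketch below assumes a 2 fs time step (the actual dt is in the attached md.mdp, which is not reproduced here). Under that assumption, 190 ns/day works out to roughly 0.91 ms per step, or about 1,100 steps per second. With a per-step budget that small, fixed per-step costs may account for a noticeable share of each step.

```python
SECONDS_PER_DAY = 86_400


def ms_per_step(ns_per_day: float, dt_fs: float) -> float:
    """Wall-clock milliseconds spent per MD step at a given throughput and time step."""
    steps_per_day = ns_per_day * 1e6 / dt_fs  # 1 ns = 1e6 fs
    return SECONDS_PER_DAY * 1e3 / steps_per_day


# 2 fs is an assumed time step; substitute the dt value from md.mdp.
print(f"{ms_per_step(190, 2.0):.3f} ms/step")
```

Doubling dt to 4 fs (for example with hydrogen mass repartitioning) would double the per-step budget at the same ns/day.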

Questions:

Configuration Optimization: What settings or parameters should I adjust to enhance GPU utilization and overall simulation performance?
MPI and OpenMP Balance: How should I best balance the number of MPI ranks (-ntmpi) and OpenMP threads (-ntomp) to fully leverage my 16-core CPU and the GPU?
GROMACS Compilation: Are there specific compilation flags or configurations that could improve performance on my hardware setup?
System Specifics: Are there any additional system-specific optimizations (e.g., BIOS settings, OS configurations) that I should consider?
md.mdp (2.7 KB)
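One way to answer the -ntmpi/-ntomp question empirically is to run a few short benchmarks that differ only in thread count and compare the ns/day each one reports. The Python sketch below is a hypothetical helper, not a GROMACS tool. It keeps the single-rank, fully offloaded setup of the current command (-ntmpi 1 with -nb/-pme/-bonded/-update gpu) and adds -pin on, -nsteps for a short run, -resethway so that startup and load-balancing time is excluded from the timing, and -noconfout to skip writing final coordinates. The thread counts, the 20,000-step length, and the bench_nt* output prefix are arbitrary assumptions. The ns/day value is read from the "Performance:" line that mdrun writes near the end of its log file.

```python
from __future__ import annotations

import re
import subprocess
from pathlib import Path

# Hypothetical thread counts for a 16-core / 32-thread Ryzen 9 5950X.
NTOMP_VALUES = [4, 8, 16, 32]


def bench_prefix(ntomp: int) -> str:
    """Hypothetical -deffnm prefix so each run writes its own log file."""
    return f"bench_nt{ntomp}"


def build_benchmark_cmd(ntomp: int, tpr: str = "md.tpr", nsteps: int = 20000) -> list[str]:
    """Build an mdrun command for a short single-rank, GPU-resident benchmark."""
    return [
        "gmx", "mdrun",
        "-s", tpr,
        "-deffnm", bench_prefix(ntomp),
        "-ntmpi", "1",          # one thread-MPI rank drives the single GPU
        "-ntomp", str(ntomp),   # OpenMP threads for that rank
        "-nb", "gpu", "-pme", "gpu", "-bonded", "gpu", "-update", "gpu",
        "-pin", "on",           # pin threads to cores
        "-nsteps", str(nsteps), # short run instead of the full simulation
        "-resethway",           # reset timers halfway through the run
        "-noconfout",           # do not write final coordinates
    ]


def parse_ns_per_day(log_text: str) -> float | None:
    """Return ns/day from the 'Performance:' line of an mdrun log, or None if absent."""
    match = re.search(r"^\s*Performance:\s+([0-9.]+)", log_text, re.MULTILINE)
    return float(match.group(1)) if match else None


def main() -> None:
    results: dict[int, float | None] = {}
    for ntomp in NTOMP_VALUES:
        subprocess.run(build_benchmark_cmd(ntomp), check=True)
        log_text = Path(f"{bench_prefix(ntomp)}.log").read_text()
        results[ntomp] = parse_ns_per_day(log_text)
    for ntomp, perf in results.items():
        print(f"-ntomp {ntomp:2d}: {perf} ns/day")


if __name__ == "__main__":
    main()
```

If ns/day stays flat across all thread counts, as described above, the CPU thread count is probably not the limiting factor for this run. The log files from these runs would also be what a reply like the one below asks for.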

pszilard · December 16, 2024, 9:43pm

Please share some log files!
