GROMACS version: 2023.1
GROMACS modification: No
I’m seeking assistance to optimize my GROMACS simulation run to fully utilize my hardware resources. Below are the details of my current setup and the issue I’m encountering.
Hardware Specifications:
- CPU: AMD Ryzen 9 5950X (16 cores, 32 threads)
- GPU: NVIDIA RTX 4080 SUPER, 16 GB VRAM
- RAM: 64 GB DDR4
- Operating System: WSL Ubuntu 22.04
GROMACS Configuration:
- GROMACS Version: 2023.1
- Compilation Settings:
GROMACS version: 2023.1
Precision: mixed
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 128)
GPU support: CUDA
NB cluster size: 8
SIMD instructions: AVX2_256
CPU FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128
GPU FFT library: cuFFT
Multi-GPU FFT: none
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /usr/bin/cc GNU 11.4.0
C compiler flags: -fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -O3 -DNDEBUG
C++ compiler: /usr/bin/c++ GNU 11.4.0
C++ compiler flags: -fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -Wno-cast-function-type-strict -fopenmp -O3 -DNDEBUG
CUDA compiler: /usr/local/cuda/bin/nvcc
CUDA compiler flags: [As listed above]
CUDA driver: 12.70
CUDA runtime: 12.50
Simulation Details:
- System: Protein-ligand complex
- Number of Atoms: ~120,000
- Simulation Type: [e.g., NVT, NPT]
- Simulation Parameters: in attached md.mdp
Current Command:
gmx mdrun -s md.tpr -v -pme gpu -bonded gpu -update gpu -deffnm new_run -ntomp 14 -cpi new_run.cpt
Issue Description:
- GPU Utilization: I observe at most 50% utilization of the CUDA cores on the RTX 4080 SUPER, regardless of how many CPU threads I assign (e.g., 4 or 16). CPU load can range from 20% to 100% with no significant difference in ns/day.
- Performance Metrics: I achieve an average of 190 ns/day, which seems suboptimal given the hardware capabilities.
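For anyone who wants to reproduce the thread-count comparison described above, a minimal benchmark sketch could look like the following. The output prefix `bench_ntomp*`, the step count, the list of thread counts, and the `DRY_RUN` switch are illustrative assumptions, not part of my actual run. The extra flags (`-nb gpu`, `-ntmpi 1`, `-pin on`, `-nsteps`, `-resethway`, `-noconfout`) are standard `gmx mdrun` options that I added only to keep the test runs short and comparable.

```shell
#!/usr/bin/env bash
# Sketch: scan OpenMP thread counts with short benchmark runs.
# Assumed values (not from the original run): NSTEPS, the thread list,
# and the bench_ntomp* output prefix are illustrative only.

TPR="md.tpr"                 # run input file used in the command above
NSTEPS=20000                 # short benchmark length (assumption)
DRY_RUN="${DRY_RUN:-1}"      # 1 = only print commands, 0 = actually run them

# Print the mdrun command line for a given number of OpenMP threads.
build_cmd() {
    local ntomp="$1"
    echo "gmx mdrun -s ${TPR} -deffnm bench_ntomp${ntomp}" \
         "-nb gpu -pme gpu -bonded gpu -update gpu" \
         "-ntmpi 1 -ntomp ${ntomp} -pin on" \
         "-nsteps ${NSTEPS} -resethway -noconfout"
}

# Loop over thread counts; in dry-run mode just show what would be executed.
run_scan() {
    local n
    for n in 4 8 14 16; do
        if [ "${DRY_RUN}" = "1" ]; then
            build_cmd "${n}"
        else
            eval "$(build_cmd "${n}")"
            # The final log section reports ns/day on the "Performance:" line.
            grep "Performance:" "bench_ntomp${n}.log"
        fi
    done
}

run_scan
```

By default the script only prints the four commands. Setting `DRY_RUN=0` before running it executes them and prints the `Performance:` line from each log, so the ns/day values can be compared side by side.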
Questions:
- Configuration Optimization: What settings or parameters should I adjust to enhance GPU utilization and overall simulation performance?
- MPI and OpenMP Balance: How should I best balance the number of MPI ranks (-ntmpi) and OpenMP threads (-ntomp) to fully leverage my 16-core CPU and GPU?
- GROMACS Compilation: Are there specific compilation flags or configurations that could improve performance on my hardware setup?
- System Specifics: Are there any additional system-specific optimizations (e.g., BIOS settings, OS configurations) that I should consider?
md.mdp (2.7 KB)