GROMACS 2019.6 GPU job running very slowly (~5 ns/day) under SLURM. How to improve performance?

Hi all,

I’m running GROMACS (2019.6, GPU build) on my HPC cluster (gpunode002: 24 cores, 256 GB RAM, one NVIDIA Tesla P40 GPU) with the following SLURM script:

#!/bin/bash
#SBATCH --job-name=WT
#SBATCH --partition=gpuq
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20
#SBATCH --mem=200G
#SBATCH --gres=gpu:1
#SBATCH --nodelist=gpunode002
#SBATCH --output=gmx.%j.out
#SBATCH --error=gmx.%j.err

set -euo pipefail

module purge
module load cuda80/toolkit/8.0.61
module load cuda80/fft/8.0.61
module load gromacs/single/gpu/2019.6

export CUDA_ROOT=${CUDA_HOME:-${CUDA_PATH:-$(dirname "$(dirname "$(which nvcc)")")}}
export LD_LIBRARY_PATH="$CUDA_ROOT/lib64:${LD_LIBRARY_PATH:-}"
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

cd "$SLURM_SUBMIT_DIR"

# Production MD
gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md_0_1.tpr -maxwarn 10
gmx mdrun -deffnm md_0_1 -ntmpi 1 -ntomp 20 -nb gpu -pme gpu -pin on -dlb auto -gpu_id 0

But the performance is only ~5 ns/day, which seems extremely low for a GPU job. Could you please review the commands and suggest how to get the best performance? Thank you!

Hi, sorry for the late reply. One thing to check: are you launching gmx mdrun through srun? With sbatch the script itself already runs on the allocated compute node (not the login node), but on many clusters a bare gmx mdrun doesn’t inherit the CPU binding SLURM prepared for the job step, and that can cost real performance.
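For example, the production step could be launched through srun with the same mdrun flags as in your script. This is a sketch: whether --cpu-bind=cores is honored depends on how your cluster’s SLURM is configured, so check with your admins:

```shell
# Launch mdrun as a proper SLURM job step with explicit CPU binding
# (flags after "gmx mdrun" are unchanged from the original script).
srun --ntasks=1 --cpus-per-task="$SLURM_CPUS_PER_TASK" --cpu-bind=cores \
    gmx mdrun -deffnm md_0_1 -ntmpi 1 -ntomp "$SLURM_CPUS_PER_TASK" \
    -nb gpu -pme gpu -pin on -dlb auto -gpu_id 0
```

Note that mdrun’s own -pin on also does thread pinning; if srun’s binding and mdrun’s pinning fight each other on your cluster, try dropping one of the two and compare.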
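Also, to sanity-check the throughput you’re seeing, the ns/day figure can be read straight from the performance summary GROMACS appends to the log at the end of a run. A minimal sketch; the sample_md.log excerpt below uses made-up numbers for illustration, not output from your run:

```shell
# Illustrative excerpt mimicking the summary GROMACS writes at the end
# of md_0_1.log (numbers here are invented for the example):
cat > sample_md.log <<'EOF'
                 (ns/day)    (hour/ns)
Performance:        5.012        4.789
EOF

# Pull the ns/day value; on a real run, point this at md_0_1.log instead.
awk '/^Performance:/ {print $2}' sample_md.log
# → 5.012
```

If the logged number matches what you expect, the next place to look is the cycle-accounting table just above that line, which shows where the wall time actually went.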