GROMACS version: 2024.1
GROMACS modification: Yes/No
Dear experts and GROMACS users,
I am trying to optimize the simulation speed for a biological system of about 2 million atoms. I am running on a single node with 3 GPUs. With 3 MPI ranks (one per GPU), 24 OpenMP threads per rank, and the non-bonded (NB), bonded, and PME work (one dedicated PME rank on one GPU) as well as the update offloaded to the GPUs, I get close to 9 ns/day. I have enabled direct communication between the GPUs. Is there anything else I am missing, and is there any way I could further optimize the speed?
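For reference, a minimal sketch of such a launch (assuming a CUDA-aware MPI build of GROMACS; the mpirun launcher and the run name md are placeholders):

```
# Enable direct GPU-GPU communication (needs a CUDA-aware MPI build)
export GMX_ENABLE_DIRECT_GPU_COMM=1

# 3 MPI ranks, one per GPU: 2 PP ranks + 1 dedicated PME rank,
# 24 OpenMP threads per rank; NB, bonded, PME and the update on the GPUs
mpirun -np 3 gmx_mpi mdrun -deffnm md \
    -ntomp 24 -npme 1 \
    -nb gpu -bonded gpu -pme gpu -update gpu \
    -pin on
```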
What communication is used between the GPUs? Sometimes it’s quicker to just run on one GPU.
Thanks, @MagnusL, for the reply. GPU direct communication with CUDA-aware MPI is enabled in my simulations. Currently I am getting 9 ns/day with 3 MPI ranks (2 PP + 1 PME). I can only use 3 GPUs, hence this somewhat unusual rank distribution. I see severe degradation of speed as I increase the number of MPI ranks while keeping the 3 GPUs. I have also tried 1+1 (PP+PME), but it is not better than 2+1. I am offloading NB, bonded interactions, and the update to the GPUs.
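To make the comparison concrete, the 1+1 (PP+PME) variant would be launched along these lines (same offload flags, one fewer rank, which leaves the third GPU idle):

```
# 2 MPI ranks: 1 PP + 1 dedicated PME rank
mpirun -np 2 gmx_mpi mdrun -deffnm md \
    -ntomp 24 -npme 1 \
    -nb gpu -bonded gpu -pme gpu -update gpu
```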
Are the 3 GPUs in a single node? I have found better performance on a single node with thread-MPI. Also, the best performance of all was on a single GPU.
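A sketch of both suggestions (thread-MPI on a single node, and a single-GPU run), with illustrative thread counts; depending on the version and build, direct GPU communication may still need GMX_ENABLE_DIRECT_GPU_COMM=1:

```
# Thread-MPI on a single node: 3 ranks (2 PP + 1 PME), no external MPI launcher
gmx mdrun -deffnm md \
    -ntmpi 3 -ntomp 24 -npme 1 \
    -nb gpu -bonded gpu -pme gpu -update gpu \
    -pin on

# Single-GPU run for comparison: all offloadable work on one device
gmx mdrun -deffnm md \
    -ntmpi 1 -ntomp 24 \
    -nb gpu -bonded gpu -pme gpu -update gpu \
    -gpu_id 0 -pin on
```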