Efficient parallelization scheme

pszilard · June 7, 2021, 12:27pm

Hi,

You are trying to simulate a very small system (~20500 atoms), so you should not expect it to scale well beyond ~100 cores; with a fast network interconnect and possibly some careful parameter tuning, it might scale to 200 cores or so.

A few pointers:

What kind of network interconnect are you using? Make sure it is not Ethernet.
Make sure to set process or thread affinities, either with your MPI launcher or with mdrun -pin on.
Your runs have a major PP-PME imbalance, likely because PME does not seem to scale well; this can be due to either of the above (or possibly other reasons too). Rule out the first two, then focus on improving the PME time.
Consider using more PME ranks, or possibly PME order 5.
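
To make the numbers above concrete, here is a rough Python sketch that works out how many atoms each core gets at the core counts mentioned, and assembles an mdrun command line with pinning and an explicit PME rank count. The launcher name (mpirun), the binary name (gmx_mpi), the file prefix (md) and the 128/32 rank split are placeholders I made up for illustration, not values taken from your runs; the right PME rank count has to come from benchmarking.

```python
def atoms_per_core(n_atoms: int, n_cores: int) -> float:
    """Average number of atoms each core has to handle."""
    return n_atoms / n_cores


def mdrun_command(n_ranks: int, n_pme_ranks: int, deffnm: str = "md") -> str:
    """Build an mdrun command line with pinning and dedicated PME ranks.

    Assumptions (adjust for your cluster): the MPI launcher is `mpirun`
    and the MPI-enabled GROMACS binary is called `gmx_mpi`.
    """
    if not 0 <= n_pme_ranks < n_ranks:
        raise ValueError("PME ranks must be fewer than the total rank count")
    args = [
        "mpirun", "-np", str(n_ranks),
        "gmx_mpi", "mdrun",
        "-deffnm", deffnm,           # hypothetical file prefix
        "-pin", "on",                # let mdrun set thread affinities
        "-npme", str(n_pme_ranks),   # number of separate PME ranks
    ]
    return " ".join(args)


# ~20500 atoms spread over 100 and 200 cores
print(atoms_per_core(20500, 100))  # 205.0
print(atoms_per_core(20500, 200))  # 102.5

# Illustrative split only: 128 ranks, 32 of them dedicated to PME
print(mdrun_command(128, 32))

# PME order is not an mdrun flag; it is set in the .mdp file, e.g.
#   pme-order = 5
```

At 200 cores each core is left with only about 100 atoms, which is why communication cost dominates and the interconnect matters so much for a system this small.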

Cheers,
Szilárd
