Difference in performance of two similar MD setups

GROMACS version: 2021
GROMACS modification: No

Hi everyone,

I built two systems with CHARMM-GUI, one containing protein + water (S1) with 70,700 atoms and the other containing protein + water + ligand (S2) with 68,200 atoms, so the two systems differ by 2,500 atoms. The parameters in the *.mdp file are the same for both systems. When I run the simulations on 4 GPUs, I get about 80 ns/day for S1 and 105 ns/day for S2. This performance gap looks large to me given the relatively small difference in atom count between the two simulations.
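Both runs are launched the same way; a representative command consistent with what the logs report (4 thread-MPI ranks, 1 separate PME rank, 4 GPUs) would look like the line below. The file name and the GPU offload flags are placeholders, not a verbatim copy of my script:

# sketch only: step7_production is a placeholder name; -nb/-pme gpu offload assumed
gmx mdrun -deffnm step7_production -ntmpi 4 -npme 1 -nb gpu -pme gpu

The relevant output for both cases is as follows: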

S1:
Changing nstlist from 20 to 100, rlist from 1.224 to 1.346

Initializing Domain Decomposition on 4 ranks
Dynamic load balancing: on
Using update groups, nr 24087, average size 2.9 atoms, max. radius 0.139 nm
Minimum cell size due to atom displacement: 0.657 nm
Initial maximum distances in bonded interactions:
two-body bonded interactions: 0.420 nm, LJ-14, atoms 109 116
multi-body bonded interactions: 0.489 nm, CMAP Dih., atoms 732 753
Minimum cell size due to bonded interactions: 0.537 nm


Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Using 1 separate PME ranks
Optimizing the DD grid for 3 cells with a minimum initial size of 0.821 nm
The maximum allowed number of cells is: X 10 Y 10 Z 10
Domain decomposition grid 3 x 1 x 1, separate PME ranks 1
PME domain decomposition: 1 x 1 x 1
Interleaving PP and PME ranks
This rank does only particle-particle work.
Domain decomposition rank 0, coordinates 0 0 0

The maximum number of communication pulses is: X 1
The minimum size for domain decomposition cells is 1.624 nm
The requested allowed shrink of DD cells (option -dds) is: 0.80
The allowed shrink of domain decomposition cells is: X 0.55
The maximum allowed distance for atom groups involved in interactions is:
non-bonded interactions 1.624 nm
two-body bonded interactions (-rdd) 1.624 nm
multi-body bonded interactions (-rdd) 1.624 nm

On host tp2 4 GPUs selected for this run.

S2:
Changing nstlist from 20 to 100, rlist from 1.224 to 1.345

Initializing Domain Decomposition on 4 ranks
Dynamic load balancing: on
Minimum cell size due to atom displacement: 0.750 nm

Initial maximum distances in bonded interactions:
two-body bonded interactions: 0.433 nm, LJ-14, atoms 1660 1667
multi-body bonded interactions: 0.484 nm, CMAP Dih., atoms 227 236
Minimum cell size due to bonded interactions: 0.532 nm
Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.222 nm
Estimated maximum distance required for P-LINCS: 0.222 nm
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Using 1 separate PME ranks
Optimizing the DD grid for 3 cells with a minimum initial size of 0.938 nm
The maximum allowed number of cells is: X 9 Y 9 Z 9
Domain decomposition grid 3 x 1 x 1, separate PME ranks 1
PME domain decomposition: 1 x 1 x 1
Interleaving PP and PME ranks
This rank does only particle-particle work.
Domain decomposition rank 0, coordinates 0 0 0

The maximum number of communication pulses is: X 1
The minimum size for domain decomposition cells is 1.345 nm
The requested allowed shrink of DD cells (option -dds) is: 0.80
The allowed shrink of domain decomposition cells is: X 0.46
The maximum allowed distance for atoms involved in interactions is:
non-bonded interactions 1.345 nm
two-body bonded interactions (-rdd) 1.345 nm
multi-body bonded interactions (-rdd) 1.345 nm
atoms separated by up to 5 constraints (-rcon) 1.345 nm

On host tp2 4 GPUs selected for this run.

Are there any recommendations for improving the performance of S1 via mdrun flags?
Comparing the two logs, the main difference I see is the minimum domain decomposition cell size (1.624 nm for S1 versus 1.345 nm for S2), so I already tried “… mdrun -rdd 1.345 -rcon 1.345 …”, but I haven’t seen any improvement. Thanks.
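For completeness, that attempt in full looks roughly like this (same placeholder file name and assumed offload flags as above; only the -rdd/-rcon values are exactly what I set):

# sketch only: forcing S1's DD communication distances down to S2's values
gmx mdrun -deffnm step7_production -ntmpi 4 -npme 1 -nb gpu -pme gpu -rdd 1.345 -rcon 1.345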