GROMACS version: 2022
GROMACS modification: Yes
Dear all,
I'm using GROMACS 2022 to simulate a system of about 1 million atoms with 36 threads and 1 GPU (V100). The PME mesh takes a very large share of the wall time. Could anyone suggest how to improve the performance? The log file is below:
P P - P M E L O A D B A L A N C I N G
NOTE: The PP/PME load balancing was limited by the maximum allowed grid scaling,
you might not have reached a good load balance.
PP/PME load balancing changed the cut-off and PME settings:
             particle-particle                 PME
             rcoulomb     rlist           grid   spacing    1/beta
initial      1.200 nm  1.204 nm    192 192 192  0.116 nm  0.384 nm
final        1.858 nm  1.862 nm    120 120 120  0.186 nm  0.595 nm
cost-ratio                 3.70           0.24
(note that these numbers concern only part of the total PP and PME load)
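The cost-ratio line can be checked by hand. A minimal sketch, assuming (my reading of the numbers, not taken from the GROMACS source) that the PP cost scales with the cube of rlist and the PME cost with the number of grid points; the variable names are mine. With the logged values this reproduces 3.70 and 0.24:

```python
# Values copied from the "PP/PME load balancing" table in the log above.
RLIST_INITIAL_NM, RLIST_FINAL_NM = 1.204, 1.862
GRID_INITIAL = (192, 192, 192)
GRID_FINAL = (120, 120, 120)


def grid_points(grid):
    """Total number of PME mesh points for an (nx, ny, nz) grid."""
    nx, ny, nz = grid
    return nx * ny * nz


# Assumption: PP cost ~ pair-list volume (rlist cubed), PME cost ~ mesh points.
pp_cost_ratio = (RLIST_FINAL_NM / RLIST_INITIAL_NM) ** 3
pme_cost_ratio = grid_points(GRID_FINAL) / grid_points(GRID_INITIAL)

print(f"PP cost ratio:  {pp_cost_ratio:.2f}")   # 3.70
print(f"PME cost ratio: {pme_cost_ratio:.2f}")  # 0.24
```

By this estimate the tuning made the short-range part about 3.7x more expensive in exchange for a mesh with about 4x fewer points, and the NOTE above says it stopped at the scaling limit.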
M E G A - F L O P S A C C O U N T I N G
NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
W3=SPC/TIP3p W4=TIP4p (single or pairs)
V&F=Potential and force V=Potential only F=Force only
Computing:                           M-Number         M-Flops % Flops
-----------------------------------------------------------------------------
Pair Search distance check      133893.255360     1205039.298     0.0
NxN QSTab Elec. + LJ [F]     168582681.324288  8934882110.187    97.7
NxN QSTab Elec. + LJ [V&F]     1706313.111552   138211362.036     1.5
1,4 nonbonded interactions         865.267305       77874.057     0.0
Calc Weights                    157298.545908     5662747.653     0.1
Spread Q Bspline               3355702.312704     6711404.625     0.1
Gather F Bspline               3355702.312704    20134213.876     0.2
3D-FFT                         3799854.699444    30398837.596     0.3
Solve PME                          741.121600       47431.782     0.0
Reset In Box                       524.318000        1572.954     0.0
CG-CoM                             525.366636        1576.100     0.0
Bonds                              165.903318        9788.296     0.0
Propers                            844.166883      193314.216     0.0
Impropers                           56.151123       11679.434     0.0
Virial                            5245.578906       94420.420     0.0
Stop-CM                            525.366636        5253.666     0.0
Calc-Ekin                        10488.457272      283188.346     0.0
Lincs                              163.153263        9789.196     0.0
Lincs-Mat                          870.017400        3480.070     0.0
Constraint-V                     52331.446608      470983.019     0.0
Constraint-Vir                    5217.768345      125226.440     0.0
Settle                           17335.046694     6413967.277     0.1
CMAP                                19.950399       33915.678     0.0
Urey-Bradley                       596.411928      109143.383     0.0
-----------------------------------------------------------------------------
Total                                          9145098319.606   100.0
-----------------------------------------------------------------------------
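Flop counts are not wall time, but comparing the two is a quick sanity check. The sketch below sums the PME-related rows of the flops table; the grouping into PME_ROWS and NONBONDED_ROWS is my own choice, not a GROMACS category. PME accounts for well under 1% of the counted flops, while the timing table further down attributes 65.9% of the wall time to PME mesh. That gap appears consistent with the mesh part running on the CPU cores while the nonbonded kernels run on the GPU.

```python
# M-Flops column copied from the MEGA-FLOPS ACCOUNTING table above.
mflops = {
    "NxN QSTab Elec. + LJ [F]": 8934882110.187,
    "NxN QSTab Elec. + LJ [V&F]": 138211362.036,
    "Calc Weights": 5662747.653,
    "Spread Q Bspline": 6711404.625,
    "Gather F Bspline": 20134213.876,
    "3D-FFT": 30398837.596,
    "Solve PME": 47431.782,
}
TOTAL_MFLOPS = 9145098319.606

# Hypothetical grouping of rows (my own choice).
PME_ROWS = ("Calc Weights", "Spread Q Bspline", "Gather F Bspline", "3D-FFT", "Solve PME")
NONBONDED_ROWS = ("NxN QSTab Elec. + LJ [F]", "NxN QSTab Elec. + LJ [V&F]")

# "PME mesh" wall time divided by total wall time, from the timing table.
PME_MESH_WALL_FRACTION = 749.587 / 1137.191


def share(rows):
    """Fraction of the total counted M-Flops contributed by the given rows."""
    return sum(mflops[row] for row in rows) / TOTAL_MFLOPS


pme_share = share(PME_ROWS)
nonbonded_share = share(NONBONDED_ROWS)
print(f"PME-related rows:   {100 * pme_share:.2f}% of flops")           # 0.69%
print(f"NxN nonbonded:      {100 * nonbonded_share:.2f}% of flops")     # 99.21%
print(f"PME mesh wall time: {100 * PME_MESH_WALL_FRACTION:.1f}%")       # 65.9%
```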
D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
av. #atoms communicated per step for force: 2 x 533583.3
Dynamic load balancing report:
DLB was off during the run due to low measured imbalance.
Average load imbalance: 3.1%.
The balanceable part of the MD step is 64%, load imbalance is computed from this.
Part of the total run time spent waiting due to load imbalance: 2.0%.
R E A L C Y C L E A N D T I M E A C C O U N T I N G
On 6 MPI ranks, each using 6 OpenMP threads
Computing:             Num     Num     Call  Wall time  Giga-Cycles
                     Ranks Threads    Count        (s)    total sum      %
-----------------------------------------------------------------------------
Domain decomp.           6       6      500     19.629     2114.758    1.7
DD comm. load            6       6      478      0.892       96.145    0.1
Neighbor search          6       6      501     24.364     2624.960    2.1
Launch GPU ops.          6       6   100002     13.916     1499.286    1.2
Comm. coord.             6       6    49500     70.864     7634.768    6.2
Force                    6       6    50001     16.685     1797.662    1.5
Wait + Comm. F           6       6    50001     51.074     5502.587    4.5
PME mesh                 6       6    50001    749.587    80759.333   65.9
Wait Bonded GPU          6       6      501      0.002        0.229    0.0
Wait GPU NB nonloc.      6       6    50001     13.022     1402.967    1.1
Wait GPU NB local        6       6    50001     23.720     2555.509    2.1
NB X/F buffer ops.       6       6   199002     55.330     5961.213    4.9
Write traj.              6       6        3      0.559       60.264    0.0
Update                   6       6    50001     37.599     4050.878    3.3
Constraints              6       6    50001     27.766     2991.444    2.4
Comm. energies           6       6     5001      9.232      994.592    0.8
Rest                                            22.950     2472.617    2.0
-----------------------------------------------------------------------------
Total                                         1137.191   122519.210  100.0
-----------------------------------------------------------------------------
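Ranking the wall-time column makes the bottleneck explicit. This sketch only re-keys the logged numbers (the dictionary name is mine) and checks that the rows add up to the reported total:

```python
# Wall time (s) per row of the REAL CYCLE AND TIME ACCOUNTING table above.
wall_s = {
    "Domain decomp.": 19.629,
    "DD comm. load": 0.892,
    "Neighbor search": 24.364,
    "Launch GPU ops.": 13.916,
    "Comm. coord.": 70.864,
    "Force": 16.685,
    "Wait + Comm. F": 51.074,
    "PME mesh": 749.587,
    "Wait Bonded GPU": 0.002,
    "Wait GPU NB nonloc.": 13.022,
    "Wait GPU NB local": 23.720,
    "NB X/F buffer ops.": 55.330,
    "Write traj.": 0.559,
    "Update": 37.599,
    "Constraints": 27.766,
    "Comm. energies": 9.232,
    "Rest": 22.950,
}
TOTAL_WALL_S = 1137.191

# The rows should add up to the logged total (up to rounding in the log).
assert abs(sum(wall_s.values()) - TOTAL_WALL_S) < 0.01

# Print the three most expensive rows with their share of the total.
ranked = sorted(wall_s.items(), key=lambda kv: kv[1], reverse=True)
for name, seconds in ranked[:3]:
    print(f"{name:<20} {seconds:8.3f} s  {100 * seconds / TOTAL_WALL_S:4.1f}%")
```

PME mesh alone takes about 66% of the wall time, roughly ten times the next entry (Comm. coord., about 6%).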
Breakdown of PME mesh computation
-----------------------------------------------------------------------------
PME redist. X/F          6       6   100002    134.499    14490.691   11.8
PME spread               6       6    50001    332.452    35817.851   29.2
PME gather               6       6    50001    149.632    16121.109   13.2
PME 3D-FFT               6       6   100002     88.660     9552.052    7.8
PME 3D-FFT Comm.         6       6   100002     32.897     3544.313    2.9
PME solve Elec           6       6    50001     11.014     1186.582    1.0
-----------------------------------------------------------------------------
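Within PME mesh, spread and gather dominate. In smooth PME these steps interpolate each atom's charge onto (and forces back from) a small block of grid points whose size is set by the interpolation order, so their cost grows mostly with the number of atoms rather than with the grid size; that would explain why shrinking the grid during load balancing could only help so much. A sketch of the arithmetic on the logged values (names are mine):

```python
# Wall times (s) from "Breakdown of PME mesh computation" in the log.
pme_parts_s = {
    "PME redist. X/F": 134.499,
    "PME spread": 332.452,
    "PME gather": 149.632,
    "PME 3D-FFT": 88.660,
    "PME 3D-FFT Comm.": 32.897,
    "PME solve Elec": 11.014,
}
PME_MESH_S = 749.587  # "PME mesh" row of the main timing table

# Share of the PME mesh time taken by each sub-step, largest first.
for name, seconds in sorted(pme_parts_s.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<18}{100 * seconds / PME_MESH_S:5.1f}% of PME mesh")

# Particle-grid interpolation versus the FFT part.
grid_side = pme_parts_s["PME spread"] + pme_parts_s["PME gather"]
fft_side = pme_parts_s["PME 3D-FFT"] + pme_parts_s["PME 3D-FFT Comm."]
print(f"spread+gather: {100 * grid_side / PME_MESH_S:.1f}%")  # 64.3%
print(f"FFT+comm:      {100 * fft_side / PME_MESH_S:.1f}%")   # 16.2%
```

The parts sum to 749.154 s, slightly less than the 749.587 s PME mesh total, so a small remainder is not broken down in the log.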
               Core t (s)   Wall t (s)          (%)
Time:           40938.880     1137.191       3600.0
                 (ns/day)    (hour/ns)
Performance:        7.598        3.159
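The summary lines are internally consistent, which helps when comparing runs. A sketch of the arithmetic; the step count is an assumption inferred from the call count of 50001 in the timing table (steps 0 to 50000), and the variable names are mine:

```python
SECONDS_PER_DAY = 86400.0

# Numbers copied from the end of the log.
core_time_s = 40938.880
wall_time_s = 1137.191
ns_per_day = 7.598
force_calls = 50001  # "Call Count" of the Force / PME mesh rows

hour_per_ns = 24.0 / ns_per_day
cores_busy = core_time_s / wall_time_s  # expected: 6 MPI ranks x 6 OpenMP threads
simulated_ns = ns_per_day * wall_time_s / SECONDS_PER_DAY

# Assumption: 50001 force calls means steps 0..50000, i.e. nsteps = 50000.
nsteps = force_calls - 1
dt_fs = simulated_ns * 1e6 / nsteps  # 1 ns = 1e6 fs

print(f"hour/ns    = {hour_per_ns:.3f}")       # 3.159
print(f"cores busy = {cores_busy:.1f}")        # 36.0
print(f"simulated  = {simulated_ns:.4f} ns")   # 0.1000
print(f"implied dt = {dt_fs:.2f} fs")          # 2.00
```

So the 1137 s run covers about 0.1 ns, and all 36 threads (6 ranks x 6 OpenMP threads) were busy for the whole run.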
Finished mdrun on rank 0 Wed Sep 28 19:30:45 2022