GROMACS version: 2016-dev-20220119-e35ae4e-unknown
GROMACS modification: No
Useful Links:
My SWM4-NDP water simulation (1024 water molecules in a box of approx. 31 angstroms) does not scale with an increasing number of OpenMP threads, whereas the performance reported in Figure 1 of the paper (LINK 2) increases almost linearly with the number of OpenMP threads. The command that I am using to run the simulation is:
gmx mdrun -s nvt.tpr -deffnm nvt -ntomp NO_OF_THREADS
where NO_OF_THREADS was varied from 1 to 16.
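A thread-count sweep like this can be scripted so that each run writes to its own output prefix. The following is a minimal sketch, assuming `gmx` is on the PATH and `nvt.tpr` is in the working directory. The helper names `mdrun_command` and `run_sweep` and the `nvt_ntompN` output prefix are hypothetical; `-s`, `-deffnm`, and `-ntomp` are the flags already used in the command above.

```python
import shlex
import subprocess

def mdrun_command(ntomp, tpr="nvt.tpr"):
    """Build the mdrun command line for a given OpenMP thread count."""
    # A separate -deffnm prefix per run keeps the outputs from overwriting each other.
    return ["gmx", "mdrun", "-s", tpr, "-deffnm", f"nvt_ntomp{ntomp}", "-ntomp", str(ntomp)]

def run_sweep(thread_counts=(1, 4, 8, 16), dry_run=True):
    """Print (and optionally execute) one mdrun command per thread count."""
    commands = [mdrun_command(n) for n in thread_counts]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands

if __name__ == "__main__":
    run_sweep()  # dry run: only prints the commands
```

Calling `run_sweep(dry_run=False)` would actually launch the runs one after another.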
The following is the performance for a 10 ps NVT run using the extended Lagrangian method:
ntomp ns/day
1 0.586
4 0.503
8 0.486
16 0.458
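To put the table in terms of scaling, the speedup and parallel efficiency relative to the single-thread run can be computed directly from the ns/day values. This is a small sketch using only the four numbers above; the function name `scaling` is illustrative.

```python
# ns/day values copied from the table above
perf = {1: 0.586, 4: 0.503, 8: 0.486, 16: 0.458}

def scaling(perf):
    """Return {threads: (speedup, efficiency)} relative to the lowest thread count."""
    base_n = min(perf)
    base = perf[base_n]
    result = {}
    for n, ns_day in sorted(perf.items()):
        speedup = ns_day / base
        efficiency = speedup / (n / base_n)
        result[n] = (speedup, efficiency)
    return result

for n, (s, e) in scaling(perf).items():
    print(f"{n:>2} threads: speedup {s:.2f}, efficiency {e:.1%}")
```

With these numbers every speedup beyond one thread is below 1, i.e. adding threads makes the run slower rather than faster.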
I have two questions:
- Can anyone help me with how to speed up the simulation? Some information from the output file (using 16 OpenMP threads) is given below, which might be useful.
- While running the NVT simulation, a lot of data is printed to the terminal that I do not understand. Can anyone tell me why so much data is printed, what information it provides, and how I can stop it from being printed? The data is as follows:
Start: Data printed on the terminal during NVT simulation
DO FORCE: after move_f f[5115] = 32.558803 -96.048094 -35.766874
DO FORCE: after move_f f[5120] = 109.191799 129.462895 -1.605907
.
.
.
DO FORCE: after GPU use/emulate f[485] = 169.072343 125.064651 69.923393
DO FORCE: after GPU use/emulate f[490] = -62.061709 -70.705441 20.019443
.
.
.
DRUDE TFP: n = 4 final atom v[171]: 0.777366 0.136756 0.165941 drude v[175]: 0.736939 0.122672 0.083183
DRUDE TFP: n = 4 init atom v[176]: 0.475490 -0.146997 -0.235264 drude v[180] (ib = 179): 0.405482 -0.008488 -0.573738
.
.
.
VV VEL: v[3315(3315)] b4 update: -0.023910 -0.444889 0.339263
VV VEL: v[1379(1379)] after update: 0.000000 0.000000 0.000000
VV VEL: f[2673(2673)] b4 update: -459.965751 -851.022226 -705.858880
.
.
.
VV POS: x[4800(4800)] b4 update: 1.896389 0.080550 0.767094
VV POS: v[4800(4800)] b4 update: 0.010975 0.089460 0.522833
VV POS: x[4800(4800)] after update: 1.896400 0.080640 0.767617
End: Data printed on terminal during NVT simulation
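The prefixes in this output (`DO FORCE:`, `DRUDE TFP:`, `VV VEL:`, `VV POS:`) look like per-atom debugging prints. Until their origin is clarified, one workaround is to filter them out of the captured terminal output. The following is a minimal sketch, assuming the terminal output has been redirected to a text file and read in line by line. The function name and the prefix list are illustrative and taken only from the lines shown above. Note that this only hides the lines afterwards; it does not stop mdrun from printing them.

```python
DEBUG_PREFIXES = ("DO FORCE:", "DRUDE TFP:", "VV VEL:", "VV POS:")

def split_debug_lines(lines, prefixes=DEBUG_PREFIXES):
    """Separate lines starting with a known debug prefix from all other lines."""
    debug, other = [], []
    for line in lines:
        (debug if line.lstrip().startswith(prefixes) else other).append(line)
    return debug, other

sample = [
    "DO FORCE: after move_f f[5115] = 32.558803 -96.048094 -35.766874",
    "Using 16 OpenMP threads",
    "VV POS: x[4800(4800)] b4 update: 1.896389 0.080550 0.767094",
]
debug, other = split_debug_lines(sample)
print(f"{len(debug)} debug lines, {len(other)} other lines")
```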
Start: Information from the Output .log file
Using 1 MPI process
Using 16 OpenMP threads
NOTE: You requested 16 OpenMP threads, whereas we expect the optimum to be with more MPI ranks with 1 to 6 OpenMP threads.
Will do PME sum in reciprocal space for electrostatic interactions.
M E G A - F L O P S A C C O U N T I N G
NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
W3=SPC/TIP3p W4=TIP4p (single or pairs)
V&F=Potential and force V=Potential only F=Force only
Computing: M-Number M-Flops % Flops
Pair Search distance check 2570.456388 23134.107 0.9
NxN QSTab Elec. + LJ [F] 24407.034560 1659678.350 62.3
NxN QSTab Elec. + LJ [V&F] 500.565696 39544.690 1.5
NxN QSTab Elec. [F] 24406.435680 829818.813 31.1
NxN QSTab Elec. [V&F] 500.565696 20523.194 0.8
Calc Weights 153.615360 5530.153 0.2
Spread Q Bspline 3277.127680 6554.255 0.2
Gather F Bspline 3277.127680 19662.766 0.7
3D-FFT 6332.493186 50659.945 1.9
Solve PME 7.840784 501.810 0.0
Shift-X 5.125120 30.751 0.0
Bonds 10.241024 604.220 0.0
Virial 1.038165 18.687 0.0
Stop-CM 0.522240 5.222 0.0
Calc-Ekin 51.205120 1382.538 0.1
Constraint-V 30.723072 245.785 0.0
Constraint-Vir 0.617472 14.819 0.0
Settle 20.484096 6616.363 0.2
Virtual Site 3 10.446848 386.533 0.0
Total 2664913.003 100.0
R E A L C Y C L E A N D T I M E A C C O U N T I N G
On 1 MPI rank, each using 16 OpenMP threads
Computing: Num Num Call Wall time Giga-Cycles
Ranks Threads Count (s) total sum %
Vsite constr. 1 16 10001 0.326 11.743 0.0
Neighbor search 1 16 1001 2.309 83.131 0.1
Force 1 16 10001 96.011 3456.394 5.1
PME mesh 1 16 10001 4.332 155.958 0.2
NB X/F buffer ops. 1 16 19001 1.198 43.118 0.1
Vsite spread 1 16 10202 0.510 18.367 0.0
Write traj. 1 16 13 1.503 54.110 0.1
Update 1 16 40004 1628.047 58609.655 86.4
Constraints 1 16 20002 1.549 55.754 0.1
Rest 149.617 5386.224 7.9
Total 1885.402 67874.453 100.0
Breakdown of PME mesh computation
PME spread/gather 1 16 20002 2.955 106.380 0.2
PME 3D-FFT 1 16 20002 1.244 44.780 0.1
PME solve Elec 1 16 10001 0.086 3.107 0.0
End: Information from the Output .log file
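To see where the wall time goes, the rows of the cycle and time accounting table can be ranked by their share of the total. This is a small sketch using only the wall times copied from the log above; the variable names are illustrative.

```python
# Wall time in seconds per activity, copied from the cycle and time accounting table
wall_time = {
    "Vsite constr.": 0.326,
    "Neighbor search": 2.309,
    "Force": 96.011,
    "PME mesh": 4.332,
    "NB X/F buffer ops.": 1.198,
    "Vsite spread": 0.510,
    "Write traj.": 1.503,
    "Update": 1628.047,
    "Constraints": 1.549,
    "Rest": 149.617,
}
total = sum(wall_time.values())

# Rank activities from most to least expensive and show the top three
ranked = sorted(wall_time.items(), key=lambda kv: kv[1], reverse=True)
for name, t in ranked[:3]:
    print(f"{name:<20} {t:>9.3f} s  {t / total:6.1%}")
```

The Update step accounts for about 86% of the wall time, while the Force step, whose nonbonded kernels dominate the flop count, takes about 5%.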
Start: GROMACS and simulation details are:
GROMACS version: the Drude version, downloaded from LINK 1 (see above)
.mdp file: the same as the one provided in the supporting information of the paper, LINK 2 (see above)
Coordinate file (.gro file): the pre-equilibrated box of SWM4-NDP water downloaded from LINK 1 (see above)
Below is the SWM4-NDP topology file that I am using:
;
; Polarizable water: SWM4-NDP model
;
; G. Lamoureux, E. Harder, I. V. Vorobyov, B. Roux, and A. D. MacKerell, Jr. (2006)
; A polarizable model of water for molecular dynamics simulations of biomolecules.
; Chem. Phys. Lett. 418: 245-249.
;
[ defaults ]
; nbfunc comb-rule gen-pairs fudgeLJ fudgeQQ
1 2 no 0.5000 0.5000
[ atomtypes ]
;type atnum mass charge ptype sigma epsilon
ODW 8 15.599400 0.000 A 0.318394549320 0.88259
HDW 1 1.008000 0.000 A 0.000000000000 0.00000
LPDW 1 0.000000 0.000 V 0.000000000000 0.00000
DOH2 1 0.400000 0.000 S 0.000000000000 0.00000
[ moleculetype ]
; molname nrexcl
SOL 2
[ atoms ]
; id type resnr resname at name cg nr charge mass
1 ODW 1 SOL OH2 1 1.71636 15.5994
2 HDW 1 SOL H1 1 0.55733 1.0080
3 HDW 1 SOL H2 1 0.55733 1.0080
4 LPDW 1 SOL OM 1 -1.11466 0.0000
5 DOH2 1 SOL DOH2 1 -1.71636 0.4000
[ bonds ]
;; i j funct
1 5 1 0.00000000 418400.00
[ virtual_sites3 ]
; site from func a b
4 1 2 3 1 0.205109464 0.205109464
[ settles ]
1 1 0.09572 0.15139
[ exclusions ]
1 2 3 4 5
2 1 3 4 5
3 1 2 4 5
4 1 2 3 5
[ system ]
SOL
[ molecules ]
; Compound nmols
SOL 1024
End: GROMACS and simulation details
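As a quick sanity check on the topology, the net charge and mass per molecule and the total particle count can be computed from the `[ atoms ]` and `[ molecules ]` sections. This is a minimal sketch using only the values listed above; the variable names are illustrative.

```python
# (name, charge in e, mass in u) copied from the [ atoms ] section
atoms = [
    ("OH2", 1.71636, 15.5994),
    ("H1", 0.55733, 1.0080),
    ("H2", 0.55733, 1.0080),
    ("OM", -1.11466, 0.0000),
    ("DOH2", -1.71636, 0.4000),
]
n_molecules = 1024  # from the [ molecules ] section

net_charge = sum(q for _, q, _ in atoms)
molecule_mass = sum(m for _, _, m in atoms)
n_particles = len(atoms) * n_molecules

print(f"net charge per molecule: {abs(net_charge):.5f} e")
print(f"mass per molecule: {molecule_mass:.4f} u")
print(f"particles in the system: {n_particles}")
```

The molecule is neutral, and the particle count of 5120 is consistent with the index 5120 that appears in the terminal output.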