Attached below is the mdrun log showing the single-node performance:
:-) GROMACS - gmx mdrun, 2020.4 (-:
GROMACS is written by:
Emile Apol Rossen Apostolov Paul Bauer Herman J.C. Berendsen
Par Bjelkmar Christian Blau Viacheslav Bolnykh Kevin Boyd
Aldert van Buuren Rudi van Drunen Anton Feenstra Alan Gray
Gerrit Groenhof Anca Hamuraru Vincent Hindriksen M. Eric Irrgang
Aleksei Iupinov Christoph Junghans Joe Jordan Dimitrios Karkoulis
Peter Kasson Jiri Kraus Carsten Kutzner Per Larsson
Justin A. Lemkul Viveca Lindahl Magnus Lundborg Erik Marklund
Pascal Merz Pieter Meulenhoff Teemu Murtola Szilard Pall
Sander Pronk Roland Schulz Michael Shirts Alexey Shvetsov
Alfons Sijbers Peter Tieleman Jon Vincent Teemu Virolainen
Christian Wennberg Maarten Wolf Artem Zhmurov
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2019, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.
GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.
GROMACS: gmx mdrun, version 2020.4
Executable: /work/opt/local/apps/intel/2019.5.281/impi/2019.5.281/gromacs/2020.4/bin/gmx_mpi
Data prefix: /work/opt/local/apps/intel/2019.5.281/impi/2019.5.281/gromacs/2020.4
Working dir: /work/2/hp210295/u18000/test
Process ID: 39745
Command line:
gmx_mpi mdrun -deffnm 6
GROMACS version: 2020.4
Verified release checksum is 79c2857291b034542c26e90512b92fd4b184a1c9d6fa59c55f2e24ccf14e7281
Precision: single
Memory model: 64 bit
MPI library: MPI
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: disabled
SIMD instructions: AVX_512_KNL
FFT library: fftw-3.3.8-avx-avx2-avx2_128-avx512
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: hwloc-1.11.5rc2
Tracing support: disabled
C compiler: /opt/intel/impi/2019.5.281/intel64/bin/mpiicc Intel 19.0.5.20190815
C compiler flags: -xMIC-AVX512 -std=gnu99 -ip -funroll-all-loops -alias-const -ansi-alias -no-prec-div -fimf-domain-exclusion=14 -qoverride-limits -O3 -DNDEBUG
C++ compiler: /opt/intel/impi/2019.5.281/intel64/bin/mpiicpc Intel 19.0.5.20190815
C++ compiler flags: -xMIC-AVX512 -ip -funroll-all-loops -alias-const -ansi-alias -no-prec-div -fimf-domain-exclusion=14 -qoverride-limits -qopenmp -O3 -DNDEBUG
Running on 1 node with total 68 cores, 272 logical cores
Hardware detected on host c0253.ofp (the node of MPI rank 0):
CPU info:
Vendor: Intel
Brand: Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz
Family: 6 Model: 87 Stepping: 1
Features: aes apic avx avx2 avx512f avx512pf avx512er avx512cd clfsh cmov cx8 cx16 f16c fma htt intel lahf mmx msr nonstop_tsc pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
Number of AVX-512 FMA units: Cannot run AVX-512 detection - assuming 2
Hardware topology: Basic
Sockets, cores, and logical processors:
Socket 0: [ 0 68 136 204] [ 1 69 137 205] [ 2 70 138 206] [ 3 71 139 207] [ 4 72 140 208] [ 5 73 141 209] [ 6 74 142 210] [ 7 75 143 211] [ 8 76 144 212] [ 9 77 145 213] [ 10 78 146 214] [ 11 79 147 215] [ 12 80 148 216] [ 13 81 149 217] [ 14 82 150 218] [ 15 83 151 219] [ 16 84 152 220] [ 17 85 153 221] [ 18 86 154 222] [ 19 87 155 223] [ 20 88 156 224] [ 21 89 157 225] [ 22 90 158 226] [ 23 91 159 227] [ 24 92 160 228] [ 25 93 161 229] [ 26 94 162 230] [ 27 95 163 231] [ 28 96 164 232] [ 29 97 165 233] [ 30 98 166 234] [ 31 99 167 235] [ 32 100 168 236] [ 33 101 169 237] [ 34 102 170 238] [ 35 103 171 239] [ 36 104 172 240] [ 37 105 173 241] [ 38 106 174 242] [ 39 107 175 243] [ 40 108 176 244] [ 41 109 177 245] [ 42 110 178 246] [ 43 111 179 247] [ 44 112 180 248] [ 45 113 181 249] [ 46 114 182 250] [ 47 115 183 251] [ 48 116 184 252] [ 49 117 185 253] [ 50 118 186 254] [ 51 119 187 255] [ 52 120 188 256] [ 53 121 189 257] [ 54 122 190 258] [ 55 123 191 259] [ 56 124 192 260] [ 57 125 193 261] [ 58 126 194 262] [ 59 127 195 263] [ 60 128 196 264] [ 61 129 197 265] [ 62 130 198 266] [ 63 131 199 267] [ 64 132 200 268] [ 65 133 201 269] [ 66 134 202 270] [ 67 135 203 271]
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------
++++ PLEASE CITE THE DOI FOR THIS VERSION OF GROMACS ++++
https://doi.org/10.5281/zenodo.4054979
-------- -------- --- Thank You --- -------- --------
Input Parameters:
integrator = md
tinit = 0
dt = 0.002
nsteps = 50000000
init-step = 0
simulation-part = 1
comm-mode = Linear
nstcomm = 100
bd-fric = 0
ld-seed = -1642179826
emtol = 10
emstep = 0.01
niter = 20
fcstep = 0
nstcgsteep = 1000
nbfgscorr = 10
rtpi = 0.05
nstxout = 0
nstvout = 0
nstfout = 0
nstlog = 5000
nstcalcenergy = 100
nstenergy = 5000
nstxout-compressed = 5000
compressed-x-precision = 1000
cutoff-scheme = Verlet
nstlist = 20
pbc = xyz
periodic-molecules = false
verlet-buffer-tolerance = 0.005
rlist = 1.222
coulombtype = PME
coulomb-modifier = Potential-shift
rcoulomb-switch = 0
rcoulomb = 1.2
epsilon-r = 1
epsilon-rf = inf
vdw-type = Cut-off
vdw-modifier = Force-switch
rvdw-switch = 1
rvdw = 1.2
DispCorr = No
table-extension = 1
fourierspacing = 0.12
fourier-nx = 72
fourier-ny = 72
fourier-nz = 72
pme-order = 4
ewald-rtol = 1e-05
ewald-rtol-lj = 0.001
lj-pme-comb-rule = Geometric
ewald-geometry = 0
epsilon-surface = 0
tcoupl = Nose-Hoover
nsttcouple = 20
nh-chain-length = 1
print-nose-hoover-chain-variables = false
pcoupl = Parrinello-Rahman
pcoupltype = Isotropic
nstpcouple = 20
tau-p = 5
compressibility (3x3):
compressibility[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00}
compressibility[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00}
compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 4.50000e-05}
ref-p (3x3):
ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
refcoord-scaling = COM
posres-com (3):
posres-com[0]= 0.00000e+00
posres-com[1]= 0.00000e+00
posres-com[2]= 0.00000e+00
posres-comB (3):
posres-comB[0]= 0.00000e+00
posres-comB[1]= 0.00000e+00
posres-comB[2]= 0.00000e+00
QMMM = false
QMconstraints = 0
QMMMscheme = 0
MMChargeScaleFactor = 1
qm-opts:
ngQM = 0
constraint-algorithm = Lincs
continuation = true
Shake-SOR = false
shake-tol = 0.0001
lincs-order = 4
lincs-iter = 1
lincs-warnangle = 30
nwall = 0
wall-type = 9-3
wall-r-linpot = -1
wall-atomtype[0] = -1
wall-atomtype[1] = -1
wall-density[0] = 0
wall-density[1] = 0
wall-ewald-zfac = 3
pull = false
awh = false
rotation = false
interactiveMD = false
disre = No
disre-weighting = Conservative
disre-mixed = false
dr-fc = 1000
dr-tau = 0
nstdisreout = 100
orire-fc = 0
orire-tau = 0
nstorireout = 100
free-energy = no
cos-acceleration = 0
deform (3x3):
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
simulated-tempering = false
swapcoords = no
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
applied-forces:
electric-field:
x:
E0 = 0
omega = 0
t0 = 0
sigma = 0
y:
E0 = 0
omega = 0
t0 = 0
sigma = 0
z:
E0 = 0
omega = 0
t0 = 0
sigma = 0
density-guided-simulation:
active = false
group = protein
similarity-measure = inner-product
atom-spreading-weight = unity
force-constant = 1e+09
gaussian-transform-spreading-width = 0.2
gaussian-transform-spreading-range-in-multiples-of-width = 4
reference-density-filename = reference.mrc
nst = 1
normalize-densities = true
adaptive-force-scaling = false
adaptive-force-scaling-time-constant = 4
grpopts:
nrdf: 114689
ref-t: 310
tau-t: 1
annealing: No
annealing-npoints: 0
acc: 0 0 0
nfreeze: N N N
energygrp-flags[ 0]: 0
Changing nstlist from 20 to 100, rlist from 1.222 to 1.342
Initializing Domain Decomposition on 64 ranks
Dynamic load balancing: auto
Using update groups, nr 19464, average size 2.9 atoms, max. radius 0.139 nm
Minimum cell size due to atom displacement: 0.666 nm
Initial maximum distances in bonded interactions:
two-body bonded interactions: 0.470 nm, LJ-14, atoms 3436 3931
multi-body bonded interactions: 0.499 nm, CMAP Dih., atoms 654 666
Minimum cell size due to bonded interactions: 0.548 nm
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Guess for relative PME load: 0.19
Will use 48 particle-particle and 16 PME only ranks
This is a guess, check the performance at the end of the log file
Using 16 separate PME ranks, as guessed by mdrun
Optimizing the DD grid for 48 cells with a minimum initial size of 0.832 nm
The maximum allowed number of cells is: X 10 Y 10 Z 10
Domain decomposition grid 4 x 4 x 3, separate PME ranks 16
PME domain decomposition: 4 x 4 x 1
Interleaving PP and PME ranks
This rank does only particle-particle work.
Domain decomposition rank 0, coordinates 0 0 0
The initial number of communication pulses is: X 1 Y 1 Z 1
The initial domain decomposition cell size is: X 2.10 nm Y 2.10 nm Z 2.80 nm
The maximum allowed distance for atom groups involved in interactions is:
non-bonded interactions 1.620 nm
(the following are initial values, they could change due to box deformation)
two-body bonded interactions (-rdd) 1.620 nm
multi-body bonded interactions (-rdd) 1.620 nm
When dynamic load balancing gets turned on, these settings will change to:
The maximum number of communication pulses is: X 1 Y 1 Z 1
The minimum size for domain decomposition cells is 1.620 nm
The requested allowed shrink of DD cells (option -dds) is: 0.80
The allowed shrink of domain decomposition cells is: X 0.77 Y 0.77 Z 0.58
The maximum allowed distance for atom groups involved in interactions is:
non-bonded interactions 1.620 nm
two-body bonded interactions (-rdd) 1.620 nm
multi-body bonded interactions (-rdd) 1.620 nm
Using 64 MPI processes
Non-default thread affinity set, disabling internal thread affinity
Using 4 OpenMP threads per MPI process
System total charge: 0.000
Will do PME sum in reciprocal space for electrostatic interactions.
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------
Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
Potential shift: LJ r^-12: -2.648e-01 r^-6: -5.349e-01, Ewald -8.333e-06
Initialized non-bonded Ewald tables, spacing: 1.02e-03 size: 1176
Generated table with 1171 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1171 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1171 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Using SIMD 4x8 nonbonded short-range kernels
Using a dual 4x8 pair-list setup updated with dynamic pruning:
outer list: updated every 100 steps, buffer 0.142 nm, rlist 1.342 nm
inner list: updated every 13 steps, buffer 0.001 nm, rlist 1.201 nm
At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list would be:
outer list: updated every 100 steps, buffer 0.296 nm, rlist 1.496 nm
inner list: updated every 13 steps, buffer 0.052 nm, rlist 1.252 nm
Initializing LINear Constraint Solver
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
LINCS: A Linear Constraint Solver for molecular simulations
J. Comp. Chem. 18 (1997) pp. 1463-1472
-------- -------- --- Thank You --- -------- --------
The number of constraints is 2047
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------
Linking all bonded interactions to atoms
Intra-simulation communication will occur every 20 steps.
There are: 56315 Atoms
Atom distribution over 48 domains: av 1173 stddev 46 min 1110 max 1323
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
0: System
Started mdrun on rank 0 Tue Jan 11 16:31:21 2022
Step Time
0 0.00000
Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
3.71683e+03 1.09179e+04 1.20019e+04 7.06692e+02 -6.36002e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
2.73344e+03 3.72157e+04 6.66096e+04 -8.51480e+05 3.12529e+03
Potential Kinetic En. Total Energy Conserved En. Temperature
-7.15089e+05 1.49243e+05 -5.65846e+05 -5.65810e+05 3.13017e+02
Pressure (bar) Constr. rmsd
-4.37170e+02 3.68922e-06
DD step 99 load imb.: force 22.5% pme mesh/force 3.184
step 600: timed with pme grid 72 72 72, coulomb cutoff 1.200: 503.2 M-cycles
step 800: timed with pme grid 60 60 60, coulomb cutoff 1.400: 524.1 M-cycles
step 1000: timed with pme grid 52 52 52, coulomb cutoff 1.615: 684.2 M-cycles
step 1200: timed with pme grid 56 56 56, coulomb cutoff 1.500: 606.1 M-cycles
step 1400: timed with pme grid 60 60 60, coulomb cutoff 1.400: 528.0 M-cycles
step 1600: timed with pme grid 64 64 64, coulomb cutoff 1.313: 460.5 M-cycles
step 1800: timed with pme grid 72 72 72, coulomb cutoff 1.200: 486.3 M-cycles
step 2000: timed with pme grid 64 64 64, coulomb cutoff 1.313: 469.8 M-cycles
step 2200: timed with pme grid 72 72 72, coulomb cutoff 1.200: 497.6 M-cycles
optimal pme grid 64 64 64, coulomb cutoff 1.313
DD step 4999 load imb.: force 8.8% pme mesh/force 1.014
Step Time
5000 10.00000
Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
3.59132e+03 1.05560e+04 1.15225e+04 6.60512e+02 -5.45052e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
2.62548e+03 3.72929e+04 6.86834e+04 -8.59544e+05 2.07033e+03
Potential Kinetic En. Total Energy Conserved En. Temperature
-7.23086e+05 1.48147e+05 -5.74940e+05 -5.65841e+05 3.10718e+02
Pressure (bar) Constr. rmsd
1.36763e+02 3.09292e-06
DD step 9999 load imb.: force 8.1% pme mesh/force 1.007
Step Time
10000 20.00000
Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
3.50481e+03 1.04389e+04 1.13793e+04 6.51941e+02 -5.48826e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
2.61028e+03 3.72691e+04 6.82935e+04 -8.63144e+05 2.01637e+03
Potential Kinetic En. Total Energy Conserved En. Temperature
-7.27528e+05 1.46082e+05 -5.81446e+05 -5.65470e+05 3.06388e+02
Pressure (bar) Constr. rmsd
1.16957e+01 2.98622e-06
Received the TERM signal, stopping within 100 steps
Step Time
14700 29.40000
Writing checkpoint, step 14700 at Tue Jan 11 16:32:41 2022
Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
3.64271e+03 1.05973e+04 1.12037e+04 6.14280e+02 -5.71026e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
2.55995e+03 3.71370e+04 6.87223e+04 -8.63461e+05 2.03921e+03
Potential Kinetic En. Total Energy Conserved En. Temperature
-7.27515e+05 1.48024e+05 -5.79491e+05 -5.65332e+05 3.10460e+02
Pressure (bar) Constr. rmsd
7.06370e+01 3.13095e-06
<====== ############### ==>
<==== A V E R A G E S ====>
<== ############### ======>
Statistics over 14701 steps using 148 frames
Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
3.65743e+03 1.04505e+04 1.14343e+04 6.48923e+02 -5.53127e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
2.59964e+03 3.73598e+04 6.84031e+04 -8.61622e+05 2.07372e+03
Potential Kinetic En. Total Energy Conserved En. Temperature
-7.25548e+05 1.47890e+05 -5.77658e+05 -5.65614e+05 3.10178e+02
Pressure (bar) Constr. rmsd
-1.91684e+00 0.00000e+00
Box-X Box-Y Box-Z
8.24066e+00 8.24066e+00 8.24066e+00
Total Virial (kJ/mol)
4.92424e+04 6.43378e+01 2.64065e+02
6.68609e+01 4.96511e+04 -1.88036e+02
2.68684e+02 -1.84937e+02 4.92011e+04
Pressure (bar)
4.31721e+00 -2.09554e+00 -1.44692e+01
-2.24871e+00 -1.79220e+01 1.13233e+01
-1.47432e+01 1.11393e+01 7.85428e+00
P P - P M E L O A D B A L A N C I N G
PP/PME load balancing changed the cut-off and PME settings:
particle-particle PME
rcoulomb rlist grid spacing 1/beta
initial 1.200 nm 1.201 nm 72 72 72 0.117 nm 0.384 nm
final 1.313 nm 1.314 nm 64 64 64 0.131 nm 0.420 nm
cost-ratio 1.31 0.70
(note that these numbers concern only part of the total PP and PME load)
M E G A - F L O P S A C C O U N T I N G
NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
W3=SPC/TIP3p W4=TIP4p (single or pairs)
V&F=Potential and force V=Potential only F=Force only
Computing: M-Number M-Flops % Flops
-----------------------------------------------------------------------------
Pair Search distance check 5668.050006 51012.450 0.1
NxN Ewald Elec. + LJ [F] 962883.301248 75104897.497 94.6
NxN Ewald Elec. + LJ [V&F] 9792.196000 1263193.284 1.6
NxN LJ [F] 7.628544 343.284 0.0
NxN LJ [V&F] 0.077056 5.009 0.0
NxN Ewald Elec. [F] 18005.907744 1098360.372 1.4
NxN Ewald Elec. [V&F] 183.107328 15381.016 0.0
1,4 nonbonded interactions 158.315069 14248.356 0.0
Calc Weights 2483.660445 89411.776 0.1
Spread Q Bspline 52984.756160 105969.512 0.1
Gather F Bspline 52984.756160 317908.537 0.4
3D-FFT 140814.384784 1126515.078 1.4
Solve PME 242.537984 15522.431 0.0
Reset In Box 8.278305 24.835 0.0
CG-CoM 8.334620 25.004 0.0
Bonds 30.827997 1818.852 0.0
Propers 152.934503 35022.001 0.0
Impropers 9.981979 2076.252 0.0
Virial 43.037600 774.677 0.0
Stop-CM 8.334620 83.346 0.0
Calc-Ekin 82.895680 2238.183 0.0
Lincs 30.092947 1805.577 0.0
Lincs-Mat 160.005684 640.023 0.0
Constraint-V 827.666300 6621.330 0.0
Constraint-Vir 39.930208 958.325 0.0
Settle 255.826802 82632.057 0.1
CMAP 3.939868 6697.776 0.0
Urey-Bradley 109.948779 20120.627 0.0
-----------------------------------------------------------------------------
Total 79364307.467 100.0
-----------------------------------------------------------------------------
D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
av. #atoms communicated per step for force: 2 x 151814.8
Dynamic load balancing report:
DLB was off during the run due to low measured imbalance.
Average load imbalance: 14.8%.
The balanceable part of the MD step is 22%, load imbalance is computed from this.
Part of the total run time spent waiting due to load imbalance: 3.3%.
Average PME mesh/force load: 1.669
Part of the total run time spent waiting due to PP/PME imbalance: 11.9 %
NOTE: 11.9 % performance was lost because the PME ranks
had more work to do than the PP ranks.
You might want to increase the number of PME ranks
or increase the cut-off and the grid spacing.
R E A L C Y C L E A N D T I M E A C C O U N T I N G
On 48 MPI ranks doing PP, each using 4 OpenMP threads, and
on 16 MPI ranks doing PME, each using 4 OpenMP threads
 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Domain decomp.        48    4        147       0.729        196.056   0.7
 DD comm. load         48    4          4       0.001          0.299   0.0
 Send X to PME         48    4      14701       0.288         77.533   0.3
 Neighbor search       48    4        148       1.197        321.746   1.1
 Comm. coord.          48    4      14553       5.178       1391.771   4.9
 Force                 48    4      14701      43.968      11818.152  41.3
 Wait + Comm. F        48    4      14701       7.199       1935.031   6.8
 PME mesh *            16    4      14701      56.152       5031.034  17.6
 PME wait for PP *                             20.387       1826.611   6.4
 Wait + Recv. PME F    48    4      14701      15.359       4128.482  14.4
 NB X/F buffer ops.    48    4      43807       2.704        726.750   2.5
 Write traj.           48    4          4       0.030          7.981   0.0
 Update                48    4      14701       0.804        216.067   0.8
 Constraints           48    4      14701       1.129        303.559   1.1
 Comm. energies        48    4        736       1.211        325.507   1.1
 Rest                                           0.021          5.701   0.0
-----------------------------------------------------------------------------
 Total                                         79.819      28606.180 100.0
-----------------------------------------------------------------------------
(*) Note that with separate PME ranks, the walltime column actually sums to
twice the total reported, but the cycle count total and % are correct.
-----------------------------------------------------------------------------
Breakdown of PME mesh computation
-----------------------------------------------------------------------------
PME redist. X/F 16 4 29402 16.948 1518.472 5.3
PME spread 16 4 14701 12.011 1076.144 3.8
PME gather 16 4 14701 8.699 779.402 2.7
PME 3D-FFT 16 4 29402 10.534 943.824 3.3
PME 3D-FFT Comm. 16 4 58804 6.875 615.991 2.2
PME solve Elec 16 4 14701 0.586 52.468 0.2
-----------------------------------------------------------------------------
               Core t (s)   Wall t (s)        (%)
       Time:    20431.170       79.819    25597.0
                 (ns/day)    (hour/ns)
Performance:       31.826        0.754
Finished mdrun on rank 0 Tue Jan 11 16:32:42 2022
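
As a sanity check on the numbers above, the reported throughput can be reproduced from values in the log alone: 14701 steps (from the averages section), dt = 0.002 ps, and 79.819 s of wall time. The sketch below is only illustrative arithmetic; the function name `ns_per_day` is made up for this post and is not part of GROMACS.

```python
# Illustrative check of the "Performance" line using values copied from the log.
# ns_per_day is a hypothetical helper, not a GROMACS API.

SECONDS_PER_DAY = 86400.0


def ns_per_day(n_steps: int, dt_ps: float, wall_s: float) -> float:
    """Simulated nanoseconds per day of wall-clock time."""
    simulated_ns = n_steps * dt_ps / 1000.0  # convert ps to ns
    return simulated_ns / wall_s * SECONDS_PER_DAY


if __name__ == "__main__":
    perf = ns_per_day(n_steps=14701, dt_ps=0.002, wall_s=79.819)
    hours_per_ns = 24.0 / perf
    print(f"{perf:.3f} ns/day, {hours_per_ns:.3f} hour/ns")
```

With the log's values this should agree with the 31.826 ns/day and 0.754 hour/ns that mdrun prints. Note that the run covers only about 80 s of wall time, which includes the PME tuning phase in the first 2200 steps, so a longer run may give a somewhat different figure.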