GROMACS version: 2020.5
GROMACS modification: No
Hi!
I’m trying to run a simulation containing ~450,000 TIP4P/ice water molecules on a GPU; more specifically, it’s an ice slab with half of the molecules frozen and a vacuum layer above the surface.
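For context, the freeze/thermostat setup, as far as I can reconstruct it from the grpopts section of the log below, looks roughly like this (a sketch rather than a copy of my actual .mdp; the group names "upper" and "lower" are placeholders for my two index groups):

tcoupl     = V-rescale
tc-grps    = upper lower    ; mobile top half of the slab, frozen bottom half (names are placeholders)
ref-t      = 269 0
tau-t      = 0.5 0.5
freezegrps = lower          ; frozen in all three dimensions, per the nfreeze line in the log
freezedim  = Y Y Y
comm-mode  = None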
To check the performance, I run this short benchmark:
gmx_mpi -quiet mdrun -nsteps 10000 -resethway -noconfout
The performance turns out not very satisfying, and when checking md.log I found that the “Rest” part takes ~20% of the total wall time (see the log below). If I run on the CPU only, “Rest” accounts for <1% of the time. Is this normal, and how can I speed up the simulation? nvidia-smi shows the GPU utilization fluctuating between roughly 0% and 50%, so I suspect there is headroom.
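For reference, judging from the task mapping reported in the log (1 MPI rank, 5 OpenMP threads, PP and PME offloaded to the GPU, coordinate update and constraints on the CPU), the run should be equivalent to spelling out the assignment explicitly; I did not actually pass these flags, so treat this as my reading of the log rather than the command I typed:

gmx_mpi -quiet mdrun -ntomp 5 -nb gpu -pme gpu -update cpu -nsteps 10000 -resethway -noconfout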
The simulation runs on an E5-2640 v4 CPU (with 5 cores available) and a Tesla V100 GPU (the cluster I’m using enforces a 5:1 CPU-core-to-GPU ratio). The full md.log is provided below.
Thanks in advance!
:-) GROMACS - gmx mdrun, 2020.5-dev-UNCHECKED (-:
GROMACS is written by:
Emile Apol Rossen Apostolov Paul Bauer Herman J.C. Berendsen
Par Bjelkmar Christian Blau Viacheslav Bolnykh Kevin Boyd
Aldert van Buuren Rudi van Drunen Anton Feenstra Alan Gray
Gerrit Groenhof Anca Hamuraru Vincent Hindriksen M. Eric Irrgang
Aleksei Iupinov Christoph Junghans Joe Jordan Dimitrios Karkoulis
Peter Kasson Jiri Kraus Carsten Kutzner Per Larsson
Justin A. Lemkul Viveca Lindahl Magnus Lundborg Erik Marklund
Pascal Merz Pieter Meulenhoff Teemu Murtola Szilard Pall
Sander Pronk Roland Schulz Michael Shirts Alexey Shvetsov
Alfons Sijbers Peter Tieleman Jon Vincent Teemu Virolainen
Christian Wennberg Maarten Wolf Artem Zhmurov
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2019, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.
GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.
GROMACS: gmx mdrun, version 2020.5-dev-UNCHECKED
Executable: ***********/software/gromacs/gromacs-2020/bin/gmx_mpi
Data prefix: ***********/software/gromacs/gromacs-2020
Working dir: *********** (I guess the path is irrelevant)
Process ID: 9114
Command line:
gmx_mpi -quiet mdrun -nsteps 10000 -resethway -noconfout
GROMACS version: 2020.5-dev-UNCHECKED
The source code this program was compiled from has not been verified because the reference checksum was missing during compilation. This means you have an incomplete GROMACS distribution, please make sure to download an intact source distribution and compile that before proceeding.
Computed checksum: NoChecksumFile
Precision: single
Memory model: 64 bit
MPI library: MPI
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: AVX2_256
FFT library: fftw-3.3.3-sse2
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: hwloc-1.11.8
Tracing support: disabled
C compiler: /software/intel/parallelstudio/2017u8/compilers_and_libraries_2017.8.262/linux/mpi/intel64/bin/mpicc GNU 6.3.1
C compiler flags: -mavx2 -mfma -Wall -Wno-unused -Wunused-value -Wunused-parameter -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wundef -fexcess-precision=fast -funroll-all-loops -Wno-array-bounds -O3 -DNDEBUG
C++ compiler: /software/intel/parallelstudio/2017u8/compilers_and_libraries_2017.8.262/linux/mpi/intel64/bin/mpicxx GNU 6.3.1
C++ compiler flags: -mavx2 -mfma -Wall -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wmissing-declarations -Wundef -fexcess-precision=fast -funroll-all-loops -Wno-array-bounds -fopenmp -O3 -DNDEBUG
CUDA compiler: /software/nvidia/cuda/10.0/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2018 NVIDIA Corporation;Built on Sat_Aug_25_21:08:01_CDT_2018;Cuda compilation tools, release 10.0, V10.0.130
CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_35,code=compute_35;-gencode;arch=compute_50,code=compute_50;-gencode;arch=compute_52,code=compute_52;-gencode;arch=compute_60,code=compute_60;-gencode;arch=compute_61,code=compute_61;-gencode;arch=compute_70,code=compute_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;; -O3 -DNDEBUG
CUDA driver: 11.40
CUDA runtime: 10.0
Note: 20 CPUs configured, but only 5 were detected to be online.
Running on 1 node with total 5 cores, 5 logical cores, 1 compatible GPU
Hardware detected on host g0018 (the node of MPI rank 0):
CPU info:
Vendor: Intel
Brand: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
Family: 6 Model: 79 Stepping: 1
Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
Hardware topology: Full, with devices
Sockets, cores, and logical processors:
Socket 0: [ 0] [ 1] [ 2] [ 3] [ 4]
Numa nodes:
Node 0 (68602982400 bytes mem): 0 1 2 3 4
Node 1 (68719476736 bytes mem):
Latency:
0 1
0 1.00 2.10
1 2.10 1.00
Caches:
L1: 32768 bytes, linesize 64 bytes, assoc. 8, shared 1 ways
L2: 262144 bytes, linesize 64 bytes, assoc. 8, shared 1 ways
L3: 26214400 bytes, linesize 64 bytes, assoc. 20, shared 5 ways
PCI devices:
0000:05:00.0 Id: 10de:1db1 Class: 0x0302 Numa: -1
0000:06:00.0 Id: 10de:1db1 Class: 0x0302 Numa: -1
0000:00:11.4 Id: 8086:8d62 Class: 0x0106 Numa: -1
0000:07:00.0 Id: 8086:1528 Class: 0x0200 Numa: -1
0000:07:00.1 Id: 8086:1528 Class: 0x0200 Numa: -1
0000:09:00.0 Id: 1a03:2000 Class: 0x0300 Numa: -1
0000:00:1f.2 Id: 8086:8d02 Class: 0x0106 Numa: -1
0000:83:00.0 Id: 8086:24f0 Class: 0x0208 Numa: -1
0000:84:00.0 Id: 10de:1db1 Class: 0x0302 Numa: -1
0000:85:00.0 Id: 10de:1db1 Class: 0x0302 Numa: -1
GPU info:
Number of GPUs detected: 1
#0: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------
++++ PLEASE CITE THE DOI FOR THIS VERSION OF GROMACS ++++
https://doi.org/10.5281/zenodo.4420785
-------- -------- --- Thank You --- -------- --------
Input Parameters:
integrator = md
tinit = 0
dt = 0.002
nsteps = 50000000
init-step = 0
simulation-part = 1
comm-mode = None
nstcomm = 0
bd-fric = 0
ld-seed = 1069350751
emtol = 10
emstep = 0.01
niter = 20
fcstep = 0
nstcgsteep = 1000
nbfgscorr = 10
rtpi = 0.05
nstxout = 50000
nstvout = 0
nstfout = 0
nstlog = 50000
nstcalcenergy = 10000
nstenergy = 50000
nstxout-compressed = 0
compressed-x-precision = 1000
cutoff-scheme = Verlet
nstlist = 40
pbc = xyz
periodic-molecules = false
verlet-buffer-tolerance = 0.005
rlist = 0.954
coulombtype = PME
coulomb-modifier = Potential-shift
rcoulomb-switch = 0
rcoulomb = 0.9
epsilon-r = 1
epsilon-rf = inf
vdw-type = Cut-off
vdw-modifier = Potential-shift
rvdw-switch = 0
rvdw = 0.9
DispCorr = EnerPres
table-extension = 1
fourierspacing = 0.12
fourier-nx = 600
fourier-ny = 600
fourier-nz = 84
pme-order = 4
ewald-rtol = 1e-05
ewald-rtol-lj = 0.001
lj-pme-comb-rule = Geometric
ewald-geometry = 0
epsilon-surface = 0
tcoupl = V-rescale
nsttcouple = 40
nh-chain-length = 0
print-nose-hoover-chain-variables = false
pcoupl = No
pcoupltype = Isotropic
nstpcouple = -1
tau-p = 1
compressibility (3x3):
compressibility[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
compressibility[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p (3x3):
ref-p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
refcoord-scaling = No
posres-com (3):
posres-com[0]= 0.00000e+00
posres-com[1]= 0.00000e+00
posres-com[2]= 0.00000e+00
posres-comB (3):
posres-comB[0]= 0.00000e+00
posres-comB[1]= 0.00000e+00
posres-comB[2]= 0.00000e+00
QMMM = false
QMconstraints = 0
QMMMscheme = 0
MMChargeScaleFactor = 1
qm-opts:
ngQM = 0
constraint-algorithm = Lincs
continuation = false
Shake-SOR = false
shake-tol = 0.0001
lincs-order = 4
lincs-iter = 1
lincs-warnangle = 30
nwall = 0
wall-type = 9-3
wall-r-linpot = -1
wall-atomtype[0] = -1
wall-atomtype[1] = -1
wall-density[0] = 0
wall-density[1] = 0
wall-ewald-zfac = 3
pull = false
ramd = false
awh = false
rotation = false
interactiveMD = false
disre = No
disre-weighting = Conservative
disre-mixed = false
dr-fc = 1000
dr-tau = 0
nstdisreout = 100
orire-fc = 0
orire-tau = 0
nstorireout = 100
free-energy = no
cos-acceleration = 0
deform (3x3):
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
simulated-tempering = false
swapcoords = no
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
applied-forces:
electric-field:
x:
E0 = 0
omega = 0
t0 = 0
sigma = 0
y:
E0 = 0
omega = 0
t0 = 0
sigma = 0
z:
E0 = 0
omega = 0
t0 = 0
sigma = 0
density-guided-simulation:
active = false
group = protein
similarity-measure = inner-product
atom-spreading-weight = unity
force-constant = 1e+09
gaussian-transform-spreading-width = 0.2
gaussian-transform-spreading-range-in-multiples-of-width = 4
reference-density-filename = reference.mrc
nst = 1
normalize-densities = true
adaptive-force-scaling = false
adaptive-force-scaling-time-constant = 4
grpopts:
nrdf: 1.34784e+06 0
ref-t: 269 0
tau-t: 0.5 0.5
annealing: No No
annealing-npoints: 0 0
acc: 0 0 0
nfreeze: Y Y Y N N N
energygrp-flags[ 0]: 0
The -nsteps functionality is deprecated, and may be removed in a future version. Consider using gmx convert-tpr -nsteps or changing the appropriate .mdp file field.
Overriding nsteps with value passed on the command line: 10000 steps, 20 ps
Changing nstlist from 40 to 100, rlist from 0.954 to 1.016
1 GPU selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
PP:0,PME:0
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
PP task will update and constrain coordinates on the CPU
PME tasks will do all aspects on the GPU
Using 1 MPI process
Using 5 OpenMP threads
Pinning threads with an auto-selected logical core stride of 1
System total charge: 0.000
Will do PME sum in reciprocal space for electrostatic interactions.
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------
Using a Gaussian width (1/beta) of 0.288146 nm for Ewald
Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Ewald -1.111e-05
Initialized non-bonded Ewald tables, spacing: 8.85e-04 size: 1018
Using GPU 8x8 nonbonded short-range kernels
Using a dual 8x8 pair-list setup updated with dynamic, rolling pruning:
outer list: updated every 100 steps, buffer 0.116 nm, rlist 1.016 nm
inner list: updated every 18 steps, buffer 0.004 nm, rlist 0.904 nm
At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list would be:
outer list: updated every 100 steps, buffer 0.257 nm, rlist 1.157 nm
inner list: updated every 18 steps, buffer 0.066 nm, rlist 0.966 nm
Using geometric Lennard-Jones combination rule
Long Range LJ corr.: <C6> 2.2243e-04
Removing pbc first time
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------
The -noconfout functionality is deprecated, and may be removed in a future version.
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- --- Thank You --- -------- --------
There are: 1347840 Atoms
There are: 449280 VSites
Constraining the starting coordinates (step 0)
Constraining the coordinates at t0-dt (step 0)
RMS relative constraint deviation after constraining: 0.00e+00
Initial temperature: 270.205 K
Started mdrun on rank 0 Thu Dec 23 18:18:10 2021
The -resethway functionality is deprecated, and may be removed in a future version.
Step Time
0 0.00000
Energies (kJ/mol)
LJ (SR) Disper. corr. Coulomb (SR) Coul. recip. Potential
6.78704e+06 -8.22205e+04 -3.64499e+07 3.18710e+05 -2.94264e+07
Kinetic En. Total Energy Conserved En. Temperature Pres. DC (bar)
1.51535e+06 -2.79110e+07 -2.79110e+07 2.70439e+02 -2.74455e+01
Pressure (bar)
4.78317e+03
step 600: timed with pme grid 600 600 84, coulomb cutoff 0.900: 8855.3 M-cycles
step 800: timed with pme grid 512 512 72, coulomb cutoff 1.042: 8933.4 M-cycles
step 1000: timed with pme grid 448 480 64, coulomb cutoff 1.183: 9341.4 M-cycles
step 1200: timed with pme grid 416 432 60, coulomb cutoff 1.274: 9710.0 M-cycles
step 1400: timed with pme grid 384 384 56, coulomb cutoff 1.381: 10029.1 M-cycles
step 1600: timed with pme grid 400 400 56, coulomb cutoff 1.339: 9866.4 M-cycles
step 1800: timed with pme grid 400 400 60, coulomb cutoff 1.326: 9875.9 M-cycles
step 2000: timed with pme grid 416 416 60, coulomb cutoff 1.275: 9743.4 M-cycles
step 2200: timed with pme grid 416 432 60, coulomb cutoff 1.274: 9704.1 M-cycles
step 2400: timed with pme grid 432 432 60, coulomb cutoff 1.250: 9551.5 M-cycles
step 2600: timed with pme grid 432 432 64, coulomb cutoff 1.228: 9391.8 M-cycles
step 2800: timed with pme grid 448 448 64, coulomb cutoff 1.184: 9216.8 M-cycles
step 3000: timed with pme grid 448 480 64, coulomb cutoff 1.183: 9281.3 M-cycles
step 3200: timed with pme grid 480 480 64, coulomb cutoff 1.172: 9297.0 M-cycles
step 3400: timed with pme grid 480 480 72, coulomb cutoff 1.105: 9103.3 M-cycles
step 3600: timed with pme grid 512 512 72, coulomb cutoff 1.042: 8856.1 M-cycles
step 3800: timed with pme grid 512 560 80, coulomb cutoff 1.035: 9039.6 M-cycles
step 4000: timed with pme grid 560 560 80, coulomb cutoff 0.947: 8824.9 M-cycles
step 4200: timed with pme grid 560 576 80, coulomb cutoff 0.946: 8769.2 M-cycles
step 4400: timed with pme grid 576 576 80, coulomb cutoff 0.938: 8748.5 M-cycles
step 4600: timed with pme grid 576 576 84, coulomb cutoff 0.921: 8792.6 M-cycles
step 4800: timed with pme grid 600 600 84, coulomb cutoff 0.900: 8910.0 M-cycles
optimal pme grid 576 576 80, coulomb cutoff 0.938
step 5000: resetting all time and cycle counters
Restarted time on rank 0 Thu Dec 23 18:22:23 2021
Step Time
10000 20.00000
Energies (kJ/mol)
LJ (SR) Disper. corr. Coulomb (SR) Coul. recip. Potential
6.48658e+06 -8.22205e+04 -3.43656e+07 2.32747e+05 -2.77285e+07
Kinetic En. Total Energy Conserved En. Temperature Pres. DC (bar)
1.50739e+06 -2.62211e+07 -2.78944e+07 2.69020e+02 -2.74455e+01
Pressure (bar)
5.22932e+03
<====== ############### ==>
<==== A V E R A G E S ====>
<== ############### ======>
Statistics over 10001 steps using 2 frames
Energies (kJ/mol)
LJ (SR) Disper. corr. Coulomb (SR) Coul. recip. Potential
6.63681e+06 -8.22205e+04 -3.54078e+07 2.75729e+05 -2.85774e+07
Kinetic En. Total Energy Conserved En. Temperature Pres. DC (bar)
1.51137e+06 -2.70661e+07 -2.79027e+07 2.69729e+02 -2.74455e+01
Pressure (bar)
5.00624e+03
Total Virial (kJ/mol)
-7.17627e+06 7.05301e+02 -5.85084e+03
3.45133e+03 -7.17287e+06 -2.91071e+03
-4.50451e+03 -3.46621e+03 -6.73872e+06
Pressure (bar)
5.10232e+03 -4.08236e-01 3.35906e+00
-2.23316e+00 5.10002e+03 2.16240e+00
2.46433e+00 2.53156e+00 4.81638e+03
   T-z>1.48_f0_t0.000  T-z<1.48_f0_t0.000
          2.69729e+02         0.00000e+00
       P P   -   P M E   L O A D   B A L A N C I N G

 PP/PME load balancing changed the cut-off and PME settings:
           particle-particle                    PME
            rcoulomb  rlist            grid      spacing   1/beta
   initial  0.900 nm  0.904 nm     600 600  84   0.119 nm  0.288 nm
   final    0.938 nm  0.942 nm     576 576  80   0.125 nm  0.300 nm
 cost-ratio           1.13             0.88
 (note that these numbers concern only part of the total PP and PME load)
     M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 Pair Search distance check            5238.684928       47148.164     0.0
 NxN Ewald Elec. + LJ [F]           7301383.174400   481891289.510    99.8
 NxN Ewald Elec. + LJ [V&F]            1460.833792      156309.216     0.0
 Shift-X                                 91.653120         549.919     0.0
 Virial                                   1.797165          32.349     0.0
 Calc-Ekin                              900.357120       24309.642     0.0
 Constraint-V                          6740.547840       53924.383     0.0
 Constraint-Vir                           1.347840          32.348     0.0
 Settle                                2246.849280      725732.317     0.2
 Virtual Site 3                        2247.298560       83150.047     0.0
-----------------------------------------------------------------------------
 Total                                                482982477.895   100.0
-----------------------------------------------------------------------------
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 On 1 MPI rank, each using 5 OpenMP threads

 Computing:                 Num   Num      Call    Wall time         Giga-Cycles
                            Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Vsite constr.                 1    5       5001      16.015        192.177   8.8
 Neighbor search               1    5         51       8.254         99.046   4.5
 Launch GPU ops.               1    5       5001       1.315         15.774   0.7
 Force                         1    5       5001       4.532         54.381   2.5
 Wait PME GPU gather           1    5       5001      12.236        146.833   6.7
 Reduce GPU PME F              1    5       5001       7.493         89.911   4.1
 Wait GPU NB local                                    12.653        151.836   6.9
 NB X/F buffer ops.            1    5       9951      24.680        296.161  13.5
 Vsite spread                  1    5       5002      17.601        211.205   9.6
 Update                        1    5       5001      28.461        341.533  15.6
 Constraints                   1    5       5001      13.784        165.408   7.5
 Rest                                                 35.640        427.669  19.5
-----------------------------------------------------------------------------
 Total                                               182.663       2191.934 100.0
-----------------------------------------------------------------------------
               Core t (s)   Wall t (s)        (%)
       Time:      913.206      182.663      499.9
                 (ns/day)    (hour/ns)
Performance:        4.731        5.073
Finished mdrun on rank 0 Thu Dec 23 18:25:26 2021