interestingly this segfaults with a “one or more molecules cannot be settled” but only with dual GPUs (-np 1 also runs fine)
:-) GROMACS - gmx mdrun, 2020.4 (-:
GROMACS is written by:
Emile Apol Rossen Apostolov Paul Bauer Herman J.C. Berendsen
Par Bjelkmar Christian Blau Viacheslav Bolnykh Kevin Boyd
Aldert van Buuren Rudi van Drunen Anton Feenstra Alan Gray
Gerrit Groenhof Anca Hamuraru Vincent Hindriksen M. Eric Irrgang
Aleksei Iupinov Christoph Junghans Joe Jordan Dimitrios Karkoulis
Peter Kasson Jiri Kraus Carsten Kutzner Per Larsson
Justin A. Lemkul Viveca Lindahl Magnus Lundborg Erik Marklund
Pascal Merz Pieter Meulenhoff Teemu Murtola Szilard Pall
Sander Pronk Roland Schulz Michael Shirts Alexey Shvetsov
Alfons Sijbers Peter Tieleman Jon Vincent Teemu Virolainen
Christian Wennberg Maarten Wolf Artem Zhmurov
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2019, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.
GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.
GROMACS: gmx mdrun, version 2020.4
Executable: /SFS/user/ry/waight/tools/gromacs-2020-mpi/bin/gmx_mpi
Data prefix: /SFS/user/ry/waight/tools/gromacs-2020-mpi
Working dir: /mnt/lustre2/craycs/scratch/ABW_gromacs_scratch
Process ID: 42352
Command line:
gmx_mpi mdrun -ntomp 10 -v -deffnm md_0_1
GROMACS version: 2020.4
Verified release checksum is 79c2857291b034542c26e90512b92fd4b184a1c9d6fa59c55f2e24ccf14e7281
Precision: single
Memory model: 64 bit
MPI library: MPI
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: AVX_512
FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /SFS/product/eb_tcl/GCC/7.5/software/GCCcore/7.5.0/bin/gcc GNU 7.5.0
C compiler flags: -mavx512f -mfma -pthread -fexcess-precision=fast -funroll-all-loops -O3 -DNDEBUG
C++ compiler: /SFS/product/eb_tcl/GCC/7.5/software/GCCcore/7.5.0/bin/g++ GNU 7.5.0
C++ compiler flags: -mavx512f -mfma -pthread -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA compiler: /SFS/product/cuda/10.1/centos76_x86_64/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2019 NVIDIA Corporation;Built on Sun_Jul_28_19:07:16_PDT_2019;Cuda compilation tools, release 10.1, V10.1.243
CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_35,code=compute_35;-gencode;arch=compute_50,code=compute_50;-gencode;arch=compute_52,code=compute_52;-gencode;arch=compute_60,code=compute_60;-gencode;arch=compute_61,code=compute_61;-gencode;arch=compute_70,code=compute_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;;-mavx512f -mfma -pthread -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA driver: 10.10
CUDA runtime: 10.10
Running on 1 node with total 40 cores, 40 logical cores, 4 compatible GPUs
Hardware detected on host ktchpccg015 (the node of MPI rank 0):
CPU info:
Vendor: Intel
Brand: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Family: 6 Model: 85 Stepping: 4
Features: aes apic avx avx2 avx512f avx512cd avx512bw avx512vl clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
Number of AVX-512 FMA units: 2
Hardware topology: Basic
Sockets, cores, and logical processors:
Socket 0: [ 0] [ 1] [ 2] [ 3] [ 4] [ 5] [ 6] [ 7] [ 8] [ 9] [ 10] [ 11] [ 12] [ 13] [ 14] [ 15] [ 16] [ 17] [ 18] [ 19]
Socket 1: [ 20] [ 21] [ 22] [ 23] [ 24] [ 25] [ 26] [ 27] [ 28] [ 29] [ 30] [ 31] [ 32] [ 33] [ 34] [ 35] [ 36] [ 37] [ 38] [ 39]
GPU info:
Number of GPUs detected: 8
#0: NVIDIA Tesla V100-SXM2-32GB, compute cap.: 7.0, ECC: no, stat: compatible
#1: NVIDIA Tesla V100-SXM2-32GB, compute cap.: 7.0, ECC: no, stat: unavailable
#2: NVIDIA Tesla V100-SXM2-32GB, compute cap.: 7.0, ECC: no, stat: compatible
#3: NVIDIA Tesla V100-SXM2-32GB, compute cap.: 7.0, ECC: no, stat: unavailable
#4: NVIDIA Tesla V100-SXM2-32GB, compute cap.: 7.0, ECC: no, stat: unavailable
#5: NVIDIA Tesla V100-SXM2-32GB, compute cap.: 7.0, ECC: no, stat: compatible
#6: NVIDIA Tesla V100-SXM2-32GB, compute cap.: 7.0, ECC: no, stat: unavailable
#7: NVIDIA Tesla V100-SXM2-32GB, compute cap.: 7.0, ECC: no, stat: compatible
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- — Thank You — -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale 8759 (2015) pp. 3-27
-------- -------- — Thank You — -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- — Thank You — -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- — Thank You — -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- — Thank You — -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- — Thank You — -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- — Thank You — -------- --------
++++ PLEASE CITE THE DOI FOR THIS VERSION OF GROMACS ++++
GROMACS 2020.4 Source code
-------- -------- — Thank You — -------- --------
Input Parameters:
integrator = md
tinit = 0
dt = 0.002
nsteps = 500000
init-step = 0
simulation-part = 1
comm-mode = Linear
nstcomm = 100
bd-fric = 0
ld-seed = -1826774160
emtol = 10
emstep = 0.01
niter = 20
fcstep = 0
nstcgsteep = 1000
nbfgscorr = 10
rtpi = 0.05
nstxout = 0
nstvout = 0
nstfout = 0
nstlog = 100000
nstcalcenergy = 100
nstenergy = 100000
nstxout-compressed = 100000
compressed-x-precision = 1000
cutoff-scheme = Verlet
nstlist = 10
pbc = xyz
periodic-molecules = false
verlet-buffer-tolerance = 0.005
rlist = 1
coulombtype = PME
coulomb-modifier = Potential-shift
rcoulomb-switch = 0
rcoulomb = 1
epsilon-r = 1
epsilon-rf = inf
vdw-type = Cut-off
vdw-modifier = Potential-shift
rvdw-switch = 0
rvdw = 1
DispCorr = EnerPres
table-extension = 1
fourierspacing = 0.16
fourier-nx = 52
fourier-ny = 52
fourier-nz = 52
pme-order = 4
ewald-rtol = 1e-05
ewald-rtol-lj = 0.001
lj-pme-comb-rule = Geometric
ewald-geometry = 0
epsilon-surface = 0
tcoupl = V-rescale
nsttcouple = 10
nh-chain-length = 0
print-nose-hoover-chain-variables = false
pcoupl = Parrinello-Rahman
pcoupltype = Isotropic
nstpcouple = 10
tau-p = 2
compressibility (3x3):
compressibility[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00}
compressibility[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00}
compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 4.50000e-05}
ref-p (3x3):
ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
refcoord-scaling = No
posres-com (3):
posres-com[0]= 0.00000e+00
posres-com[1]= 0.00000e+00
posres-com[2]= 0.00000e+00
posres-comB (3):
posres-comB[0]= 0.00000e+00
posres-comB[1]= 0.00000e+00
posres-comB[2]= 0.00000e+00
QMMM = false
QMconstraints = 0
QMMMscheme = 0
MMChargeScaleFactor = 1
qm-opts:
ngQM = 0
constraint-algorithm = Lincs
continuation = true
Shake-SOR = false
shake-tol = 0.0001
lincs-order = 4
lincs-iter = 1
lincs-warnangle = 30
nwall = 0
wall-type = 9-3
wall-r-linpot = -1
wall-atomtype[0] = -1
wall-atomtype[1] = -1
wall-density[0] = 0
wall-density[1] = 0
wall-ewald-zfac = 3
pull = false
awh = false
rotation = false
interactiveMD = false
disre = No
disre-weighting = Conservative
disre-mixed = false
dr-fc = 1000
dr-tau = 0
nstdisreout = 100
orire-fc = 0
orire-tau = 0
nstorireout = 100
free-energy = no
cos-acceleration = 0
deform (3x3):
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
simulated-tempering = false
swapcoords = no
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
applied-forces:
electric-field:
x:
E0 = 0
omega = 0
t0 = 0
sigma = 0
y:
E0 = 0
omega = 0
t0 = 0
sigma = 0
z:
E0 = 0
omega = 0
t0 = 0
sigma = 0
density-guided-simulation:
active = false
group = protein
similarity-measure = inner-product
atom-spreading-weight = unity
force-constant = 1e+09
gaussian-transform-spreading-width = 0.2
gaussian-transform-spreading-range-in-multiples-of-width = 4
reference-density-filename = reference.mrc
nst = 1
normalize-densities = true
adaptive-force-scaling = false
adaptive-force-scaling-time-constant = 4
grpopts:
nrdf: 8718.77 106107
ref-t: 300 300
tau-t: 0.1 0.1
annealing: No No
annealing-npoints: 0 0
acc: 0 0 0
nfreeze: N N N
energygrp-flags[ 0]: 0
Changing nstlist from 10 to 100, rlist from 1 to 1.157
Initializing Domain Decomposition on 2 ranks
Dynamic load balancing: auto
Using update groups, nr 19456, average size 2.9 atoms, max. radius 0.104 nm
Minimum cell size due to atom displacement: 0.652 nm
Initial maximum distances in bonded interactions:
two-body bonded interactions: 0.450 nm, LJ-14, atoms 1973 3103
multi-body bonded interactions: 0.450 nm, Proper Dih., atoms 3103 1973
Minimum cell size due to bonded interactions: 0.495 nm
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Using 0 separate PME ranks
Optimizing the DD grid for 2 cells with a minimum initial size of 0.815 nm
The maximum allowed number of cells is: X 10 Y 10 Z 10
Domain decomposition grid 2 x 1 x 1, separate PME ranks 0
PME domain decomposition: 2 x 1 x 1
Domain decomposition rank 0, coordinates 0 0 0
The initial number of communication pulses is: X 1
The initial domain decomposition cell size is: X 4.14 nm
The maximum allowed distance for atom groups involved in interactions is:
non-bonded interactions 1.366 nm
(the following are initial values, they could change due to box deformation)
two-body bonded interactions (-rdd) 1.366 nm
multi-body bonded interactions (-rdd) 1.366 nm
When dynamic load balancing gets turned on, these settings will change to:
The maximum number of communication pulses is: X 1
The minimum size for domain decomposition cells is 1.366 nm
The requested allowed shrink of DD cells (option -dds) is: 0.80
The allowed shrink of domain decomposition cells is: X 0.33
The maximum allowed distance for atom groups involved in interactions is:
non-bonded interactions 1.366 nm
two-body bonded interactions (-rdd) 1.366 nm
multi-body bonded interactions (-rdd) 1.366 nm
On host ktchpccg015 2 GPUs selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 2 ranks on this node:
PP:0,PP:2
PP tasks will do (non-perturbed) short-ranged and most bonded interactions on the GPU
PP task will update and constrain coordinates on the CPU
Using 2 MPI processes
Non-default thread affinity set, disabling internal thread affinity
Using 10 OpenMP threads per MPI process
NOTE: Your choice of number of MPI ranks and amount of resources results in using 10 OpenMP threads per rank, which is most likely inefficient. The optimum is usually between 2 and 6 threads per rank.
System total charge: -0.000
Will do PME sum in reciprocal space for electrostatic interactions.
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- — Thank You — -------- --------
Using a Gaussian width (1/beta) of 0.320163 nm for Ewald
Potential shift: LJ r^-12: -1.000e+00 r^-6: -1.000e+00, Ewald -1.000e-05
Initialized non-bonded Ewald tables, spacing: 9.33e-04 size: 1073
Generated table with 1078 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1078 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1078 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Using GPU 8x8 nonbonded short-range kernels
Using a dual 8x8 pair-list setup updated with dynamic, rolling pruning:
outer list: updated every 100 steps, buffer 0.157 nm, rlist 1.157 nm
inner list: updated every 10 steps, buffer 0.001 nm, rlist 1.001 nm
At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list would be:
outer list: updated every 100 steps, buffer 0.306 nm, rlist 1.306 nm
inner list: updated every 10 steps, buffer 0.042 nm, rlist 1.042 nm
Using Lorentz-Berthelot Lennard-Jones combination rule
Long Range LJ corr.: 3.0483e-04
Initializing LINear Constraint Solver
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
LINCS: A Linear Constraint Solver for molecular simulations
J. Comp. Chem. 18 (1997) pp. 1463-1472
-------- -------- — Thank You — -------- --------
The number of constraints is 1706
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- — Thank You — -------- --------
Linking all bonded interactions to atoms
Intra-simulation communication will occur every 10 steps.
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- — Thank You — -------- --------
There are: 56528 Atoms
Atom distribution over 2 domains: av 28264 stddev 67 min 28197 max 28331
NOTE: DLB will not turn on during the first phase of PME tuning
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
0: rest
Started mdrun on rank 0 Sat Nov 7 03:20:48 2020
Step Time
0 0.00000
Energies (kJ/mol)
Bond Angle Proper Dih. Improper Dih. LJ-14
2.77117e+03 7.36667e+03 4.10178e+03 3.94997e+02 3.95671e+03
Coulomb-14 LJ (SR) Disper. corr. Coulomb (SR) Coul. recip.
4.58217e+04 1.02373e+05 -7.19928e+03 -9.29511e+05 5.22782e+03
Potential Kinetic En. Total Energy Conserved En. Temperature
-7.64697e+05 2.49244e+09 2.49167e+09 2.49167e+09 5.22132e+06
Pres. DC (bar) Pressure (bar) Constr. rmsd
-2.11444e+02 -4.98122e+07 2.93781e-06