Gromacs Drude

Hello there,

I am increasing the size of my system to take advantage of multicore performance, since for my initial simulation I could only use one core. However, I get the usual segmentation fault during energy minimization. Here is the .log file:

Log file opened on Wed Dec 2 09:00:03 2020
Host: b-an01.hpc2n.umu.se pid: 733089 rank ID: 0 number of ranks: 1
:-) GROMACS - gmx mdrun, 2016-dev (-:

GROMACS is written by:
Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar

Aldert van Buuren Rudi van Drunen Anton Feenstra Gerrit Groenhof
Christoph Junghans Anca Hamuraru Vincent Hindriksen Dimitrios Karkoulis
Peter Kasson Jiri Kraus Carsten Kutzner Per Larsson
Justin A. Lemkul Magnus Lundborg Pieter Meulenhoff Erik Marklund
Teemu Murtola Szilard Pall Sander Pronk Roland Schulz
Alexey Shvetsov Michael Shirts Alfons Sijbers Peter Tieleman
Teemu Virolainen Christian Wennberg Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright © 1991-2000, University of Groningen, The Netherlands.
Copyright © 2001-2015, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS: gmx mdrun, version 2016-dev
Executable: /hpc2n/eb/software/MPI/GCC/6.4.0-2.28/impi/2017.3.196/GROMACS/2016.x-drude-20180214-g3f7439a/bin/gmx
Data prefix: /hpc2n/eb/software/MPI/GCC/6.4.0-2.28/impi/2017.3.196/GROMACS/2016.x-drude-20180214-g3f7439a
Command line:
gmx mdrun -deffnm em -v

GROMACS version: 2016-dev
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support: disabled
SIMD instructions: AVX2_256
FFT library: fftw-3.3.6-pl2-fma-sse2-avx-avx2-avx2_128
RDTSCP usage: enabled
TNG support: enabled
Tracing support: disabled
Built on: Thu Sep 19 18:47:11 UTC 2019
Built by: easybuild@b-cn0823.hpc2n.umu.se [CMAKE]
Build OS/arch: Linux 4.15.0-62-generic x86_64
Build CPU vendor: Intel
Build CPU brand: Intel® Xeon® CPU E5-2690 v4 @ 2.60GHz
Build CPU family: 6 Model: 79 Stepping: 1
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle htt lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler: /hpc2n/eb/software/Compiler/GCC/6.4.0-2.28/impi/2017.3.196/bin64/mpicc GNU 6.4.0
C compiler flags: -march=core-avx2 -O2 -ftree-vectorize -march=native -fno-math-errno -fopenmp -mt_mpi -Wundef -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall -Wno-unused -Wunused-value -Wunused-parameter -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast -Wno-array-bounds
C++ compiler: /hpc2n/eb/software/Compiler/GCC/6.4.0-2.28/impi/2017.3.196/bin64/mpicxx GNU 6.4.0
C++ compiler flags: -march=core-avx2 -O2 -ftree-vectorize -march=native -fno-math-errno -fopenmp -mt_mpi -std=c++0x -Wundef -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wall -Wno-unused-function -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast -Wno-array-bounds

Running on 1 node with total 28 cores, 28 logical cores
Hardware detected:
CPU info:
Vendor: Intel
Brand: Intel® Xeon® CPU E5-2690 v4 @ 2.60GHz
Family: 6 Model: 79 Stepping: 1
Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle htt lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
SIMD instructions most likely to fit this hardware: AVX2_256
SIMD instructions selected at GROMACS compile time: AVX2_256

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

Changing rlist from 1.26 to 1.2 for non-bonded 4x4 atom kernels

Input Parameters:
integrator = steep
tinit = 0
dt = 0.001
nsteps = -1
init-step = 0
simulation-part = 1
comm-mode = Linear
nstcomm = 100
bd-fric = 0
ld-seed = 3073646060
emtol = 1000
emstep = 0.01
niter = 20
fcstep = 0
nstcgsteep = 1000
nbfgscorr = 10
rtpi = 0.05
nstxout = 1
nstvout = 0
nstfout = 0
nstlog = 100
nstcalcenergy = 1
nstenergy = 1
nstxout-compressed = 0
compressed-x-precision = 1000
cutoff-scheme = Verlet
nstlist = 10
ns-type = Grid
pbc = xyz
periodic-molecules = false
verlet-buffer-tolerance = 0.005
rlist = 1.2
coulombtype = PME
coulomb-modifier = Potential-shift
rcoulomb-switch = 0
rcoulomb = 1.2
epsilon-r = 1
epsilon-rf = inf
vdw-type = Cut-off
vdw-modifier = Potential-switch
rvdw-switch = 1
rvdw = 1.2
DispCorr = EnerPres
table-extension = 1
fourierspacing = 0.12
fourier-nx = 192
fourier-ny = 192
fourier-nz = 192
pme-order = 4
ewald-rtol = 1e-05
ewald-rtol-lj = 0.001
lj-pme-comb-rule = Geometric
ewald-geometry = 0
epsilon-surface = 0
implicit-solvent = No
gb-algorithm = Still
nstgbradii = 1
rgbradii = 1
gb-epsilon-solvent = 80
gb-saltconc = 0
gb-obc-alpha = 1
gb-obc-beta = 0.8
gb-obc-gamma = 4.85
gb-dielectric-offset = 0.009
sa-algorithm = Ace-approximation
sa-surface-tension = 2.05016
tcoupl = No
nsttcouple = -1
nh-chain-length = 0
print-nose-hoover-chain-variables = false
pcoupl = No
pcoupltype = Isotropic
nstpcouple = -1
tau-p = 1
compressibility (3x3):
compressibility[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
compressibility[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p (3x3):
ref-p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
refcoord-scaling = No
posres-com (3):
posres-com[0]= 0.00000e+00
posres-com[1]= 0.00000e+00
posres-com[2]= 0.00000e+00
posres-comB (3):
posres-comB[0]= 0.00000e+00
posres-comB[1]= 0.00000e+00
posres-comB[2]= 0.00000e+00
QMMM = false
QMconstraints = 0
QMMMscheme = 0
MMChargeScaleFactor = 1
qm-opts:
ngQM = 0
constraint-algorithm = Lincs
continuation = false
Shake-SOR = false
shake-tol = 0.0001
lincs-order = 4
lincs-iter = 1
lincs-warnangle = 30
nwall = 0
wall-type = 9-3
wall-r-linpot = -1
wall-atomtype[0] = -1
wall-atomtype[1] = -1
wall-density[0] = 0
wall-density[1] = 0
wall-ewald-zfac = 3
pull = false
rotation = false
interactiveMD = false
disre = No
disre-weighting = Conservative
disre-mixed = false
dr-fc = 1000
dr-tau = 0
nstdisreout = 100
orire-fc = 0
orire-tau = 0
nstorireout = 100
free-energy = no
cos-acceleration = 0
deform (3x3):
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
simulated-tempering = false
E-x:
n = 0
E-xt:
n = 0
E-y:
n = 0
E-yt:
n = 0
E-z:
n = 0
E-zt:
n = 0
swapcoords = no
drude = true
drudemode = scf
drude-t = 1
drude-hardwall = false
drude-r = 0.02
drude-hyper = false
drude-khyp = 1.6736e+07
drude-hyp-power = 4
nbtholecut = 0
drude-tsteps = 20
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
grpopts:
nrdf: 2.607e+06
ref-t: 0
tau-t: 0
annealing: No
annealing-npoints: 0
acc: 0 0 0
nfreeze: N N N
energygrp-flags[ 0]: 0

Initializing Domain Decomposition on 28 ranks
Dynamic load balancing: off
Initial maximum inter charge-group distances:
two-body bonded interactions: 0.412 nm, LJ-14, atoms 89570 89582
multi-body bonded interactions: 0.412 nm, Proper Dih., atoms 89570 89582
Minimum cell size due to bonded interactions: 0.453 nm
Guess for relative PME load: 0.09
Will use 24 particle-particle and 4 PME only ranks
This is a guess, check the performance at the end of the log file
Using 4 separate PME ranks, as guessed by mdrun
Optimizing the DD grid for 24 cells with a minimum initial size of 0.453 nm
The maximum allowed number of cells is: X 44 Y 44 Z 44
Domain decomposition grid 4 x 3 x 2, separate PME ranks 4
PME domain decomposition: 4 x 1 x 1
Interleaving PP and PME ranks
This rank does only particle-particle work.

Domain decomposition rank 0, coordinates 0 0 0

The initial number of communication pulses is: X 1 Y 1 Z 1
The initial domain decomposition cell size is: X 5.06 nm Y 6.75 nm Z 10.13 nm

The maximum allowed distance for charge groups involved in interactions is:
non-bonded interactions 1.200 nm
two-body bonded interactions (-rdd) 1.200 nm
multi-body bonded interactions (-rdd) 1.200 nm

Using 28 MPI threads
Using 1 OpenMP thread per tMPI thread

Will do PME sum in reciprocal space for electrostatic interactions.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Will do ordinary reciprocal space Ewald sum.
Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
Cut-off's: NS: 1.2 Coulomb: 1.2 LJ: 1.2
Long Range LJ corr.: 2.9276e-04
System total charge: -0.034
Generated table with 1100 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 1100 data points for LJ6Switch.
Tabscale = 500 points/nm
Generated table with 1100 data points for LJ12Switch.
Tabscale = 500 points/nm
Generated table with 1100 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1100 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1100 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Potential shift: LJ r^-12: 0.000e+00 r^-6: 0.000e+00, Ewald -1.000e-05
Initialized non-bonded Ewald correction tables, spacing: 1.02e-03 size: 1176

Using SIMD 4x8 non-bonded kernels

Removing pbc first time
Pinning threads with an auto-selected logical core stride of 1

Linking all bonded interactions to atoms
There are 276500 shells in the system,
will do an extra communication step for selected coordinates and forces

Initiating Steepest Descents
Atom distribution over 24 domains: av 47729 stddev 188 min 47472 max 48018
Started Steepest Descents on rank 0 Wed Dec 2 09:00:07 2020

Steepest Descents:
Tolerance (Fmax) = 1.00000e+03
Number of steps = -1
       Step           Time
          0        0.00000

Energies (kJ/mol)
           Bond            U-B    Proper Dih.          LJ-14     Coulomb-14
    2.69888e+05    1.89001e+06    7.80833e+05    8.45283e+04   -3.65185e+06
        LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.     Drude Bond
   -1.45270e+06   -8.49501e+04    5.42092e+05    1.32971e+04    2.14485e+09
Aniso. Polariz.     Thole Pol.      Potential Pres. DC (bar) Pressure (bar)
    0.00000e+00   -3.11925e+04    2.14321e+09    0.00000e+00   -2.66510e+02

DD step 0 load imb.: force 3.9% pme mesh/force 0.937

       Step           Time
          1        1.00000

Energies (kJ/mol)
           Bond            U-B    Proper Dih.          LJ-14     Coulomb-14
    2.70014e+05    1.88976e+06    7.80830e+05    8.45242e+04   -3.65202e+06
        LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.     Drude Bond
   -1.45270e+06   -8.49501e+04    5.42039e+05    1.33733e+04    2.14062e+09
Aniso. Polariz.     Thole Pol.      Potential Pres. DC (bar) Pressure (bar)
    0.00000e+00   -3.12481e+04    2.13898e+09    0.00000e+00   -2.66510e+02

The log shows mdrun initializing domain decomposition across 28 thread-MPI ranks, and that is the source of the crash: the Drude implementation does not currently support domain decomposition; only OpenMP parallelization and single-domain GPU runs are supported. The DD code relevant to Drude is being rewritten.
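
In the meantime, a workaround is to keep the whole system in a single domain and parallelize with OpenMP only. A minimal sketch of such an invocation for this thread-MPI build is below; the thread count of 28 is an assumption taken from the 28 cores reported in the log, so adjust it to your node:

    # one thread-MPI rank => no domain decomposition; OpenMP threads do the work
    gmx mdrun -deffnm em -v -ntmpi 1 -ntomp 28

With -ntmpi 1, mdrun sets up a single rank and therefore no DD grid, while -ntomp 28 still uses the node's cores through OpenMP (the build reports GMX_OPENMP_MAX_THREADS = 32, so 28 threads stays within that limit).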