                    :-) GROMACS - gmx mdrun, 2023.1 (-:

Copyright 1991-2023 The GROMACS Authors.
GROMACS is free software; you can redistribute it and/or modify it under
the terms of the GNU Lesser General Public License as published by the
Free Software Foundation; either version 2.1 of the License, or (at your
option) any later version.

Current GROMACS contributors:
    Mark Abraham        Andrey Alekseenko    Cathrine Bergh       Christian Blau
    Eliane Briand       Mahesh Doijade       Stefan Fleischmann   Vytas Gapsys
    Gaurav Garg         Sergey Gorelov       Gilles Gouaillardet  Alan Gray
    M. Eric Irrgang     Farzaneh Jalalypour  Joe Jordan           Christoph Junghans
    Prashanth Kanduri   Sebastian Keller     Carsten Kutzner      Justin A. Lemkul
    Magnus Lundborg     Pascal Merz          Vedran Miletic       Dmitry Morozov
    Szilard Pall        Roland Schulz        Michael Shirts       Alexey Shvetsov
    Balint Soproni      David van der Spoel  Philip Turner        Carsten Uphoff
    Alessandra Villa    Sebastian Wingbermuehle                   Artem Zhmurov

Previous GROMACS contributors:
    Emile Apol          Rossen Apostolov     James Barnett        Herman J.C. Berendsen
    Par Bjelkmar        Viacheslav Bolnykh   Kevin Boyd           Aldert van Buuren
    Carlo Camilloni     Rudi van Drunen      Anton Feenstra       Oliver Fleetwood
    Gerrit Groenhof     Bert de Groot        Anca Hamuraru        Vincent Hindriksen
    Victor Holanda      Aleksei Iupinov      Dimitrios Karkoulis  Peter Kasson
    Sebastian Kehl      Jiri Kraus           Per Larsson          Viveca Lindahl
    Erik Marklund       Pieter Meulenhoff    Teemu Murtola        Sander Pronk
    Alfons Sijbers      Peter Tieleman       Jon Vincent          Teemu Virolainen
    Christian Wennberg  Maarten Wolf

Coordinated by the GROMACS project leaders:
    Paul Bauer, Berk Hess, and Erik Lindahl

GROMACS:      gmx mdrun, version 2023.1
Executable:   /home/x09527a/apps/gromacs/2023.1-debug-build1/bin/gmx_mpi
Data prefix:  /home/x09527a/apps/gromacs/2023.1-debug-build1
Working dir:  ************************************************************
Process ID:   92
Command line:
  gmx_mpi mdrun -ntomp 10 -v -deffnm step6.9_equilibration -npme 1 -pme gpu -update gpu -nb gpu -bonded gpu -resethway
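For context, a launch script along the following lines reproduces this invocation. Only the gmx_mpi arguments and the GMX_ENABLE_DIRECT_GPU_COMM variable (reported as detected later in this log) are confirmed here; the mpirun mapping flags are illustrative assumptions for the 2-node, 4-GPU-per-node layout reported below.

    # Hypothetical launch sketch; only the gmx_mpi command line and
    # GMX_ENABLE_DIRECT_GPU_COMM are taken from this log.
    export GMX_ENABLE_DIRECT_GPU_COMM=1    # enable direct GPU communication

    # 8 MPI ranks across 2 nodes (4 per node), 10 OpenMP threads per rank,
    # matching "Using 8 MPI processes" / "Using 10 OpenMP threads per MPI process".
    mpirun -np 8 --map-by ppr:4:node \
        gmx_mpi mdrun -ntomp 10 -v -deffnm step6.9_equilibration \
        -npme 1 -pme gpu -update gpu -nb gpu -bonded gpu -resethway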
GROMACS version:    2023.1
Precision:          mixed
Memory model:       64 bit
MPI library:        MPI (GPU-aware: CUDA)
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 128)
GPU support:        CUDA
NB cluster size:    8
SIMD instructions:  AVX2_256
CPU FFT library:    fftw-3.3.8-sse2-avx-avx2-avx2_128
GPU FFT library:    cuFFT
Multi-GPU FFT:      none
RDTSCP usage:       enabled
TNG support:        enabled
Hwloc support:      hwloc-1.11.8
Tracing support:    disabled
C compiler:         /home/x09527a/apps/openmpi4.1.5-gcc11.3.0-cuda11.8.0-ucx1.14.1/bin/mpicc GNU 11.3.0
C compiler flags:   -fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -O3 -DNDEBUG
C++ compiler:       /home/x09527a/apps/openmpi4.1.5-gcc11.3.0-cuda11.8.0-ucx1.14.1/bin/mpicxx GNU 11.3.0
C++ compiler flags: -fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -Wno-cast-function-type-strict -fopenmp -O3 -DNDEBUG
BLAS library:       External - detected on the system
LAPACK library:     External - detected on the system
CUDA compiler:      /home/x09527a/apps/cuda11.8.0-gcc11.3.0/bin/nvcc
  nvcc: NVIDIA (R) Cuda compiler driver
  Copyright (c) 2005-2022 NVIDIA Corporation
  Built on Wed_Sep_21_10:33:58_PDT_2022
  Cuda compilation tools, release 11.8, V11.8.89
  Build cuda_11.8.r11.8/compiler.31833905_0
CUDA compiler flags: -std=c++17;--generate-code=arch=compute_35,code=sm_35;--generate-code=arch=compute_37,code=sm_37;--generate-code=arch=compute_50,code=sm_50;--generate-code=arch=compute_52,code=sm_52;--generate-code=arch=compute_60,code=sm_60;--generate-code=arch=compute_61,code=sm_61;--generate-code=arch=compute_70,code=sm_70;--generate-code=arch=compute_75,code=sm_75;--generate-code=arch=compute_80,code=sm_80;--generate-code=arch=compute_86,code=sm_86;--generate-code=arch=compute_89,code=sm_89;--generate-code=arch=compute_90,code=sm_90;-Wno-deprecated-gpu-targets;--generate-code=arch=compute_53,code=sm_53;--generate-code=arch=compute_80,code=sm_80;-use_fast_math;-Xptxas;-warn-double-usage;-Xptxas;-Werror;-fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -Wno-cast-function-type-strict -fopenmp -O3 -DNDEBUG
CUDA driver:        12.0
CUDA runtime:       11.80

Running on 2 nodes with total 80 cores, 80 processing units, 8 compatible GPUs
  Cores per node:                                         40
  Logical processing units per node:                      40
  OS CPU Limit / recommended threads to start per node:   40
  Compatible GPUs per node:                               4
  All nodes have identical type(s) of GPUs
Hardware detected on host cx091 (the node of MPI rank 0):
  CPU info:
    Vendor: Intel
    Brand:  Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
    Family: 6   Model: 85   Stepping: 7
    Features: aes apic avx avx2 avx512f avx512cd avx512bw avx512vl avx512secondFMA clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
    Number of AVX-512 FMA units: 2
  Hardware topology: Full, with devices
    Packages, cores, and logical processors:
      [indices refer to OS logical processors]
      Package  0: [   0] [   1] [   2] [   3] [   4] [   5] [   6] [   7] [   8] [   9] [  10] [  11] [  12] [  13] [  14] [  15] [  16] [  17] [  18] [  19]
      Package  1: [  20] [  21] [  22] [  23] [  24] [  25] [  26] [  27] [  28] [  29] [  30] [  31] [  32] [  33] [  34] [  35] [  36] [  37] [  38] [  39]
    CPU limit set by OS: -1   Recommended max number of threads: 40
    Numa nodes:
      Node  0 (204792705024 bytes mem):  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
      Node  1 (206158430208 bytes mem):  20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
      Latency:
                 0     1
           0  1.00  1.80
           1  1.80  1.00
    Caches:
      L1: 32768 bytes, linesize 64 bytes, assoc. 8, shared 1 ways
      L2: 1048576 bytes, linesize 64 bytes, assoc. 16, shared 1 ways
      L3: 28835840 bytes, linesize 64 bytes, assoc. 11, shared 20 ways
    PCI devices:
      0000:00:11.5  Id: 8086:a1d2  Class: 0x0106  Numa: 0
      0000:00:17.0  Id: 8086:a182  Class: 0x0106  Numa: 0
      0000:02:00.0  Id: 102b:0522  Class: 0x0300  Numa: 0
      0000:02:01.0  Id: 1734:1228  Class: 0x0b40  Numa: 0
      0000:03:00.0  Id: 8086:1533  Class: 0x0200  Numa: 0
      0000:18:00.0  Id: 15b3:1017  Class: 0x0207  Numa: 0
      0000:3d:00.0  Id: 10de:1db5  Class: 0x0302  Numa: 0
      0000:3e:00.0  Id: 10de:1db5  Class: 0x0302  Numa: 0
      0000:86:00.0  Id: 15b3:1017  Class: 0x0207  Numa: 1
      0000:b1:00.0  Id: 10de:1db5  Class: 0x0302  Numa: 1
      0000:b2:00.0  Id: 10de:1db5  Class: 0x0302  Numa: 1
      0000:d8:00.0  Id: 144d:a822  Class: 0x0108  Numa: 1
  GPU info:
    Number of GPUs detected: 4
    #0: NVIDIA Tesla V100-SXM2-32GB, compute cap.: 7.0, ECC: yes, stat: compatible
    #1: NVIDIA Tesla V100-SXM2-32GB, compute cap.: 7.0, ECC: yes, stat: compatible
    #2: NVIDIA Tesla V100-SXM2-32GB, compute cap.: 7.0, ECC: yes, stat: compatible
    #3: NVIDIA Tesla V100-SXM2-32GB, compute cap.: 7.0, ECC: yes, stat: compatible

Highest SIMD level supported by all nodes in run: AVX_512
SIMD instructions selected at compile time:       AVX2_256
This program was compiled for different hardware than you are running on,
which could influence performance. This build might have been configured on
a login node with only a single AVX-512 FMA unit (in which case AVX2 is
faster), while the node you are running on has dual AVX-512 FMA units.
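The SIMD mismatch reported above costs performance on these dual-FMA AVX-512 nodes. A minimal rebuild sketch follows, assuming the same source tree and toolchain; GMX_SIMD, GMX_GPU, and GMX_MPI are standard GROMACS CMake options, while the directory names and -j value are illustrative assumptions.

    # Reconfigure for the AVX-512 hardware detected above (hypothetical paths).
    cd gromacs-2023.1/build
    cmake .. -DGMX_SIMD=AVX_512 -DGMX_GPU=CUDA -DGMX_MPI=ON \
             -DCMAKE_INSTALL_PREFIX=$HOME/apps/gromacs/2023.1-avx512
    make -j 20 install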
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E. Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale
8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------
++++ PLEASE CITE THE DOI FOR THIS VERSION OF GROMACS ++++
https://doi.org/10.5281/zenodo.7852175
-------- -------- --- Thank You --- -------- --------

GMX_ENABLE_DIRECT_GPU_COMM environment variable detected, enabling direct
GPU communication using GPU-aware MPI.

Input Parameters:
   integrator                     = md
   tinit                          = 0
   dt                             = 0.002
   nsteps                         = 50000
   init-step                      = 0
   simulation-part                = 1
   mts                            = false
   comm-mode                      = Linear
   nstcomm                        = 100
   bd-fric                        = 0
   ld-seed                        = -25174273
   emtol                          = 10
   emstep                         = 0.01
   niter                          = 20
   fcstep                         = 0
   nstcgsteep                     = 1000
   nbfgscorr                      = 10
   rtpi                           = 0.05
   nstxout                        = 0
   nstvout                        = 60000
   nstfout                        = 60000
   nstlog                         = 1000
   nstcalcenergy                  = 100
   nstenergy                      = 1000
   nstxout-compressed             = 60000
   compressed-x-precision         = 1000
   cutoff-scheme                  = Verlet
   nstlist                        = 20
   pbc                            = xyz
   periodic-molecules             = false
   verlet-buffer-tolerance        = 0.005
   rlist                          = 1.212
   coulombtype                    = PME
   coulomb-modifier               = Potential-shift
   rcoulomb-switch                = 0
   rcoulomb                       = 1.2
   epsilon-r                      = 1
   epsilon-rf                     = inf
   vdw-type                       = Cut-off
   vdw-modifier                   = Force-switch
   rvdw-switch                    = 1
   rvdw                           = 1.2
   DispCorr                       = No
   table-extension                = 1
   fourierspacing                 = 0.12
   fourier-nx                     = 108
   fourier-ny                     = 108
   fourier-nz                     = 120
   pme-order                      = 4
   ewald-rtol                     = 1e-05
   ewald-rtol-lj                  = 0.001
   lj-pme-comb-rule               = Geometric
   ewald-geometry                 = 3d
   epsilon-surface                = 0
   ensemble-temperature-setting   = constant
   ensemble-temperature           = 310.15
   tcoupl                         = V-rescale
   nsttcouple                     = 100
   nh-chain-length                = 0
   print-nose-hoover-chain-variables = false
   pcoupl                         = C-rescale
   pcoupltype                     = Semiisotropic
   nstpcouple                     = 100
   tau-p                          = 5
   compressibility (3x3):
      compressibility[    0]={ 4.50000e-05,  0.00000e+00,  0.00000e+00}
      compressibility[    1]={ 0.00000e+00,  4.50000e-05,  0.00000e+00}
      compressibility[    2]={ 0.00000e+00,  0.00000e+00,  4.50000e-05}
   ref-p (3x3):
      ref-p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}
      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}
   refcoord-scaling               = COM
   posres-com (3):
      posres-com[0]= 0.00000e+00
      posres-com[1]= 0.00000e+00
      posres-com[2]= 0.00000e+00
   posres-comB (3):
      posres-comB[0]= 0.00000e+00
      posres-comB[1]= 0.00000e+00
      posres-comB[2]= 0.00000e+00
   QMMM                           = false
   qm-opts:
      ngQM                        = 0
   constraint-algorithm           = Lincs
   continuation                   = true
   Shake-SOR                      = false
   shake-tol                      = 0.0001
   lincs-order                    = 4
   lincs-iter                     = 1
   lincs-warnangle                = 30
   nwall                          = 0
   wall-type                      = 9-3
   wall-r-linpot                  = -1
   wall-atomtype[0]               = -1
   wall-atomtype[1]               = -1
   wall-density[0]                = 0
   wall-density[1]                = 0
   wall-ewald-zfac                = 3
   pull                           = false
   awh                            = false
   rotation                       = false
   interactiveMD                  = false
   disre                          = No
   disre-weighting                = Conservative
   disre-mixed                    = false
   dr-fc                          = 1000
   dr-tau                         = 0
   nstdisreout                    = 100
   orire-fc                       = 0
   orire-tau                      = 0
   nstorireout                    = 100
   free-energy                    = no
   cos-acceleration               = 0
   deform (3x3):
      deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   simulated-tempering            = false
   swapcoords                     = no
   userint1                       = 0
   userint2                       = 0
   userint3                       = 0
   userint4                       = 0
   userreal1                      = 0
   userreal2                      = 0
   userreal3                      = 0
   userreal4                      = 0
   applied-forces:
     electric-field:
       x:
         E0                       = 0
         omega                    = 0
         t0                       = 0
         sigma                    = 0
       y:
         E0                       = 0
         omega                    = 0
         t0                       = 0
         sigma                    = 0
       z:
         E0                       = 0
         omega                    = 0
         t0                       = 0
         sigma                    = 0
     density-guided-simulation:
       active                     = false
       group                      = protein
       similarity-measure         = inner-product
       atom-spreading-weight      = unity
       force-constant             = 1e+09
       gaussian-transform-spreading-width = 0.2
       gaussian-transform-spreading-range-in-multiples-of-width = 4
       reference-density-filename = reference.mrc
       nst                        = 1
       normalize-densities        = true
       adaptive-force-scaling     = false
       adaptive-force-scaling-time-constant = 4
       shift-vector               =
       transformation-matrix      =
     qmmm-cp2k:
       active                     = false
       qmgroup                    = System
       qmmethod                   = PBE
       qmfilenames                =
       qmcharge                   = 0
       qmmultiplicity             = 1
grpopts:
   nrdf:         21845.6      147197      327906
   ref-t:         310.15      310.15      310.15
   tau-t:              1           1           1
   annealing:         No          No          No
   annealing-npoints:  0           0           0
   acc:                0           0           0
   nfreeze:            N           N           N
   energygrp-flags[  0]: 0
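For reference, the run-control settings above correspond to an .mdp file along these lines. This is a partial sketch reconstructed from the parameter dump only; the tc-grps names are hypothetical placeholders (the dump shows three ref-t/tau-t entries but not the group names).

    ; step6.9_equilibration.mdp -- partial sketch; values taken from the
    ; parameter dump above, temperature-coupling group names assumed.
    integrator              = md
    dt                      = 0.002
    nsteps                  = 50000
    cutoff-scheme           = Verlet
    coulombtype             = PME
    rcoulomb                = 1.2
    vdw-modifier            = Force-switch
    rvdw-switch             = 1.0
    rvdw                    = 1.2
    tcoupl                  = V-rescale
    tc-grps                 = GRP1 GRP2 GRP3   ; placeholder group names
    tau-t                   = 1.0 1.0 1.0
    ref-t                   = 310.15 310.15 310.15
    pcoupl                  = C-rescale
    pcoupltype              = semiisotropic
    tau-p                   = 5.0
    ref-p                   = 1.0 1.0
    compressibility         = 4.5e-5 4.5e-5
    constraint-algorithm    = lincs
    continuation            = yes
    refcoord-scaling        = com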
Changing nstlist from 20 to 100, rlist from 1.212 to 1.329

Initializing Domain Decomposition on 8 ranks
Dynamic load balancing: auto
Using update groups, nr 83057, average size 2.8 atoms, max. radius 0.139 nm
Minimum cell size due to atom displacement: 0.710 nm
Initial maximum distances in bonded interactions:
    two-body bonded interactions: 0.446 nm, LJ-14, atoms 7986 8022
  multi-body bonded interactions: 0.496 nm, CMAP Dih., atoms 1651 1663
Minimum cell size due to bonded interactions: 0.546 nm
Disabling dynamic load balancing; unsupported with GPU communication + update.
To account for pressure scaling, scaling the initial minimum size with 1.05
Using 1 separate PME ranks, as requested with -npme option
Optimizing the DD grid for 7 cells with a minimum initial size of 0.746 nm
The maximum allowed number of cells is: X 17 Y 17 Z 18
Domain decomposition grid 1 x 1 x 7, separate PME ranks 1
PME domain decomposition: 1 x 1 x 1
Interleaving PP and PME ranks
This rank does only particle-particle work.
Domain decomposition rank 0, coordinates 0 0 0

The initial number of communication pulses is: Z 1
The initial domain decomposition cell size is: Z 2.01 nm

The maximum allowed distance for atom groups involved in interactions is:
                 non-bonded interactions           1.607 nm
(the following are initial values, they could change due to box deformation)
            two-body bonded interactions  (-rdd)   1.607 nm
          multi-body bonded interactions  (-rdd)   1.607 nm

On host cx091 4 GPUs selected for this run.
Mapping of GPU IDs to the 4 GPU tasks in the 4 ranks on this node:
  PP:0,PP:1,PP:2,PP:3
PP tasks will do (non-perturbed) short-ranged and most bonded interactions on the GPU
PP task will update and constrain coordinates on the GPU
PME tasks will do all aspects on the GPU
GPU direct communication will be used between MPI ranks.
Using two step summing over 2 groups of on average 3.5 ranks
Using 8 MPI processes

Non-default thread affinity set, disabling internal thread affinity
Using 10 OpenMP threads per MPI process

System total charge: 0.000
Will do PME sum in reciprocal space for electrostatic interactions.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
Potential shift: LJ r^-12: -2.648e-01 r^-6: -5.349e-01, Ewald -8.333e-06
Initialized non-bonded Coulomb Ewald tables, spacing: 1.02e-03 size: 1176

Generated table with 1164 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1164 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1164 data points for 1-4 LJ12.
Tabscale = 500 points/nm
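The Ewald width above follows from rcoulomb and ewald-rtol: mdrun chooses beta such that erfc(beta * rcoulomb) equals ewald-rtol. A quick check using only values printed in this log:

    # erfc(rcoulomb * beta) with 1/beta = 0.384195 nm should reproduce ewald-rtol.
    python3 -c 'from math import erfc; print(erfc(1.2 / 0.384195))'
    # prints ~1.0e-05, matching ewald-rtol = 1e-05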
Using GPU 8x8 nonbonded short-range kernels

Using a dual 8x8 pair-list setup updated with dynamic, rolling pruning:
  outer list: updated every 100 steps, buffer 0.129 nm, rlist 1.329 nm
  inner list: updated every  14 steps, buffer 0.001 nm, rlist 1.201 nm
At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list would be:
  outer list: updated every 100 steps, buffer 0.287 nm, rlist 1.487 nm
  inner list: updated every  14 steps, buffer 0.059 nm, rlist 1.259 nm

Linking all bonded interactions to atoms

Initializing LINear Constraint Solver

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
LINCS: A Linear Constraint Solver for molecular simulations
J. Comp. Chem. 18 (1997) pp. 1463-1472
-------- -------- --- Thank You --- -------- --------

The number of constraints is 42145

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------

Intra-simulation communication will occur every 100 steps.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. Bernetti, G. Bussi
Pressure control using stochastic cell rescaling
J. Chem. Phys. 153 (2020) pp. 114107
-------- -------- --- Thank You --- -------- --------

There are: 234198 Atoms
Atom distribution over 7 domains: av 33456 stddev 919 min 32680 max 34903

Updating coordinates and applying constraints on the GPU.
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
  0:  SOLU_MEMB
  1:  SOLV

Started mdrun on rank 0 Wed Jun 14 10:36:43 2023

The -resethway functionality is deprecated, and may be removed in a future version.

           Step           Time
              0        0.00000

   Energies (kJ/mol)
           Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
    3.53545e+04    1.74874e+05    1.22907e+05    2.25610e+03   -8.16910e+02
          LJ-14     Coulomb-14        LJ (SR)   Coulomb (SR)   Coul. recip.
    2.28515e+04   -1.29666e+05    1.23167e+05   -2.72403e+06    1.09380e+04
      Potential    Kinetic En.   Total Energy  Conserved En.    Temperature
   -2.36216e+06    6.44609e+05   -1.71755e+06   -1.71783e+06    3.12018e+02
 Pressure (bar)   Constr. rmsd
    1.28383e+02    0.00000e+00
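Since nstenergy = 1000, the terms tabulated above are also written to step6.9_equilibration.edr and can be extracted after the run with the standard gmx energy tool; the term selection here is just an example.

    # Extract selected terms from the run's energy file (terms are normally
    # chosen at an interactive prompt; echo supplies them non-interactively).
    echo 'Potential Temperature Pressure' | gmx_mpi energy \
        -f step6.9_equilibration.edr -o step6.9_equilibration_energy.xvg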