How to improve the performance of GROMACS?

GROMACS version: 2020.3

GROMACS version: 2020.3-MODIFIED
This program has been built from source code that has been altered and does not match the code released as part of the official GROMACS version 2020.3-MODIFIED. If you did not intend to use an altered GROMACS version, make sure to download an intact source distribution and compile that before proceeding.
If you have modified the source code, you are strongly encouraged to set your custom version suffix (using -DGMX_VERSION_STRING_OF_FORK) which will can help later with scientific reproducibility but also when reporting bugs.
Release checksum: c0599e547549c2d0ef4fc678dc5a26ad0000eab045e938fed756f9ff5b99a197
Computed checksum: 0c3db5f0820182974c80d37f12fef2b8c01cfc339569e0ef607045b5c5bfcbdb
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: disabled
SIMD instructions: AVX2_256
FFT library: Intel MKL
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.26.28801/bin/Hostx64/x64/cl.exe MSVC 19.26.28806.0
C compiler flags: /arch:AVX2 /MD /O2 /Ob2 /DNDEBUG
C++ compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.26.28801/bin/Hostx64/x64/cl.exe MSVC 19.26.28806.0
C++ compiler flags: /arch:AVX2 /wd4800 /wd4355 /wd4996 /wd4305 /wd4244 /wd4101 /wd4267 /wd4090 /wd4068 /analyze /analyze:stacksize 70000 /wd6001 /wd6011 /wd6053 /wd6054 /wd6385 /wd6386 /wd6387 /wd28199 /wd6239 /wd6240 /wd6294 /wd6326 /wd28020 /wd6330 /wd6993 /wd6031 /wd6244 /wd6246 -openmp /MD /O2 /Ob2 /DNDEBUG

Command line:
gmx mdrun -deffnm npt -ntmpi 1 -ntomp 2

Back Off! I just backed up npt.log to ./#npt.log.1#
Reading file npt.tpr, VERSION 2020.3-MODIFIED (single precision)
Changing nstlist from 10 to 100, rlist from 1.2 to 1.255

Using 1 MPI thread
Using 2 OpenMP threads

Non-default thread affinity set probably by the OpenMP library,
disabling internal thread affinity
starting mdrun ‘Protein in water’
100000 steps, 100.0 ps.

Writing final coordinates.

               Core t (s)   Wall t (s)        (%)
       Time:   126648.000    63324.000      200.0
                 (ns/day)    (hour/ns)
Performance:        0.136      175.898
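
The performance figures in the log can be cross-checked from the step count and the wall time. Below is a minimal Python sketch of that arithmetic, assuming the 1 fs time step implied by "100000 steps, 100.0 ps"; the function name `md_throughput` is made up for illustration and is not part of GROMACS.

```python
def md_throughput(n_steps, dt_ps, wall_s):
    """Return (ns/day, hours/ns) for a finished MD run."""
    simulated_ns = n_steps * dt_ps / 1000.0           # ps -> ns
    ns_per_day = simulated_ns / wall_s * 86400.0      # 86400 s in a day
    hours_per_ns = (wall_s / 3600.0) / simulated_ns
    return ns_per_day, hours_per_ns


# Values taken from the log above: 100000 steps, 100.0 ps, 63324 s wall time.
ns_day, h_ns = md_throughput(n_steps=100_000, dt_ps=0.001, wall_s=63324.0)
print(f"{ns_day:.3f} ns/day, {h_ns:.1f} hour/ns")
```

The result agrees with the log to about three significant figures. At this rate, a 10 ns run would take roughly 73 days of wall time.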

This took a ridiculously long time to complete for a single IgG equilibration. Compared with the performance figures that others have posted here, 0.136 ns/day is too slow to be practical for simulating large biomolecules.

I have already tried to optimize GROMACS for my system, which runs Windows 10 on an Intel® Core™ i5-6300U CPU.

Is the 0.136 ns/day that I obtained for the run above a reasonable value for my system?

How large is your system? A dual-core laptop CPU is generally not suitable for production runs unless your simulated system is very small.

An IgG of roughly 150 kDa should be more than 10,000 atoms.
Are there any estimates or benchmarks out there for different computer setups and different simulation systems?
Maybe it would be easier to gauge run times that way?

Just Google “GROMACS benchmarks” and you’ll find tons.

One example:

Generally you want to shoot for a few hundred atoms per processor when using CPU-only mode. 10k atoms will never perform well on a dual-core machine.
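
To put the rule of thumb above into numbers, here is a rough Python sketch. The target of 500 atoms per core is an assumed stand-in for "a few hundred", not a GROMACS recommendation, and both function names are hypothetical.

```python
import math


def atoms_per_core(n_atoms, n_cores):
    """Average number of atoms each core has to handle."""
    return n_atoms / n_cores


def cores_for_target(n_atoms, target_atoms_per_core=500):
    """Cores needed to reach the assumed per-core target (rounded up)."""
    return math.ceil(n_atoms / target_atoms_per_core)


# 10k atoms on the dual-core laptop discussed in this thread.
load = atoms_per_core(10_000, 2)
needed = cores_for_target(10_000)
print(f"{load:.0f} atoms per core; about {needed} cores to reach the target")
```

Under that assumption, a 10k-atom system on two cores carries about ten times the suggested load per core.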

Ah, no wonder it is so slow on my laptop… When counting the number of atoms, does this include the solvent molecules and not just the molecule of interest?

Everything in the simulation system matters. You spend about 90% of the calculation time on water…

Wow… I didn’t know water takes that much computation time! Thanks, Justin.

It’s not that water takes some special amount of time, but about 90% of the atoms in most systems belong to water. Hence that’s what ends up eating up the computation time.
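
One way to check the share of water atoms in your own system is to count them in the `.gro` coordinate file. The sketch below relies on the fixed-width `.gro` layout (residue name in columns 6 to 10). The set of water residue names is an assumption and may need adjusting for your topology, and the toy system is made up purely for illustration.

```python
# Assumed water residue names; GROMACS topologies commonly use "SOL".
WATER_RESNAMES = {"SOL", "HOH", "WAT", "TIP3"}


def water_fraction(gro_text):
    """Fraction of atoms in a .gro file that belong to water residues."""
    lines = gro_text.splitlines()
    n_atoms = int(lines[1])                  # line 2 holds the atom count
    atom_lines = lines[2:2 + n_atoms]        # the last line is the box
    n_water = sum(1 for line in atom_lines
                  if line[5:10].strip() in WATER_RESNAMES)
    return n_water / n_atoms


def gro_line(resnr, resname, atom, atomnr):
    """Build one fixed-width .gro atom line with dummy coordinates."""
    return f"{resnr:>5d}{resname:<5s}{atom:>5s}{atomnr:>5d}   0.000   0.000   0.000"


# Toy system: one two-atom "solute" residue plus two water molecules.
atoms = [gro_line(1, "GLY", "N", 1), gro_line(1, "GLY", "CA", 2)]
for i, resnr in enumerate((2, 3)):
    for j, name in enumerate(("OW", "HW1", "HW2")):
        atoms.append(gro_line(resnr, "SOL", name, 3 + 3 * i + j))
example = "\n".join(["Toy system", f"{len(atoms):5d}", *atoms, "   1.0   1.0   1.0"])

frac = water_fraction(example)
print(f"{frac:.0%} of atoms are water")
```

For a real system, read the file contents with `open("npt.gro").read()` and pass them to `water_fraction`.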

That’s true… thanks for the correction.

No wonder so much emphasis is placed on reducing box volume with different geometries.
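
The effect of box geometry can be estimated from the volumes of the common periodic cells at equal periodic image distance d: d^3 for a cube, (4/9)·sqrt(3)·d^3 for a truncated octahedron, and (1/2)·sqrt(2)·d^3 for a rhombic dodecahedron. A short Python sketch follows; the image distance of 10 nm is an arbitrary illustrative value, not taken from this thread.

```python
import math


def box_volumes(d):
    """Volumes (nm^3) of periodic boxes sharing the image distance d (nm)."""
    return {
        "cubic": d ** 3,
        "truncated octahedron": 4.0 / 9.0 * math.sqrt(3.0) * d ** 3,
        "rhombic dodecahedron": 0.5 * math.sqrt(2.0) * d ** 3,
    }


vols = box_volumes(10.0)
for name, v in vols.items():
    print(f"{name:22s} {v:8.1f} nm^3  ({v / vols['cubic']:.0%} of cubic)")
```

Since the solute is the same in each case, the roughly 29% saving of the rhombic dodecahedron comes almost entirely out of the water. In GROMACS the box type is normally chosen with `gmx editconf -bt dodecahedron` when setting up the system.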