GROMACS performance on an 8-core CPU / Installation to a non-standard location

GROMACS version: 2024.3
GROMACS modification: Yes/No
Hello everyone!
I am new to Gromacs, and I encountered the following problem recently.

In our group we have a shared desktop for MD simulations with an older Gromacs version already installed. As I wanted to use the new version and modify some files in it to add a new non-standard amino acid residue, I installed version 2024.3 to my personal folder /home/lena using -DCMAKE_INSTALL_PREFIX= and added

source ~/gromacs-2024.3/bin/GMXRC

to .bashrc. Running which gmx confirmed that I use the local version of Gromacs (/home/lena/gromacs-2024.3/bin/gmx), and I was able to start the simulation successfully from the working directory ~/MD/ProtA. But the performance was just 2 ns/day.
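Installing to a custom prefix only changes where gmx lives; sourcing GMXRC puts that prefix's bin directory on PATH, and that is what which gmx reports. A minimal Python sketch of the same check, using only the standard library (the prefix in the example is the one from this post, not a required location):

```python
import os
import shutil


def resolve_gmx(path=None):
    """Return the resolved path of the first 'gmx' on PATH, or None.

    Equivalent in spirit to `which gmx`; `path` defaults to the current
    PATH, which GMXRC modifies when it is sourced.
    """
    found = shutil.which("gmx", path=path)
    return os.path.realpath(found) if found else None


def is_under_prefix(binary, prefix):
    """True if `binary` lies inside the install `prefix` directory."""
    if binary is None:
        return False
    binary, prefix = os.path.realpath(binary), os.path.realpath(prefix)
    return os.path.commonpath([binary, prefix]) == prefix


if __name__ == "__main__":
    gmx = resolve_gmx()
    print("gmx resolves to:", gmx)
    # Example prefix taken from the `which gmx` output in this post.
    print("local build in use:", is_under_prefix(gmx, "/home/lena/gromacs-2024.3"))
```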

System:
[ molecules ]
; Compound #mols
PROA 1
SOL 15655
CL 2
Total atoms: 51067
Protein atoms: 4100 (265 residues)

Running on 1 node with total 8 cores, 16 processing units
Hardware detected on host:
CPU info:
Vendor: Intel
Brand: Intel(R) Core™ i9-9900K CPU @ 3.60GHz

However, when I ran the exact same simulation with the same input files on my personal laptop (AMD Ryzen 9 8945HS, 8 cores, Gromacs 2024.3), where the gmx binary is

/usr/local/gromacs/bin/gmx

I got ~15 ns/day.

Running on 1 node with total 8 cores, 16 processing units
Hardware detected on host LAPTOP-QU8TNGH9:
CPU info:
Vendor: AMD
Brand: AMD Ryzen 9 8945HS w/ Radeon 780M Graphics

Why could my simulation on the desktop be so slow? Is it somehow related to the installation directory, or is it just the hardware difference?

Any help will be appreciated!

Hi!

The installation path should not affect the performance in any way (in any sane situation). However, there are likely many other differences, including power saving settings, what else is running on the machine (GROMACS tends to assume it has full CPU to itself), how exactly you built GROMACS (compiler, flags, …), and how you launch the simulation.

Comparing the full log files from the two runs would be a good start to see both the differences in how GROMACS is built (at the top of the log) and in the detailed performance counters (at the end of the log).

For ~50 thousand atoms, even 15 ns/day looks a bit low to me for such CPUs, but there could be reasons for that. 2 ns/day is definitely not OK; the hardware is not that different.

Thanks a lot for your advice!

I forgot to mention that on my personal laptop (AMD Ryzen 9 8945HS, 8 cores, Gromacs 2024.3) I run Gromacs via WSL2. Could that influence the performance as well?
Is there any literature you would recommend that gives reference numbers for the performance counters? As this is my first simulation, I can’t really judge whether the numbers I get are adequate.

I ran 2 ns simulations on both setups again, nothing else was running on the machines at that time.
Overall, I found the following from the log files:

“Update groups can not be used for this system because atoms that are (in)directly constrained together are interdispersed with other atoms”. From what I’ve read, this can also hinder performance and is caused by using CHARMM-GUI to parametrize the new amino acid (it adds all the hydrogens at the end rather than right after the heavy atom they are attached to). But this message is the same for both simulations.

Among the differences in the log files (attached for reference), the SIMD instructions, the CPU FFT library, the C/C++ compilers and some of their flags differed between the laptop and the desktop:

| | SIMD instructions | CPU FFT library | C/C++ compiler | Flags |
|---|---|---|---|---|
| Laptop | AVX_512 | fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512 | GNU 13.2.0 | -march=skylake-avx512 |
| Desktop | AVX2_256 | fftw-3.3.8-sse2-avx-avx2-avx2_128 | GNU 9.4.0 | -mavx2 -mfma |
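To spot build differences like these without reading both logs line by line, a minimal Python sketch can pull the build-information lines out of two log files and report the ones that differ. It assumes the "Key:   value" layout of the version block near the top of md.log (e.g. "SIMD instructions:   AVX2_256"); the list of keys is an illustrative choice.

```python
import re

# Build-information keys from the version block at the top of md.log
# (illustrative selection; extend as needed).
KEYS = ("GROMACS version", "SIMD instructions", "CPU FFT library",
        "C compiler", "C compiler flags", "C++ compiler", "C++ compiler flags")

LINE = re.compile(r"\s*([^:]+?):\s+(.*\S)\s*$")


def build_info(log_text, keys=KEYS):
    """Collect the first 'Key: value' line for each requested key."""
    info = {}
    for line in log_text.splitlines():
        m = LINE.match(line)
        if m and m.group(1) in keys and m.group(1) not in info:
            info[m.group(1)] = m.group(2)
    return info


def diff_build_info(log_a, log_b, keys=KEYS):
    """Return {key: (value_a, value_b)} for the keys whose values differ."""
    a, b = build_info(log_a, keys), build_info(log_b, keys)
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Usage with the two attachments: diff_build_info(open("laptop_log.txt").read(), open("desktop_log.txt").read()) returns one (laptop, desktop) pair per differing key.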

The biggest difference in the performance counters is in the “Real cycle and time accounting” section (share of the total):

| | Laptop | Desktop |
|---|---|---|
| Force | 72.8% | 46.7% |
| PME mesh | 19.1% | 9.1% |
| NB X/F buffer ops. | 2.9% | 41.4% |

What could be the reason for this?
Overall, the performance was as follows:

| | (ns/day) | (hour/ns) |
|---|---|---|
| Laptop | 10.311 | 2.328 |
| Desktop | 2.066 | 11.615 |
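The two columns carry the same information: hour/ns is simply 24 divided by ns/day. A small Python sketch of the conversion with the numbers above, which also gives the overall slowdown of the desktop:

```python
def hours_per_ns(ns_per_day):
    """Convert mdrun throughput (ns/day) into wall-clock hours per simulated ns."""
    return 24.0 / ns_per_day


laptop, desktop = 10.311, 2.066  # ns/day, from the two logs above

print(f"laptop : {hours_per_ns(laptop):.3f} hour/ns")   # prints 2.328
print(f"desktop: {hours_per_ns(desktop):.3f} hour/ns")  # prints 11.617
print(f"desktop is {laptop / desktop:.1f}x slower")     # prints 5.0
```

The desktop value comes out as 11.617 rather than the logged 11.615, presumably because the log derives both columns from the unrounded timing and only the printed ns/day is rounded.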

Unfortunately I have no programming or GROMACS knowledge whatsoever, so any comment would be very useful.
desktop_log.txt (20.4 KB)
laptop_log.txt (20.4 KB)

Thanks again for your help!

The laptop log looks okay. On the desktop, the “buffer ops” taking 41% of the time is very abnormal (if you look at “Wall time” for the other operations, you also see them being several times slower on the desktop, but the buffer ops are ~70x slower!).
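The ~70x figure can be reproduced from numbers already in this thread. Both runs covered the same 2 ns, so total wall time scales as 1/(ns/day), and each row's wall time is its percentage of that total. A short Python sketch of this back-of-the-envelope estimate (it ignores rounding in the logged percentages):

```python
def slowdown(share_laptop, share_desktop, perf_laptop, perf_desktop):
    """Estimate how many times longer one cycle-accounting row took on the desktop.

    Assumes both runs simulated the same time span, so total wall time is
    proportional to 1 / (ns/day); a row's wall time is its % share of that total.
    """
    total_ratio = perf_laptop / perf_desktop  # desktop total time / laptop total time
    return (share_desktop / share_laptop) * total_ratio


perf_laptop, perf_desktop = 10.311, 2.066  # ns/day
rows = {  # row name: (% on laptop, % on desktop)
    "Force": (72.8, 46.7),
    "PME mesh": (19.1, 9.1),
    "NB X/F buffer ops.": (2.9, 41.4),
}
for name, (lap, desk) in rows.items():
    print(f"{name:<20} ~{slowdown(lap, desk, perf_laptop, perf_desktop):.1f}x slower")
# Force ~3.2x, PME mesh ~2.4x, NB X/F buffer ops. ~71.2x
```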

Are you sure that nothing else is running on the desktop? If you launch top, do you see only gmx running, consuming around 1600% of the CPU, and nothing else using more than 15%? After GROMACS has been running for ~5 minutes, what are the load average values in the first line of the top output?
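The load averages that top prints in its first line are also available from Python via os.getloadavg() (Unix only, which includes WSL2). A hedged sketch of the check suggested above: with mdrun keeping all 16 hardware threads busy, the load should settle near 16, so a sustained value clearly above that hints at competing processes. The 10% tolerance is an arbitrary assumption, not a GROMACS rule.

```python
import os


def looks_oversubscribed(load, threads, tolerance=0.10):
    """True if `load` exceeds the expected busy-thread count by more than
    `tolerance` (a fraction; 0.10 is an arbitrary illustrative choice)."""
    return load > threads * (1.0 + tolerance)


def check_machine(threads=16):
    """Print the 1/5/15-minute load averages, as shown in top's first line."""
    one, five, fifteen = os.getloadavg()  # raises OSError if unavailable
    print(f"load average: {one:.2f}, {five:.2f}, {fifteen:.2f}")
    if looks_oversubscribed(five, threads):
        print("load is above the thread count: something may be competing with mdrun")
    return five


if __name__ == "__main__":
    # 16 hardware threads ("16 processing units") on the i9-9900K desktop.
    check_machine(threads=16)
```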

P.S.: If you’re just trying out different settings to optimize performance, there is no need to run for many hours. You can add the -maxh 0.1 option to gmx mdrun to stop the run after 6 minutes (0.1 hours); that should be enough to get a decent performance estimate.
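When benchmarking several settings, it can help to build the mdrun command in a script. A minimal Python sketch, assuming gmx is on PATH and the run files share a common prefix passed via -deffnm (the prefix md below is hypothetical):

```python
import subprocess


def mdrun_benchmark_cmd(deffnm, maxh=0.1, extra=()):
    """Build a short benchmark command line for gmx mdrun.

    -maxh is given in hours, so 0.1 stops the run after about 6 minutes.
    `extra` holds any additional mdrun options being compared.
    """
    return ["gmx", "mdrun", "-deffnm", deffnm, "-maxh", str(maxh), *extra]


def run_benchmark(deffnm, maxh=0.1, extra=()):
    """Launch the benchmark; raises CalledProcessError if mdrun fails."""
    cmd = mdrun_benchmark_cmd(deffnm, maxh, extra)
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)


# Build (but do not run) the command for the hypothetical prefix "md".
print(" ".join(mdrun_benchmark_cmd("md")))  # prints gmx mdrun -deffnm md -maxh 0.1
```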

Regarding literature with reference numbers: “GROMACS in the Cloud: A Global Supercomputer to Speed Up Alchemical Drug Design” gives a broad overview. It’s focused on datacenter CPUs, but you can get a ballpark figure there.

Regarding the SIMD, FFT library and compiler differences: those have a far greater impact than the installation directory can ever have. Just FYI. That still does not explain the observed difference between the two machines, and the settings are OK for the respective CPUs. Newer compilers are typically better.

Regarding WSL2: again, this is much more relevant than the installation directory :)
But it is not enough to explain the performance difference (besides, I’d expect WSL to have a negative effect).

Regarding the update groups message: good observation. This would have a big impact if you were using GPUs (and would be fairly easy to fix by reordering the atoms in the input files). But here, as you noted, it’s the same for both simulations.
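Update groups need atoms that are constrained together to be contiguous; with hydrogen-bond constraints, that roughly means each hydrogen should directly follow the heavy atom it is bonded to. A minimal Python sketch of computing such an order from a bond list (a hypothetical helper; hydrogens are recognised by names starting with H, an assumption that suits CHARMM-style names). It only computes the new order: the coordinate file and every atom index in the topology would have to be rewritten consistently, which this sketch does not do.

```python
from collections import defaultdict


def hydrogens_after_heavy(atoms, bonds):
    """Return atom indices reordered so each hydrogen follows its heavy atom.

    atoms: list of (index, name) in current file order.
    bonds: iterable of (i, j) index pairs.
    Assumption: an atom is a hydrogen if its name starts with 'H'.
    """
    is_h = {idx: name.upper().startswith("H") for idx, name in atoms}
    attached = defaultdict(list)  # heavy atom index -> its hydrogens
    placed = set()                # hydrogens that have a heavy partner
    for i, j in bonds:
        for heavy, h in ((i, j), (j, i)):
            if is_h[h] and not is_h[heavy] and h not in placed:
                attached[heavy].append(h)
                placed.add(h)
    order = []
    for idx, _ in atoms:
        if idx in placed:
            continue  # already emitted right after its heavy atom
        order.append(idx)
        order.extend(sorted(attached[idx]))
    return order
```

For example, a backbone fragment stored as N, CA, C, O, HN, HA (the CHARMM-GUI pattern of hydrogens at the end) becomes N, HN, CA, HA, C, O.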