GROMACS 2020.3 on single CPU


I got this from a BioExcel presentation and am wondering how this works.

Is there anything I should be doing to speed up or improve performance on a single-node CPU?
I have noticed that this message remains regardless of which version I’ve tried, and I’m not sure how to resolve it or improve the performance.

Compiled SIMD: None, but for this host/run AVX2_256 might be better (see log).
Reading file md.tpr, VERSION 2020.3-MODIFIED (single precision)
Changing nstlist from 10 to 100, rlist from 1.2 to 1.299

Using 1 MPI thread
Using 4 OpenMP threads

Non-default thread affinity set probably by the OpenMP library,
disabling internal thread affinity

WARNING: Using the slow plain C kernels. This should
not happen during routine usage on supported platforms.

Hi - the SIMD note is much more important for performance than the thread affinity note. Your GROMACS install was not configured to use the SIMD capability of your CPU, so you’ll want to rebuild and reinstall following the SIMD-specific instructions here:
http://manual.gromacs.org/documentation/2020/install-guide/index.html#simd-support

Most modern Intel or AMD CPUs can use AVX2_256, and some Skylake or newer Intel CPUs can use AVX_512, which adds another boost for CPU-only simulations. gmx --version will show you whether or not you were successful: its output has a line listing the SIMD instructions. I’ll emphasize again - this is a MASSIVE performance benefit on CPU-only simulations.
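As a rough sketch of the rebuild step, the snippet below checks whether the CPU advertises AVX2 and prints a matching configure command. It assumes a Linux machine where /proc/cpuinfo exists; pick_simd is a hypothetical helper written for this post, not part of GROMACS, and it only checks for AVX2. GMX_SIMD is the CMake option described in the install guide linked above.

```shell
# Hypothetical helper: map a CPU flags string to a GMX_SIMD value.
# Only AVX2 is checked here; other SIMD levels are listed in the install guide.
pick_simd() {
    case " $1 " in
        *" avx2 "*) echo "AVX2_256" ;;
        *)          echo "" ;;
    esac
}

# Linux-only assumption: CPU feature flags are listed in /proc/cpuinfo.
cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null || true)
simd=$(pick_simd "$cpu_flags")

if [ -n "$simd" ]; then
    # Print (do not run) the configure command for a fresh build directory.
    echo "cmake .. -DGMX_SIMD=${simd}"
else
    echo "AVX2 not detected; pick a SIMD level from the install guide."
fi

# After rebuilding, the version output should name the SIMD level.
if command -v gmx >/dev/null 2>&1; then
    gmx --version | grep -i "SIMD instructions" || echo "no SIMD line found"
fi
```

The script only prints the cmake line, so nothing is rebuilt until you run that command yourself in your build directory.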

For thread affinity - you’re probably not using all of the cores on your machine, in which case thread affinity is not set. If you want to use the whole machine to run simulations, ntomp * ntmpi should equal the total number of cores (unless you’re using separate PME ranks, in which case it gets more complicated). You can also probably get rid of this message and improve performance by adding “-pin on -pinoffset 0” to your mdrun command line. That tells GROMACS to pin its threads to cores, starting from core 0.
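A minimal sketch of what such a command line could look like, using the thread counts from the log above (1 thread-MPI rank, 4 OpenMP threads) and the md.tpr file name from the same log. The script only assembles and prints the command; adjust the numbers to your own machine.

```shell
# Values taken from the log in the question; change them for your machine.
ntmpi=1   # thread-MPI ranks
ntomp=4   # OpenMP threads per rank

# ntmpi * ntomp is the total number of threads mdrun will start.
total=$((ntmpi * ntomp))

# -pin on / -pinoffset 0 ask mdrun to pin threads starting from core 0.
mdrun_cmd="gmx mdrun -s md.tpr -ntmpi ${ntmpi} -ntomp ${ntomp} -pin on -pinoffset 0"

echo "Total threads: ${total}"
echo "${mdrun_cmd}"
```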

Thanks for the suggestions! I will try rebuilding with SIMD. But why does that note also say that it might run better with AVX2_256? I assumed it meant that my machine might not be faster with SIMD.

Regarding thread affinity, how many cores should I choose?
The md.log file from a simulation I’m running now says that it runs on 1 node with a total of 2 cores and 4 logical cores. So should I choose 2 cores?

If it is 2 cores, then ntomp * ntmpi = 2? Which will be 1 and which will be 2?
I suppose ntomp has to be given on the command line for mdrun? So it’s not something that I have to change in my build, right?

To clarify the SIMD thing - AVX2_256 (and all the others at the link I posted) are types of SIMD. The note is saying you are using no SIMD, and instead falling back to the non-SIMD implementation. SIMD is great because it allows your computer to perform the same operation on several values with a single instruction - with caveats, you’re multiplying the throughput of each core by a significant factor. GROMACS sometimes doesn’t pick the SIMD level for you automatically because SIMD instructions are CPU-dependent, and different platforms have different capabilities.
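To put a number on that factor: AVX2_256 works on 256-bit registers, and the log above shows a single-precision build, where each value takes 32 bits. The arithmetic below gives the number of values one instruction can process at once. This is an upper bound per instruction, not a measured speedup.

```shell
register_bits=256   # width of an AVX2 register
float_bits=32       # size of one single-precision value

# Number of single-precision values handled by one SIMD instruction.
lanes=$((register_bits / float_bits))
echo "Values per instruction: ${lanes}"
```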

You can use all 4 logical cores, but there won’t be a huge performance difference between using 2 cores and 4 cores, since the 4 logical cores share two physical cores. You may want to only run on 2 cores if you are doing other stuff on the machine in the background.

In either case, on such a small machine I suggest running with -ntmpi 1 and -ntomp {core count}. Adding multiple ranks (using ntmpi) is typically only worth it when you hit double-digit cores. Note that without a more powerful computer or a GPU, you won’t get any reasonable simulation lengths in a sane amount of time. On a small CPU-only PC, GROMACS is really only useful for simulation setup and result analysis, and perhaps energy minimization.
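If you are unsure what to put in for {core count}, here is one way to look it up. This sketch assumes Linux with nproc (GNU coreutils) and lscpu (util-linux) available, and falls back to the logical count otherwise. It prints a suggested command using the physical core count and does not run anything.

```shell
# Logical cores (hardware threads) available to this process.
logical=$(nproc 2>/dev/null || echo 1)

# Physical cores: count unique (core, socket) pairs reported by lscpu.
if command -v lscpu >/dev/null 2>&1; then
    physical=$(lscpu -p=CORE,SOCKET 2>/dev/null | grep -v '^#' | sort -u | wc -l)
else
    physical=$logical
fi

# Guard against an empty or zero result from lscpu.
[ "${physical:-0}" -ge 1 ] 2>/dev/null || physical=$logical

# Use the physical core count for -ntomp, with a single rank.
ntomp=$physical
echo "logical=${logical} physical=${physical}"
echo "gmx mdrun -s md.tpr -ntmpi 1 -ntomp ${ntomp}"
```

On the machine described above this should report 4 logical and 2 physical cores; set ntomp to the logical count instead if you want to try all 4 threads.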

Also yes - ntomp/ntmpi and anything else listed in the mdrun documentation are runtime options.

Things like SIMD and GPU support are compile-time options.

Ah, I understand much, much better… thanks a lot for the clarification! One more question: if I’m using -ntmpi 1 and -ntomp {core count}, should these be passed to every mdrun, no matter whether it’s energy minimization, equilibration or production?

Very nice! My GROMACS doesn’t show the SIMD messages anymore after rebuilding, although initially it asked me to try AVX2_256 instead.
I also reran one of the equilibration steps that used to take ages to even print the next line. Now it took less than 10 s to complete!

Thanks a lot for teaching me how to do all this, @kevinboyd :)

“this will be input into every mdrun no matter whether it’s energy minimization, equilibration or production?”

Probably, though some production options don’t apply to minimization. Minimization doesn’t typically need performance optimizations.