Clarification: cores vs logical cores

GROMACS version:
GROMACS modification: Yes
Hello,

In the documentation on general guidelines for performance:
https://docs.bioexcel.eu/gromacs_bpg/en/master/cookbook/cookbook.html#multiple-networked-compute-nodes-in-a-cluster-or-supercomputer
it says, under "Multiple networked compute nodes in a cluster or supercomputer":

Running with 1 rank per core and 1 OpenMP thread per rank and therefore with as many MPI ranks per node as there are cores on each node is often optimal.

Does this refer to a logical core?

Thank you very much for your help!
Best,
Sergio

Hello Sergio,

Yes, you can usually think of this statement as referring to logical cores.

If you are not making use of Simultaneous Multithreading (SMT), such as Intel Hyper-Threading, either because it was not enabled when the machine was booted or because it is disabled by an option in your job script or parallel application launch command (mpirun, mpiexec, srun, aprun, etc.), then the number of logical cores simply equals the number of physical cores.

If you are making use of SMT, then the operating system running on a compute node will usually report n times as many logical cores as there are physical cores (for a processor that supports n simultaneous multithreads per physical core).
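As a quick sanity check, Python's standard library can report the logical core count the operating system sees; this is a minimal sketch, and on a cluster it should be run inside a job on a compute node, since login nodes may be configured differently:

```python
import os

# os.cpu_count() reports the number of *logical* cores the OS exposes.
# With 2-way SMT enabled this is twice the number of physical cores;
# with SMT disabled it equals the physical core count.
logical_cores = os.cpu_count()
print(f"Logical cores visible to the OS: {logical_cores}")
```

The same number is what tools like `nproc` or `lscpu` report on Linux.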

In either case you should, as described for the single-node situation in the Performance Cookbook, choose the number of MPI ranks, N, and the number of OpenMP threads per rank, M, such that N x M equals the total number of logical cores available, but now taking into account that you are running on more than one node. For example, if you are running on 4 nodes, each of which has 16 physical cores, each of which supports 2 simultaneous multithreads, then you have 4 x 16 x 2 = 128 logical cores available. In principle you could choose the combination of MPI ranks and OpenMP threads (N, M) to be (128, 1), (64, 2), (32, 4), (16, 8), (8, 16), or (4, 32).

The statement you quoted says that it is often optimal for performance to run with just 1 OpenMP thread per MPI rank, and hence with as many MPI ranks as there are logical cores, i.e. the combination (128, 1) above. It is nevertheless worth experimenting with multiple OpenMP threads per rank, especially as this often helps retain higher performance when scaling to more nodes.
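The arithmetic in the example above can be sketched in a few lines of Python; this is just illustrative bookkeeping for the hypothetical 4-node machine, not GROMACS-specific code:

```python
# Hypothetical cluster from the example above: 4 nodes, 16 physical
# cores per node, 2 SMT threads per physical core.
nodes, cores_per_node, smt = 4, 16, 2
logical_cores = nodes * cores_per_node * smt  # 4 * 16 * 2 = 128

# Enumerate (N, M) = (MPI ranks, OpenMP threads per rank) such that
# N * M equals the total number of logical cores.
combos = [(logical_cores // m, m) for m in (1, 2, 4, 8, 16, 32)]
print(combos)
# -> [(128, 1), (64, 2), (32, 4), (16, 8), (8, 16), (4, 32)]
```

The guideline you quoted corresponds to the first entry, (128, 1); the later entries are the alternatives worth benchmarking when scaling out.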

As for your separate question, it is difficult to give a general answer on whether using SMT is a good idea. It can give a small additional performance boost, but it can also be slightly slower, as is mentioned for the benchmarks on the HAWK machine in the Performance Cookbook. It is worth experimenting with for your system and simulation setup.

Best wishes,

Arno

Thank you so much for your thorough answer! I learned a lot :)
Best,
Sergio