Thread affinity errors

GROMACS version: 2019.3
GROMACS modification: No

I sometimes get the following affinity errors. What is the problem? GROMACS was compiled with OpenMPI and gcc, with CUDA support.

mpirun -np $NTASKS --map-by ppr:$PPN:node gmx_mpi mdrun -s topol.tpr -resethway -noconfout -nsteps 10000 -v -pin on -nb gpu -pme cpu

NOTE: In MPI process #12: Affinity setting for 8/8 threads failed.
NOTE: In MPI process #14: Affinity setting for 8/8 threads failed.
NOTE: In MPI process #26: Affinity setting for 8/8 threads failed.
NOTE: In MPI process #28: Affinity setting for 8/8 threads failed.
NOTE: In MPI process #29: Affinity setting for 8/8 threads failed.
NOTE: In MPI process #9: Affinity setting for 8/8 threads failed.
NOTE: In MPI process #30: Affinity setting for 8/8 threads failed.

NOTE: Thread affinity was not set.

When this happens, runtime is obviously much longer. Is this a known bug? Any idea how to overcome this problem?

No, there is no such known issue; this should not happen. Can you narrow down the circumstances under which the error occurs?

The error seems to be non-deterministic, although I think I can make it more frequent with some UCX flags.

You can work around the issue by relying on your job scheduler / MPI launcher to set the correct affinities, and leaving mdrun at -pin auto, which is the default (that is, drop -pin on from your command line).
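
As a rough sketch of what launcher-side binding could look like with Open MPI: the values of NTASKS, PPN and NTOMP below are placeholders (8 threads per rank is taken from the "8/8 threads" in the log above), build_cmd is just a helper name made up for this example, and the exact --map-by syntax can differ between Open MPI versions, so please check the mpirun man page for your installation. The script only prints the command so you can inspect it first.

```shell
#!/bin/sh
# Sketch only: let Open MPI bind each rank and leave mdrun pinning at its default.
# NTASKS, PPN and NTOMP are placeholder values; adjust them to your nodes.
NTASKS="${NTASKS:-32}"   # total number of MPI ranks (assumed)
PPN="${PPN:-4}"          # ranks per node (assumed)
NTOMP="${NTOMP:-8}"      # OpenMP threads per rank, as in the "8/8 threads" log lines

# build_cmd is a hypothetical helper that assembles the launch line.
# PE=$NTOMP asks Open MPI to reserve that many cores for every rank;
# --report-bindings makes mpirun print the binding it actually applied.
build_cmd() {
    printf '%s' "mpirun -np $NTASKS --map-by ppr:$PPN:node:PE=$NTOMP --bind-to core --report-bindings gmx_mpi mdrun -s topol.tpr -resethway -noconfout -nsteps 10000 -v -ntomp $NTOMP -pin auto -nb gpu -pme cpu"
}

# Print the command for inspection; to launch it, use: eval "$(build_cmd)"
build_cmd
echo
```

If the bindings reported by mpirun look right (one distinct set of cores per rank), mdrun with -pin auto should leave the externally set affinities alone instead of trying to set its own.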

Before we try to look further into diagnosing what is happening here, can you please try to reproduce this with the current release? The relevant parts of mdrun have been improved since the 2019 release so we should verify whether the errors are still present.


Szilárd