MdrunModulesTests timeout on Gromacs 2021, gcc 10.2.1

GROMACS version: 2021
GROMACS modification: No

Dear all,

I am trying to build GROMACS 2021 (I saw the same problem when building 2020.5), using the following cmake call:

cmake … -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DCMAKE_INSTALL_PREFIX=/opt/software/gromacs-2021-nogpu -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DGMX_GPU=OFF -DGMX_MPI=OFF -DBUILD_SHARED_LIBS=ON -DGMX_HWLOC=ON

What I am actually trying to do is build a CUDA-enabled version (CUDA 11.1), but the problem persists without CUDA, so I thought it would be better to post the details of the CUDA-less build.

The build process (cmake and make) runs without problems. When I run the tests with “make check”, the test MdrunModulesTests times out (after about 120 s). I haven’t been able to find more information about what exactly MdrunModulesTests checks, so I’m writing to ask how serious this is, what could be causing it, and how I could fix it.
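One way to get more detail on a timeout like this is to re-run only that test through ctest from the build directory, after “make check” has built the test binaries. The sketch below assumes the test is registered with ctest under the name MdrunModulesTests, as printed by “make check”; BUILD_DIR is a hypothetical variable used only for illustration, not something GROMACS defines.

```shell
# Sketch: re-run only the timing-out test and show its own output.
# BUILD_DIR is a hypothetical variable; point it at your build directory.
BUILD_DIR="${BUILD_DIR:-.}"
TEST_NAME="MdrunModulesTests"

# -R selects tests by regular expression; --output-on-failure prints the
# log of a test that fails or times out.
CTEST_ARGS="-R ^${TEST_NAME}\$ --output-on-failure"

echo "ctest ${CTEST_ARGS}"

# Only run when ctest exists and BUILD_DIR looks like a CMake build directory.
if command -v ctest >/dev/null 2>&1 && [ -f "${BUILD_DIR}/CTestTestfile.cmake" ]; then
    (cd "${BUILD_DIR}" && ctest ${CTEST_ARGS})
fi
```

Adding -V to the ctest call prints the full test output even when the test passes, which can help when comparing a working and a non-working build.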

Below is some system info and the output of “gmx mdrun -version”:

The system runs Fedora 32. The relevant output of “gcc --version” is:

gcc (GCC) 10.2.1 20201016 (Red Hat 10.2.1-6)

The output for “cat /proc/cpuinfo” is:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel® Xeon® Gold 6130 CPU @ 2.10GHz
stepping : 4
microcode : 0x2006906
cpu MHz : 3453.961
cache size : 22528 KB
physical id : 0
siblings : 32
core id : 0
cpu cores : 16
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear flush_l1d
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_mode_based_exec tsc_scaling
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips : 4200.00
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

(repeated for each of the 64 logical CPUs seen by the OS, 32 of which are physical cores)

The relevant output from “./gmx mdrun -version” (run in the build/bin directory) is:

GROMACS: gmx mdrun, version 2021
Executable: /wrk/programs/gromacs-2021/build-nocuda/bin/./gmx
Data prefix: /wrk/programs/gromacs-2021 (source tree)
Working dir: /wrk/programs/gromacs-2021/build-nocuda/bin
Command line:
gmx mdrun -version

GROMACS version: 2021
Verified release checksum is 3e06a5865d6ff726fc417dea8d55afd37ac3cbb94c02c54c76d7a881c49c5dd8
Precision: mixed
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: disabled
SIMD instructions: AVX_512
FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /usr/bin/gcc GNU 10.2.1
C compiler flags: -mavx512f -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -O2 -DNDEBUG
C++ compiler: /usr/bin/g++ GNU 10.2.1
C++ compiler flags: -mavx512f -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -fopenmp -O2 -DNDEBUG

Cheers,

Raul

That’s strange; for me MdrunModulesTests executes in <1 s. Can you try turning off OpenMP to see whether it contributes to the issue? Is your machine idle when you run make check?

Also, are you not building in release mode? Your compiler flags show -O2.

Hi,

Thanks for the answer.

I have been attempting a “regular” build, with -O2. I have now also tested a “debug” build, without -O2, and the problem persists.

OpenMP does seem to play a role. If I set OMP_THREAD_LIMIT=1, several tests fail, but MdrunModulesTests runs correctly, without timing out. An MPI build (-DGMX_MPI=ON) also runs MdrunModulesTests correctly (as well as all the other tests). As this runs 8 OpenMP threads per MPI process, I thought setting OMP_THREAD_LIMIT=8 on the non-MPI build might get the test to run, but it didn’t.
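The thread-limit experiment described above can be repeated systematically. The following is a minimal sketch, assuming the tests have already been built with “make check” and that it is run against the build directory; OMP_THREAD_LIMIT is the standard OpenMP environment variable, while BUILD_DIR, THREAD_LIMITS and TRIED are hypothetical names chosen for illustration.

```shell
# Sketch: repeat the test under several OpenMP thread limits.
# THREAD_LIMITS is an arbitrary choice of values to try.
BUILD_DIR="${BUILD_DIR:-.}"
THREAD_LIMITS="1 2 4 8 16"
TRIED=""

for n in ${THREAD_LIMITS}; do
    echo "OMP_THREAD_LIMIT=${n} ctest -R ^MdrunModulesTests\$ --output-on-failure"
    TRIED="${TRIED}${n} "
    # Only run when ctest exists and BUILD_DIR looks like a CMake build directory.
    if command -v ctest >/dev/null 2>&1 && [ -f "${BUILD_DIR}/CTestTestfile.cmake" ]; then
        # The variable is set only for this one ctest invocation.
        (cd "${BUILD_DIR}" && OMP_THREAD_LIMIT="${n}" ctest -R '^MdrunModulesTests$' --output-on-failure) \
            || echo "  -> failed or timed out with OMP_THREAD_LIMIT=${n}"
    fi
done
```

Noting the smallest limit at which the timeout appears may help narrow down whether the problem scales with the thread count or shows up as soon as more than one thread is used.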

I don’t know how to build without OpenMP.
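For reference, GROMACS has a GMX_OPENMP CMake option that can be switched off at configure time. The sketch below shows one way such a configuration might look, reusing flags from the cmake call earlier in this thread; SRC_DIR and BUILD_DIR are hypothetical paths, and the remaining options should be adjusted to match the original build.

```shell
# Sketch: configure a separate GROMACS build with OpenMP disabled.
# SRC_DIR and BUILD_DIR are hypothetical paths; adjust them to your setup.
SRC_DIR="${SRC_DIR:-$HOME/gromacs-2021}"
BUILD_DIR="${BUILD_DIR:-$SRC_DIR/build-noopenmp}"

# -DGMX_OPENMP=OFF turns off OpenMP; the other flags mirror the
# CPU-only, non-MPI configuration used earlier in this thread.
CMAKE_ARGS="-DGMX_OPENMP=OFF -DGMX_GPU=OFF -DGMX_MPI=OFF -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON"

echo "cmake ${SRC_DIR} ${CMAKE_ARGS}"

# Only configure when cmake exists and SRC_DIR looks like a source tree.
if command -v cmake >/dev/null 2>&1 && [ -f "${SRC_DIR}/CMakeLists.txt" ]; then
    mkdir -p "${BUILD_DIR}" && cd "${BUILD_DIR}" && cmake "${SRC_DIR}" ${CMAKE_ARGS}
fi
```

After configuring, “make” followed by “make check” in the new build directory would show whether MdrunModulesTests still times out without OpenMP.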

For now, the MPI build passes all the tests, so I will be using that.

For reference, my system is running libomp 10.0.1, libgomp 10.2.1, and openmpi 4.0.4