Failing tests Gromacs 2021.1

GROMACS version: 2021.1
GROMACS modification: No

Hi, can someone help me resolve the failing tests that GROMACS reports at the end of “make check”?

I am building for an Arm aarch64 cluster. It has 7 nodes and 1 front-end; each node has a 64-core ThunderX2 99xx, 8 Tesla V100 and 2 Tesla T4 GPUs.
I am configuring GROMACS with this command:

cmake … -DREGRESSIONTEST_DOWNLOAD=ON -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DGMX_MPI=on -DGMX_GPU=CUDA -DGMX_SIMD=ARM_NEON_ASIMD -DBUILD_SHARED_LIBS=on -DCMAKE_INSTALL_PREFIX=/opt/share/gromacs-2021.1-gmx -DFFTWF_LIBRARY=/opt/share/fftw-3.3.9-gromacs-2021.1/lib/libfftw3f.so -DFFTWF_INCLUDE_DIR=/opt/share/fftw-3.3.9-gromacs-2021.1/include -DGMX_USE_LMFIT=internal

This build uses: CUDA 11.1, FFTW 3.3.9, GCC 9.3, Open MPI 4.1.1, CMake 3.18.4

This is the end of my make check output:

88% tests passed, 9 tests failed out of 73

Label Time Summary:
GTest = 118.93 sec*proc (67 tests)
IntegrationTest = 29.67 sec*proc (20 tests)
MpiTest = 62.30 sec*proc (10 tests)
SlowTest = 77.24 sec*proc (8 tests)
UnitTest = 12.02 sec*proc (39 tests)

Total Test time (real) = 175.14 sec

The following tests FAILED:
56 - MdrunTests (ILLEGAL)
57 - MdrunPmeTests (ILLEGAL)
60 - MdrunMpiTests (Failed)
61 - MdrunMpiPmeTests (Failed)
64 - MdrunFEPTests (ILLEGAL)
70 - regressiontests/complex (Failed)
71 - regressiontests/freeenergy (Failed)
72 - regressiontests/rotation (Failed)
73 - regressiontests/essentialdynamics (Failed)

Another thing I noticed: make install generates gmx_mpi, and when I run “gmx_mpi --version” it gives me this warning:

WARNING: There was an error initializing an OpenFabrics device.

Any suggestions or extra advice about my build are also appreciated.

Kind regards, Federico.

Also, the output of gmx_mpi mdrun -version is this:

By default, for Open MPI 4.0 and later, infiniband ports on a device
are not used by default. The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA parameter
to true.

Local host: armida-fe
Local adapter: mlx5_0
Local port: 1



WARNING: There was an error initializing an OpenFabrics device.

Local host: armida-fe
Local device: mlx5_0

GROMACS: gmx mdrun, version 2021.1
Executable: /opt/share/gromacs-2021.1-gmx/bin/gmx_mpi
Data prefix: /opt/share/gromacs-2021.1-gmx
Working dir: /home/fferrari
Command line:
gmx_mpi mdrun -version

GROMACS version: 2021.1
Verified release checksum is 8c24bff5d3f78b0a9afb16e880b5667e5affe9a686d462482bac20ce975492c6
Precision: mixed
Memory model: 64 bit
MPI library: MPI
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: ARM_NEON_ASIMD
FFT library: fftw-3.3.9-neon
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /opt/share/arm/20.3/gcc-9.3.0_Generic-AArch64_RHEL-8_aarch64-linux/bin/gcc GNU 9.3.0
C compiler flags: -pthread -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -pthread -O3 -DNDEBUG
C++ compiler: /opt/share/arm/20.3/gcc-9.3.0_Generic-AArch64_RHEL-8_aarch64-linux/bin/g++ GNU 9.3.0
C++ compiler flags: -pthread -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -pthread -fopenmp -O3 -DNDEBUG
CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2020 NVIDIA Corporation;Built on Mon_Oct_12_20:10:01_PDT_2020;Cuda compilation tools, release 11.1, V11.1.105;Build cuda_11.1.TC455_06.29190527_0
CUDA compiler flags:-std=c++17;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-Wno-deprecated-gpu-targets;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_35,code=compute_35;-gencode;arch=compute_53,code=compute_53;-gencode;arch=compute_80,code=compute_80;-use_fast_math;-D_FORCE_INLINES;-pthread -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -pthread -fopenmp -O3 -DNDEBUG
CUDA driver: 11.10
CUDA runtime: N/A

Problem fixed. If you want to know the solution, comment on the post!


How did you fix it?

Hi @s.pallav, I had this problem because I was using the wrong build of FFTW: I used the double-precision version while telling GROMACS to compile for single precision. The other problem I had was with this warning:

WARNING: There was an error initializing an OpenFabrics device

The warning was caused by an incomplete UCX build. To fix it, pass this option to configure when you compile UCX:

--with-mlx5-d

Of course, use the option that matches the device in your infrastructure. If you have mlx4 hardware, I think the option is --with-mlx4-d, but check the output of ./configure --help to find the right one.
(Other forums also suggest the --without-verbs option. That tells Open MPI to ignore its internal support for libverbs and use UCX instead; it does not affect how UCX works and should not affect performance.)
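Since the exact option name depends on the UCX version, it may help to search the configure help text instead of guessing. This is only a minimal sketch: list_mlx_options is a hypothetical helper name, and it assumes you run it from the top of an unpacked UCX source tree.

```shell
# Hypothetical helper: reads the text of `./configure --help` on stdin
# and prints every --with/--without option whose name mentions "mlx".
list_mlx_options() {
    grep -i -o -E -e '--with(out)?-[a-z0-9-]*mlx[a-z0-9-]*' | sort -u
}

# Only runs when a configure script is present in the current directory.
if [ -x ./configure ]; then
    ./configure --help | list_mlx_options
fi
```

Whatever this prints for your UCX release is the spelling to pass to ./configure.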

You also need to set these two environment variables correctly:

UCX_TLS
UCX_NET_DEVICES
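As a rough illustration of what setting them can look like: the device name mlx5_0 and port 1 are taken from the warning output above, while the transport list rc,sm,self is only an assumed example and must be adapted to your own hardware and UCX build.

```shell
# Assumed values, adjust to your cluster.
# Device and port as reported in the Open MPI warning (mlx5_0, port 1).
export UCX_NET_DEVICES=mlx5_0:1
# Example transport selection (an assumption): reliable connection over
# InfiniBand, shared memory within a node, and loopback to self.
export UCX_TLS=rc,sm,self

# If ucx_info is installed, list the transports and devices UCX can see,
# so the values above can be compared against what is really available.
if command -v ucx_info >/dev/null 2>&1; then
    ucx_info -d | grep -E 'Transport|Device'
fi
```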

If you don’t want to recompile the UCX library, you can set these environment variables before you run gmx:

export OMPI_MCA_btl_openib_allow_ib=1
export OMPI_MCA_orte_base_help_aggregate=0
export OMPI_MCA_mpi_warn_on_fork=0

But this is a temporary fix; to solve the problem properly you need to recompile UCX correctly.
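Going back to the FFTW precision mismatch mentioned earlier, one way to check which precision a library was built for is to look at its exported symbols: single-precision FFTW exports names starting with fftwf_, double-precision ones start with fftw_. The sketch below relies on that; classify_fftw_symbols is a hypothetical helper name, and the library path is the one from the cmake command in the question.

```shell
# Hypothetical helper: reads a symbol listing (e.g. from `nm -D`) on stdin
# and prints "single", "double" or "unknown" based on the planner symbol.
classify_fftw_symbols() {
    symbols=$(cat)
    if printf '%s\n' "$symbols" | grep -q -w 'fftwf_plan_dft_1d'; then
        echo single
    elif printf '%s\n' "$symbols" | grep -q -w 'fftw_plan_dft_1d'; then
        echo double
    else
        echo unknown
    fi
}

# Path taken from the cmake command in the question; adjust to your install.
lib=/opt/share/fftw-3.3.9-gromacs-2021.1/lib/libfftw3f.so
if [ -f "$lib" ]; then
    nm -D "$lib" | classify_fftw_symbols
fi
```

A mixed-precision GROMACS build needs the single-precision library, which FFTW produces when it is configured with --enable-float.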
