GROMACS 2020.4 installation problem

GROMACS version: 2020.4
GROMACS modification: No

Hi everybody,
I’m trying to install the CPU+GPU version of GROMACS 2020.4 on a Ryzen 3700X machine running an up-to-date Ubuntu 20.04 with CUDA 11.1 (nvidia 455 driver). During make check I encounter the following errors:
The following tests FAILED:
5 - MdlibUnitTest (Failed)
10 - EwaldUnitTests (Failed)
12 - GpuUtilsUnitTests (Failed)
Errors while running CTest
make[3]: *** [CMakeFiles/run-ctest-nophys.dir/build.make:77: CMakeFiles/run-ctest-nophys] Error 8
make[2]: *** [CMakeFiles/Makefile2:3522: CMakeFiles/run-ctest-nophys.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:2660: CMakeFiles/check.dir/rule] Error 2
make: *** [Makefile:346: check] Error 2

More precisely:
…/gromacs-2020.4/build/bin/mdlib-test: symbol lookup error: …/gromacs-2020.4/build/bin/mdlib-test: undefined symbol: _ZN3gmx4test20integrateLeapFrogGpuEPNS0_16LeapFrogTestDataEi

…/gromacs-2020.4/build/bin/ewald-test: undefined symbol: _Z27pme_gpu_get_real_grid_sizesPK6PmeGpuPN3gmx11BasicVectorIiEES5_

…/gromacs-2020.4/build/bin/gpu_utils-test: symbol lookup error: …/gromacs-2020.4/build/bin/gpu_utils-test: undefined symbol: _Z18isHostMemoryPinnedPKv

Is there any solution for the above or should I ignore it?
I thank you in advance.
Best.

This is not expected and it is quite unusual, please post the commands you used to configure and build GROMACS.

Hi,

I am experiencing the same behavior when building 2020.2 under Ubuntu 20.04 on a Ryzen 3700X with the nvidia 455 driver. My procedure was as follows:

tar xfz gromacs-2020.2.tar.gz
cd gromacs-2020.2
mkdir build
cd build
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DGMX_GPU=on
make
make check

More information on the system:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

gcc --version
gcc (Ubuntu 8.4.0-3ubuntu2) 8.4.0

g++ --version
g++ (Ubuntu 8.4.0-3ubuntu2) 8.4.0

uname -a
Linux marvin 5.4.0-54-generic #60-Ubuntu SMP Fri Nov 6 10:37:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          16
On-line CPU(s) list:             0-15
Thread(s) per core:              2
Core(s) per socket:              8
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           113
Model name:                      AMD Ryzen 7 3700X 8-Core Processor

Can I provide any more information to locate the error?
Thanks for your help!

Update: I can confirm that this also happens with GROMACS 2020.4. In addition, I now get a failure in GmxPreprocessTests:

The following tests FAILED:
	  5 - MdlibUnitTest (Failed)
	 10 - EwaldUnitTests (Failed)
	 12 - GpuUtilsUnitTests (Failed)
	 31 - GmxPreprocessTests (Failed)

With the respective output being

5/59 Test  #5: MdlibUnitTest .......................***Failed    0.00 sec
/home/aretaon/progs/gromacs-2020.4/build/bin/mdlib-test: symbol lookup error: /home/aretaon/progs/gromacs-2020.4/build/bin/mdlib-test: undefined symbol: _ZN3gmx4test20integrateLeapFrogGpuEPNS0_16LeapFrogTestDataEi

10/59 Test #10: EwaldUnitTests ......................***Failed    0.00 sec
/home/aretaon/progs/gromacs-2020.4/build/bin/ewald-test: symbol lookup error: /home/aretaon/progs/gromacs-2020.4/build/bin/ewald-test: undefined symbol: _Z13pme_gpu_solvePK6PmeGpuP9t_complex12GridOrderingb

12/59 Test #12: GpuUtilsUnitTests ...................***Failed    0.00 sec
/home/aretaon/progs/gromacs-2020.4/build/bin/gpu_utils-test: symbol lookup error: /home/aretaon/progs/gromacs-2020.4/build/bin/gpu_utils-test: undefined symbol: _Z8findGpusP14gmx_gpu_info_t

The GmxPreprocessTests seem to have some problem with encoding (but the download was fine, I checked the md5sum).

[----------] 1 test from GenRestrTest
[ RUN      ] GenRestrTest.SimpleRestraintsGenerated

Reading structure file
Group     0 (         System) has   156 elements
Group     1 (          Other) has   156 elements
Group     2 (               ) has    39 elements
Group     3 (          ile/3) has    36 elements
Group     4 (         ���U) has     0 elements
Group     5 (         �~;�U) has     7 elements
Group     6 (              ) has     1 elements
Group     7 (              @) has    11 elements
Select a group: Select group to position restrain
Selected 3: 'ile/3'
/home/aretaon/progs/gromacs-2020.4/src/testutils/refdata.cpp:873: Failure
  In item: /Files/-o/Contents
   Actual: '; position restraints for ile/3 of ��;�U

Hi,

I can’t reproduce either of those on a very similar setup.

The symbol lookup errors suggest some kind of link-time issue. Have you tried using a different CUDA version or a different gcc version?
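To narrow down a symbol lookup error like the ones above, it can help to demangle the reported names and to check which libgromacs the test binaries actually resolve at run time. The following is only a sketch: it assumes the binutils tools c++filt and nm plus ldd are installed, and the paths in the usage comments (bin/mdlib-test, lib/libgromacs.so) are assumptions about the build tree layout, not something confirmed in this thread.

```shell
#!/bin/sh
# Demangle one C++ symbol name (requires c++filt from binutils).
demangle() {
  printf '%s\n' "$1" | c++filt
}

# Show which GROMACS libraries a binary resolves at run time.
resolved_gromacs_libs() {
  ldd "$1" | grep -i gromacs
}

# List dynamic symbols of a library that match a pattern.
# In nm output, 'T' marks a defined symbol and 'U' an undefined one.
find_symbol() {
  nm -D "$1" | grep -- "$2"
}

# Demangle the symbols reported in the failing tests, if c++filt exists.
if command -v c++filt >/dev/null 2>&1; then
  for sym in \
    _ZN3gmx4test20integrateLeapFrogGpuEPNS0_16LeapFrogTestDataEi \
    _Z18isHostMemoryPinnedPKv \
    _Z8findGpusP14gmx_gpu_info_t
  do
    demangle "$sym"
  done
fi

# Usage from the build directory (paths are assumptions, adjust as needed):
#   resolved_gromacs_libs bin/mdlib-test
#   find_symbol lib/libgromacs.so isHostMemoryPinned
```

If ldd shows a libgromacs outside the build directory, or nm lists the symbol only as 'U', that would point to the test binary and the library coming from different builds or configurations.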

The latter also looks strange: the group selection output looks quite different for me starting from “Group 2”. I suggest opening an issue on https://gitlab.com/gromacs/gromacs for this, as there may be a bug causing it.

Szilárd

Hi Szilárd,

thanks for looking into this. Eventually, I realised my mistake: I used CUDA 10 and gcc 8.4 together with an Ampere GPU.
So I switched to CUDA 11.1 and gcc 9.3. Building GROMACS 2020.2 with that combination failed (due to https://gitlab.com/gromacs/gromacs/-/merge_requests/461). However, building and checking GROMACS 2020.4 worked just fine, and I will continue using 2020.4.
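For reference, a configure line that pins the compiler and CUDA toolkit explicitly might look like the sketch below. The compiler names gcc-9 and g++-9 and the toolkit location /usr/local/cuda-11.1 are assumptions about the system, not values taken from this thread; the remaining options are the ones from the build procedure posted earlier. The script only prints the command so it can be checked before being run in a fresh, empty build directory.

```shell
#!/bin/sh
# Assumed CUDA install location; override by setting CUDA_ROOT beforehand.
CUDA_ROOT=${CUDA_ROOT:-/usr/local/cuda-11.1}

# Print (do not run) a cmake command that selects gcc 9 and the chosen CUDA toolkit.
print_configure_command() {
  printf 'cmake ..'
  printf ' %s' \
    -DGMX_BUILD_OWN_FFTW=ON \
    -DREGRESSIONTEST_DOWNLOAD=ON \
    -DGMX_GPU=on \
    -DCMAKE_C_COMPILER=gcc-9 \
    -DCMAKE_CXX_COMPILER=g++-9 \
    "-DCUDA_TOOLKIT_ROOT_DIR=$CUDA_ROOT"
  printf '\n'
}

print_configure_command
```

Starting from an empty build directory matters here, because CMake caches the compiler and CUDA paths from the first configure run.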

Label Time Summary:
GTest              = 115.29 sec*proc (55 tests)
IntegrationTest    =  43.91 sec*proc (12 tests)
MpiTest            =  59.10 sec*proc (8 tests)
SlowTest           =  51.49 sec*proc (2 tests)
UnitTest           =  19.89 sec*proc (41 tests)

Not sure if it was the same issue for the OP, but for me this is closed.

Thanks for the feedback. I am still not sure how the latter, the GmxPreprocessTests failures, could be related to gcc 8 + CUDA 10 vs. gcc 9 + CUDA 11.1. Have you also tried gcc 8 + CUDA 11.1?

Just tested gcc 8.4.0 with CUDA 11.1 and GROMACS 2020.4, and all tests passed during make check.
Maybe it’s something to do with CUDA 10 not supporting the Ampere GPU? Not sure if this plays a role in GmxPreprocessTests…

In principle, if you do not restrict the target architectures (by using -DGMX_CUDA_TARGET_SM), Ampere should work even when GROMACS is compiled with earlier CUDA versions (ref https://docs.nvidia.com/cuda/ampere-compatibility-guide/index.html#verifying-ampere-compatibility-using-cuda-10-2).
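The compatibility check described in the linked NVIDIA guide relies on the CUDA_FORCE_PTX_JIT environment variable, which makes the driver ignore embedded native GPU code and JIT-compile the PTX instead. A minimal sketch of a wrapper is shown below; the function name run_with_ptx_jit is made up for illustration, and the binary path in the usage comment is an assumption based on the paths in the logs above.

```shell
#!/bin/sh
# Run any command with PTX JIT compilation forced in the CUDA driver.
run_with_ptx_jit() {
  env CUDA_FORCE_PTX_JIT=1 "$@"
}

# Usage from the build directory (path is an assumption):
#   run_with_ptx_jit bin/gpu_utils-test
```

If a GPU test passes this way, the build contains PTX that the driver can compile for the installed GPU.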

If you can reproducibly get failing tests with CUDA 10 on Ampere, can you please file an issue on https://gitlab.com/gromacs/gromacs/-/issue so we can look into this? Please mention both the cases that fail and those that succeed.
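When reporting failing and passing combinations, it helps to record the exact tool versions for each case. The snippet below is one possible way to collect them; the helper name report_tool and the list of tools are just a suggestion, and tools that are not installed are reported as missing instead of aborting the script.

```shell
#!/bin/sh
# Print the version output of a tool, or note that it is not installed.
report_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    printf '== %s ==\n' "$1"
    "$1" --version 2>&1
  else
    printf '== %s == not found\n' "$1"
  fi
}

# Collect the versions relevant to a GROMACS GPU build.
for tool in gcc g++ nvcc cmake; do
  report_tool "$tool"
done
```

Redirecting the output to a file once per compiler/CUDA combination gives a record that can be attached to the issue.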

Thanks,
Szilárd