Hi everybody,
I’m trying to install the CPU+GPU version of GROMACS 2020.4 on a Ryzen 3700X machine running an up-to-date Ubuntu 20.04 with CUDA 11.1 (NVIDIA 455 driver). During make check I get the following errors:
The following tests FAILED:
5 - MdlibUnitTest (Failed)
10 - EwaldUnitTests (Failed)
12 - GpuUtilsUnitTests (Failed)
Errors while running CTest
make[3]: *** [CMakeFiles/run-ctest-nophys.dir/build.make:77: CMakeFiles/run-ctest-nophys] Error 8
make[2]: *** [CMakeFiles/Makefile2:3522: CMakeFiles/run-ctest-nophys.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:2660: CMakeFiles/check.dir/rule] Error 2
make: *** [Makefile:346: check] Error 2
More precisely:
…/gromacs-2020.4/build/bin/mdlib-test: symbol lookup error: …/gromacs-2020.4/build/bin/mdlib-test: undefined symbol: _ZN3gmx4test20integrateLeapFrogGpuEPNS0_16LeapFrogTestDataEi
I am experiencing the same behavior when building 2020.2 under Ubuntu 20.04 on a Ryzen 3700X with the nvidia 455 driver. My procedure was as follows:
tar xfz gromacs-2020.2.tar.gz
cd gromacs-2020.2
mkdir build
cd build
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DGMX_GPU=on
make
make check
More information on the system:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
gcc --version
gcc (Ubuntu 8.4.0-3ubuntu2) 8.4.0
g++ --version
g++ (Ubuntu 8.4.0-3ubuntu2) 8.4.0
uname -a
Linux marvin 5.4.0-54-generic #60-Ubuntu SMP Fri Nov 6 10:37:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 113
Model name: AMD Ryzen 7 3700X 8-Core Processor
Can I provide any more information to locate the error?
Thanks for your help!
Update: I can confirm that this also happens with GROMACS 2020.4. In addition, I now get an error in GmxPreprocessTests:
The following tests FAILED:
5 - MdlibUnitTest (Failed)
10 - EwaldUnitTests (Failed)
12 - GpuUtilsUnitTests (Failed)
31 - GmxPreprocessTests (Failed)
With the respective output being
5/59 Test #5: MdlibUnitTest .......................***Failed 0.00 sec
/home/aretaon/progs/gromacs-2020.4/build/bin/mdlib-test: symbol lookup error: /home/aretaon/progs/gromacs-2020.4/build/bin/mdlib-test: undefined symbol: _ZN3gmx4test20integrateLeapFrogGpuEPNS0_16LeapFrogTestDataEi
10/59 Test #10: EwaldUnitTests ......................***Failed 0.00 sec
/home/aretaon/progs/gromacs-2020.4/build/bin/ewald-test: symbol lookup error: /home/aretaon/progs/gromacs-2020.4/build/bin/ewald-test: undefined symbol: _Z13pme_gpu_solvePK6PmeGpuP9t_complex12GridOrderingb
12/59 Test #12: GpuUtilsUnitTests ...................***Failed 0.00 sec
/home/aretaon/progs/gromacs-2020.4/build/bin/gpu_utils-test: symbol lookup error: /home/aretaon/progs/gromacs-2020.4/build/bin/gpu_utils-test: undefined symbol: _Z8findGpusP14gmx_gpu_info_t
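In case it helps anyone reading the errors above: the mangled names can be decoded with c++filt to see which functions the loader could not find. This is only a small sketch, assuming GNU binutils is installed; the demangle helper is a name made up for this example and is not part of GROMACS.

```shell
# Hypothetical helper: decode a mangled C++ symbol name with c++filt (GNU binutils).
demangle() {
    printf '%s\n' "$1" | c++filt
}

# The three symbols reported as undefined by the failing test binaries.
demangle _ZN3gmx4test20integrateLeapFrogGpuEPNS0_16LeapFrogTestDataEi
demangle _Z13pme_gpu_solvePK6PmeGpuP9t_complex12GridOrderingb
demangle _Z8findGpusP14gmx_gpu_info_t
```

All three decode to GPU-related functions (integrateLeapFrogGpu, pme_gpu_solve, findGpus), so the test binaries appear to be missing the GPU parts of the build they were linked against.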
The GmxPreprocessTests seem to have some problem with encoding (but the download was fine, I checked the md5sum).
[----------] 1 test from GenRestrTest
[ RUN ] GenRestrTest.SimpleRestraintsGenerated
Reading structure file
Group 0 ( System) has 156 elements
Group 1 ( Other) has 156 elements
Group 2 ( ) has 39 elements
Group 3 ( ile/3) has 36 elements
Group 4 ( ���U) has 0 elements
Group 5 ( �~;�U) has 7 elements
Group 6 ( ) has 1 elements
Group 7 ( @) has 11 elements
Select a group: Select group to position restrain
Selected 3: 'ile/3'
/home/aretaon/progs/gromacs-2020.4/src/testutils/refdata.cpp:873: Failure
In item: /Files/-o/Contents
Actual: '; position restraints for ile/3 of ��;�U
I can’t reproduce either of those on a very similar setup.
The symbol lookup errors suggest some kind of link-time issue. Have you tried a different CUDA version or a different gcc?
The latter failure also looks strange: the group selection output looks quite different for me, starting from “Group 2”. I suggest opening an issue on https://gitlab.com/gromacs/gromacs for this, as there may be a bug causing it.
Thanks for looking into this. Eventually, I realised my mistake: I used CUDA 10 and gcc 8.4 together with an Ampere GPU.
So I have now switched to CUDA 11.1 and gcc 9.3. The build failed with GROMACS 2020.2 (due to https://gitlab.com/gromacs/gromacs/-/merge_requests/461). However, building and checking GROMACS 2020.4 worked just fine, and I will continue using 2020.4.
Thanks for the feedback. I’m still not sure how the latter failures, in GmxPreprocessTests, could be related to gcc 8 + CUDA 10 vs. gcc 9 + CUDA 11.1. Have you also tried gcc 8 + CUDA 11.1?
I just tested gcc 8.4.0 with CUDA 11.1 and GROMACS 2020.4, and all tests passed during make check.
Maybe it has something to do with CUDA 10 not supporting the Ampere GPU? I’m not sure whether this plays a role in GmxPreprocessTests…
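For anyone who wants to catch this kind of toolkit/GPU mismatch before building, here is a rough sketch that reads the release number from nvcc --version and compares it with a minimum. The helper names cuda_release and version_at_least are made up for this example, and the 11.1 threshold is an assumption based on this thread; as far as I know, Ampere support only arrived with the CUDA 11 series, so please check the CUDA release notes for your specific card.

```shell
# Extract the toolkit release (e.g. "10.1") from `nvcc --version` output on stdin.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeed if version $1 is at least version $2 (relies on GNU sort -V).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Assumed minimum for the Ampere card discussed here; adjust for your GPU.
required=11.1

if command -v nvcc >/dev/null 2>&1; then
    found=$(nvcc --version | cuda_release)
    if version_at_least "$found" "$required"; then
        echo "CUDA $found is new enough (minimum $required)"
    else
        echo "CUDA $found is older than the assumed minimum $required"
    fi
fi
```

With the nvcc output posted earlier in this thread (release 10.1), the check would take the second branch.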
If you can reproducibly get failing tests with CUDA 10 on Ampere, can you please file an issue on https://gitlab.com/gromacs/gromacs/-/issue so we can look into this? Please mention both the cases that fail and those that succeed.