GROMACS version: 2021.2
GROMACS modification: No
I have just built a new workstation with an AMD Ryzen 9 5950X CPU and a GeForce GTX 1650 video card. The installed OS is openSUSE Leap 15.2.
The main purpose of this workstation is to run simulations with GROMACS. It compiled without any warnings, but the following tests fail when I run the checks:
32 - RandomUnitTests (Failed)
40 - GmxAnaTest (Failed)
41 - GmxPreprocessTests (SEGFAULT)
48 - TrajectoryAnalysisUnitTests (Failed)
50 - ToolUnitTests (SEGFAULT)
53 - MdrunOutputTests (Failed)
54 - MdrunModulesTests (Failed)
55 - MdrunIOTests (Failed)
56 - MdrunTests (Failed)
57 - MdrunPmeTests (Failed)
60 - MdrunMpiTests (Failed)
61 - MdrunMpiPmeTests (Failed)
64 - MdrunFEPTests (Failed)
66 - GmxapiExternalInterfaceTests (Failed)
67 - GmxapiInternalInterfaceTests (Failed)
Essentially two kinds of failure appear in the tests:
C++ exception with description “random_device: rdrand failed” and
Attempted to call init_inputrec_strings before calling done_inputrec_strings.
The first failure above, the rdrand one, appears to be related to some AMD CPUs, as described here: https://twitter.com/FiloSottile/status/1125840275346198529
I tried everything I could to fix these failures by myself, including compiling and installing GCC 11.1.0 and the Intel oneAPI toolkit. Every time I ended up with the same errors. :-(
Can someone help me?
The random device issue looks weird, as we merged patches for something like this into the 2020 and 2021 branches a while ago. Could you check whether other applications are able to request random numbers, or is it just GROMACS that fails?
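One way to run such a check outside GROMACS is a small standalone probe of std::random_device, since the error text "random_device: rdrand failed" has the form of an exception raised by the C++ standard library. The following is only a sketch: the names ProbeResult and probeRandomDevice are made up for illustration, and the non-empty tokens mentioned in the comments ("rdrand", "/dev/urandom") are implementation-defined, so the assumption that they are accepted holds for libstdc++ and may not hold elsewhere.

```cpp
#include <exception>
#include <random>
#include <string>

// Result of one attempt to draw a number from std::random_device.
struct ProbeResult
{
    bool         ok    = false; // true if a value was produced
    unsigned int value = 0;     // the value drawn (meaningful only if ok)
    std::string  error;         // exception text if the attempt failed
};

// Tries to construct a std::random_device and draw one value from it.
// An empty token selects the implementation's default source. Any other
// token is implementation-defined: libstdc++ is assumed to accept
// "rdrand" and "/dev/urandom", other standard libraries may reject them.
ProbeResult probeRandomDevice(const std::string& token)
{
    ProbeResult result;
    try
    {
        if (token.empty())
        {
            std::random_device device;
            result.value = device();
        }
        else
        {
            std::random_device device(token);
            result.value = device();
        }
        result.ok = true;
    }
    catch (const std::exception& e)
    {
        // Construction or the draw itself failed; keep the message.
        result.error = e.what();
    }
    return result;
}
```

Calling probeRandomDevice("") from a main function and printing the error field would show whether the default source fails in the same way as in the GROMACS tests. If the default fails but "/dev/urandom" works, that would point at the hardware instruction rather than at GROMACS.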
The other stuff looks a bit more serious. Could you post your full CMake configuration (and as much information about the oneAPI toolkit you have installed as possible)?
Also, I think this warrants a bug report on GitLab (https://gitlab.com/gromacs/gromacs/-/issues).
Also, FYI, depending on your motherboard vendor, updating the BIOS to the latest supported version will typically fix that RDRAND issue.
Hi Paul. Could you please suggest other applications that use random numbers?
The cmake (version 3.17.0) command line used when compiling with GNU gcc was:
cmake … -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON
With the Intel compiler, I first had to create and export the CC and CXX environment variables, and then ran:
cmake … -DREGRESSIONTEST_DOWNLOAD=ON -DGMX_FFT_LIBRARY=mkl
Do you want all the cmake outputs?
The oneAPI toolkit is the one currently available on the "Download the Intel® oneAPI Base Toolkit" page, and corresponds to Intel(R) oneAPI DPC++ Compiler 2021.2.0 (2021.2.0.20210317) (clang 12).
I was not able to compile GROMACS 2020.6 with GNU gcc 11 or the Intel compiler. I could use them only with GROMACS 2021.2.
During the compilation, I did not observe any error in any case.
And, in reply to kevinboyd: unfortunately, I'm not in front of the computer, so I can't do the BIOS update now, and it will be a while before I can return to the laboratory where the computer is located. The computer's motherboard is a Gigabyte B550 Gaming X V2. I don't have any way to see the BIOS version remotely. :-(
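If the workstation can be reached over a remote shell, the BIOS version may still be readable without a reboot, because Linux kernels commonly export DMI data as text files under /sys/class/dmi/id. The sketch below assumes that directory exists and is readable by an ordinary user on this openSUSE installation, which has not been verified here. The helper names are hypothetical.

```cpp
#include <fstream>
#include <string>

// Reads the first line of a text file into `line`.
// Returns false if the file cannot be opened or contains no data.
bool readFirstLine(const std::string& path, std::string& line)
{
    std::ifstream file(path);
    return file && std::getline(file, line);
}

// Assumption: a Linux kernel that exports DMI data under /sys/class/dmi/id.
// Each helper returns an empty string when the entry is unavailable.
std::string dmiEntry(const std::string& name)
{
    std::string value;
    if (!readFirstLine("/sys/class/dmi/id/" + name, value))
    {
        value.clear();
    }
    return value;
}

std::string biosVersion() { return dmiEntry("bios_version"); }
std::string biosDate() { return dmiEntry("bios_date"); }
std::string boardName() { return dmiEntry("board_name"); }
```

Printing biosVersion() and biosDate() from a main function and comparing the result with the vendor's download page would show whether a newer BIOS exists before anyone has to go to the laboratory.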
I'm working on reproducing this on my AMD machine, but in an openSUSE 15.2 Docker container.
I can't reproduce the segmentation faults, but I got a different test failure in the correlation tests:
[ RUN ] ExpfitTest.EffnERREST
/gromacs/src/testutils/refdata.cpp:927: Failure
In item: /result/
Actual: 317.38389214245279
Reference: 103.32999340041971
Difference: 214.054 (7319473861338039 double-prec. ULPs, rel. 2.07)
Tolerance: abs. 0.005, rel. 0.005
[ FAILED ] ExpfitTest.EffnERREST (1 ms)
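To put the numbers in that report in context, here is a rough sketch of how an absolute difference, a relative difference, and a distance in double-precision ULPs can be computed for two values. It is not the GROMACS test-utility code. The names are invented, the relative difference is assumed to be taken with respect to the reference value, and the pass rule (absolute or relative tolerance satisfied) is an assumption made for illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Maps a double onto an integer scale that is monotonic in the value, so that
// subtracting two mapped values counts the representable doubles between them.
std::int64_t orderedBits(double x)
{
    std::int64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    // Negative doubles are stored sign-magnitude; flip them so order is kept.
    return bits < 0 ? std::numeric_limits<std::int64_t>::min() - bits : bits;
}

// Number of representable doubles between a and b (0 if they are equal).
std::uint64_t ulpDistance(double a, double b)
{
    const std::uint64_t ua = static_cast<std::uint64_t>(orderedBits(a));
    const std::uint64_t ub = static_cast<std::uint64_t>(orderedBits(b));
    // Unsigned subtraction avoids signed overflow for values of opposite sign.
    return orderedBits(a) > orderedBits(b) ? ua - ub : ub - ua;
}

struct Comparison
{
    double        absDiff;
    double        relDiff; // relative to |reference|; not finite if reference is 0
    std::uint64_t ulps;
    bool          withinTolerance;
};

// Assumed rule: the value passes if either tolerance is satisfied.
Comparison compare(double actual, double reference, double absTol, double relTol)
{
    Comparison c;
    c.absDiff         = std::fabs(actual - reference);
    c.relDiff         = c.absDiff / std::fabs(reference);
    c.ulps            = ulpDistance(actual, reference);
    c.withinTolerance = (c.absDiff <= absTol) || (c.relDiff <= relTol);
    return c;
}
```

With the actual and reference values from the failure above and both tolerances set to 0.005, the absolute difference is about 214 and the relative difference about 2, so the result is off by a factor of roughly three rather than by rounding noise.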
In our lab, we have GROMACS working fine on two workstations with the Ryzen 7 3800XT, which is a Zen 2 processor. The workstation that is giving the problem has a Ryzen 9 5950X, which is a Zen 3 processor.
Which AMD processor does your computer have? Is it Zen 2 or Zen 3?
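For telling the two generations apart programmatically, the CPUID family number is one option: AMD uses family 0x17 for Zen, Zen+ and Zen 2, and family 0x19 for Zen 3. The sketch below decodes the leaf 1 signature and checks the RDRAND feature bit. The function names are hypothetical, and the live query relies on the cpuid.h header shipped with GCC and Clang on x86, so it is compiled only there.

```cpp
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID_H 1
#endif

struct CpuSignature
{
    unsigned int family;
    unsigned int model;
    unsigned int stepping;
};

// Decodes the EAX value returned by CPUID leaf 1, following AMD's convention
// that the extended fields apply only when the base family is 0xF.
CpuSignature decodeSignature(unsigned int eax)
{
    const unsigned int baseFamily = (eax >> 8) & 0xFu;
    const unsigned int baseModel  = (eax >> 4) & 0xFu;
    const unsigned int extFamily  = (eax >> 20) & 0xFFu;
    const unsigned int extModel   = (eax >> 16) & 0xFu;

    CpuSignature s;
    s.family   = (baseFamily == 0xFu) ? baseFamily + extFamily : baseFamily;
    s.model    = (baseFamily == 0xFu) ? ((extModel << 4) | baseModel) : baseModel;
    s.stepping = eax & 0xFu;
    return s;
}

// Bit 30 of ECX from leaf 1 says that RDRAND is advertised. It does not say
// that the instruction returns usable values.
bool advertisesRdrand(unsigned int ecx)
{
    return ((ecx >> 30) & 1u) != 0;
}

// Queries CPUID leaf 1 on the running machine; returns false where the
// cpuid.h helper is unavailable or the leaf is unsupported.
bool queryLeaf1(unsigned int& eax, unsigned int& ecx)
{
#ifdef SKETCH_HAVE_CPUID_H
    unsigned int ebx = 0;
    unsigned int edx = 0;
    return __get_cpuid(1, &eax, &ebx, &ecx, &edx) != 0;
#else
    (void)eax;
    (void)ecx;
    return false;
#endif
}
```

Running queryLeaf1 on both kinds of workstation and passing the resulting EAX to decodeSignature should give family 0x17 on the 3800XT machines and 0x19 on the 5950X, which makes it easy to record the exact CPU in a bug report.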
My processor is an AMD Ryzen 7 3800X, so Zen 2. I'll see if I can test somewhere on a Zen 3 machine.
I still need to check the correlation test failure.