GROMACS 2025.3 installation: make check fails in regressiontests/complex

GROMACS version: 2025.3
GROMACS modification: No

g++ 13.30, mpirun 4.1.6, CMake 3.28.3, CUDA 13.0.48, Python 3.12.3

My flags are

cmake .. -DGMX_GPU=CUDA -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON

when I run

make check

I get a failure in regressiontests/complex (as shown below). All other tests pass. I would really appreciate some advice on how to find out what I did wrong. Thank you! /Marie

93/94 Test #93: regressiontests/complex …***Failed 375.93 sec
Re-running aminoacids using only CPU-based non-bonded kernels
Re-running aminoacids using CPU-based update
Re-running awh_multidim using only CPU-based non-bonded kernels
Re-running awh_multidim using CPU-based update
Re-running butane using only CPU-based non-bonded kernels
Re-running butane using CPU-based update
Re-running dd121 using only CPU-based non-bonded kernels
Mdrun cannot use the requested (or automatic) number of ranks, retrying with 8.

Abnormal return value for ’ gmx mdrun -nb cpu -notunepme >mdrun.out 2>&1’ was 1
Retrying mdrun with better settings…
Re-running distance_restraints using only CPU-based non-bonded kernels
Re-running distance_restraints using CPU-based update
Re-running ethyleenglycol using only CPU-based non-bonded kernels
Mdrun cannot use the requested (or automatic) number of ranks, retrying with 8.

Abnormal return value for ’ gmx mdrun -notunepme -nb cpu >mdrun.out 2>&1’ was 1
Retrying mdrun with better settings…
Re-running ethyleenglycol using CPU-based update
Re-running field using only CPU-based non-bonded kernels
Re-running field using CPU-based update
Re-running nacl using only CPU-based non-bonded kernels
Re-running nacl using CPU-based update

Abnormal return value for ’ gmx mdrun -nb cpu -notunepme >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in nbnxn-energy-groups for nbnxn-energy-groups
Re-running nbnxn-free-energy using only CPU-based non-bonded kernels
Mdrun cannot use the requested (or automatic) number of ranks, retrying with 8.

Abnormal return value for ’ gmx mdrun -notunepme -nb cpu >mdrun.out 2>&1’ was 1
Retrying mdrun with better settings…
Re-running nbnxn-free-energy using CPU-based update
Re-running nbnxn-free-energy-vv using only CPU-based non-bonded kernels
Mdrun cannot use the requested (or automatic) number of ranks, retrying with 8.

Abnormal return value for ’ gmx mdrun -notunepme -nb cpu >mdrun.out 2>&1’ was 1
Retrying mdrun with better settings…
Re-running nbnxn-ljpme-geometric using only CPU-based non-bonded kernels

Abnormal return value for ’ gmx mdrun -notunepme -nb cpu >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in nbnxn-ljpme-geometric/nb-cpu for nbnxn-ljpme-geometric-nb-cpu
Re-running nbnxn-ljpme-geometric using CPU-based update
Re-running nbnxn-ljpme-LB using only CPU-based non-bonded kernels
Re-running nbnxn-ljpme-LB-geometric using only CPU-based non-bonded kernels
Re-running nbnxn-vdw-force-switch using only CPU-based non-bonded kernels

Abnormal return value for ’ gmx mdrun -notunepme -nb cpu >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in nbnxn-vdw-force-switch/nb-cpu for nbnxn-vdw-force-switch-nb-cpu
Re-running nbnxn-vdw-force-switch using CPU-based update
Re-running nbnxn-vdw-potential-switch using only CPU-based non-bonded kernels

Abnormal return value for ’ gmx mdrun -notunepme -nb cpu >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in nbnxn-vdw-potential-switch/nb-cpu for nbnxn-vdw-potential-switch-nb-cpu
Re-running nbnxn-vdw-potential-switch using CPU-based update
Re-running nbnxn-vdw-potential-switch-argon using only CPU-based non-bonded kernels
Re-running nbnxn-vdw-potential-switch-argon using CPU-based update
Re-running nbnxn_pme using only CPU-based non-bonded kernels

Abnormal return value for ’ gmx mdrun -notunepme -nb cpu >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in nbnxn_pme/nb-cpu for nbnxn_pme-nb-cpu
Re-running nbnxn_pme using CPU-based update
Re-running nbnxn_pme_order5 using only CPU-based non-bonded kernels

Abnormal return value for ’ gmx mdrun -notunepme -nb cpu >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in nbnxn_pme_order5/nb-cpu for nbnxn_pme_order5-nb-cpu
Re-running nbnxn_pme_order5 using CPU-based update
Re-running nbnxn_pme_order6 using only CPU-based non-bonded kernels

Abnormal return value for ’ gmx mdrun -notunepme -nb cpu >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in nbnxn_pme_order6/nb-cpu for nbnxn_pme_order6-nb-cpu
Re-running nbnxn_pme_order6 using CPU-based update
Re-running nbnxn_rf using only CPU-based non-bonded kernels

Abnormal return value for ’ gmx mdrun -notunepme -nb cpu >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in nbnxn_rf/nb-cpu for nbnxn_rf-nb-cpu
Re-running nbnxn_rf using CPU-based update
Re-running nbnxn_rzero using only CPU-based non-bonded kernels
Re-running nbnxn_rzero using CPU-based update
Re-running nbnxn_vsite using only CPU-based non-bonded kernels
Re-running nst_mismatch using only CPU-based non-bonded kernels
Re-running nst_mismatch using CPU-based update
Re-running octahedron using only CPU-based non-bonded kernels

Abnormal return value for ’ gmx mdrun -notunepme -nb cpu >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in octahedron/nb-cpu for octahedron-nb-cpu
Re-running octahedron using CPU-based update
Re-running orientation-restraints using only CPU-based non-bonded kernels
Re-running position-restraints using only CPU-based non-bonded kernels
Re-running position-restraints using CPU-based update
Re-running pr-vrescale using only CPU-based non-bonded kernels

Abnormal return value for ’ gmx mdrun -notunepme -nb cpu >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in pr-vrescale/nb-cpu for pr-vrescale-nb-cpu
Re-running pr-vrescale using CPU-based update
Re-running pull_constraint using only CPU-based non-bonded kernels
Re-running pull_cylinder using only CPU-based non-bonded kernels

Abnormal return value for ’ gmx mdrun -notunepme -nb cpu >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in pull_cylinder/nb-cpu for pull_cylinder-nb-cpu
Re-running pull_cylinder using CPU-based update
Re-running pull_geometry_angle using only CPU-based non-bonded kernels
Re-running pull_geometry_angle using CPU-based update
Re-running pull_geometry_angle-axis using only CPU-based non-bonded kernels
Re-running pull_geometry_angle-axis using CPU-based update
Re-running pull_geometry_dihedral using only CPU-based non-bonded kernels
Re-running pull_geometry_dihedral using CPU-based update
Re-running sw using only CPU-based non-bonded kernels
Re-running swap_x using only CPU-based non-bonded kernels

Abnormal return value for ’ gmx mdrun -notunepme -nb cpu >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in swap_x/nb-cpu for swap_x-nb-cpu
Re-running swap_y using only CPU-based non-bonded kernels

Abnormal return value for ’ gmx mdrun -notunepme -nb cpu >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in swap_y/nb-cpu for swap_y-nb-cpu
Re-running swap_z using only CPU-based non-bonded kernels

Abnormal return value for ’ gmx mdrun -notunepme -nb cpu >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in swap_z/nb-cpu for swap_z-nb-cpu
Re-running tip4p using only CPU-based non-bonded kernels
Re-running tip4pflex using only CPU-based non-bonded kernels

Abnormal return value for ’ gmx mdrun -notunepme -nb cpu >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in tip4pflex/nb-cpu for tip4pflex-nb-cpu
Re-running urea using only CPU-based non-bonded kernels
Mdrun cannot use the requested (or automatic) number of ranks, retrying with 8.

Abnormal return value for ’ gmx mdrun -notunepme -nb cpu >mdrun.out 2>&1’ was 1
Retrying mdrun with better settings…
Re-running urea using CPU-based update
Re-running walls using only CPU-based non-bonded kernels
Re-running walls using CPU-based update
15 out of 115 complex tests FAILED

  Start 94: regressiontests/essentialdynamics

94/94 Test #94: regressiontests/essentialdynamics … Passed 25.71 sec

99% tests passed, 1 tests failed out of 94

Label Time Summary:
GTest = 662.01 sec*proc (90 tests)
IntegrationTest = 357.41 sec*proc (29 tests)
MpiTest = 452.11 sec*proc (21 tests)
QuickGpuTest = 142.36 sec*proc (23 tests)
SlowGpuTest = 902.36 sec*proc (16 tests)
SlowTest = 288.30 sec*proc (14 tests)
UnitTest = 16.29 sec*proc (47 tests)

Total Test time (real) = 760.76 sec

The following tests FAILED:
93 - regressiontests/complex (Failed)
Errors while running CTest
make[3]: *** [CMakeFiles/run-ctest-nophys.dir/build.make:71: CMakeFiles/run-ctest-nophys] Error 8
make[2]: *** [CMakeFiles/Makefile2:3383: CMakeFiles/run-ctest-nophys.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:3416: CMakeFiles/check.dir/rule] Error 2
make: *** [Makefile:628: check] Error 2

I don’t fully understand this output. Maybe the tests fail because you are asking for too many ranks (but then our tests should handle this better). @al42and do you have an idea what is going wrong here?

Anyhow, I think the issue is with the test setup, your GROMACS installation should be fine.

@Marie, can you attach the files mentioned in the error messages?

E.g., for

FAILED. Check mdrun.out, md.log file(s) in pull_cylinder/nb-cpu for pull_cylinder-nb-cpu,

that would be complex/pull_cylinder/nb-cpu/mdrun.out and complex/pull_cylinder/nb-cpu/md.log
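Since many tests failed, it may be quicker to scan all of them at once rather than opening each file. The following is only a sketch: `show_fatal_errors` is a hypothetical helper name, and the directory you pass to it (wherever the `complex` regression tests live in your build tree) is an assumption you need to fill in yourself. It relies on `grep -A`, which GNU and BSD grep provide.

```shell
# List every mdrun.out below a directory that contains "Fatal error:",
# and print that line plus the four lines after it.
# (Hypothetical helper, not part of GROMACS.)
show_fatal_errors() {
    find "$1" -name mdrun.out -exec grep -l "Fatal error:" {} + | sort |
    while IFS= read -r f; do
        echo "== $f"
        grep -A 4 "Fatal error:" "$f"
    done
}

# Example call; replace the argument with the real location of "complex":
# show_fatal_errors path/to/regressiontests/complex
```

If every listed file shows the same message, attaching one pair of files is probably enough.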

md.log (19.6 KB) - pull cylinder

mdrun.txt (2.4 KB) - pull cylinder

These are the files for 1 of the error messages, I can attach more if needed.

Thank you for looking into this!

Thanks.

You can check if the errors in other files are any different, but what you attached confirms Berk’s suspicion:

Fatal error:
The number of ranks selected for particle-particle work (22) contains a large
prime factor 11. In most cases this will lead to bad performance. Choose a
number with smaller prime factors or set the decomposition (option -dd)
manually.

@Marie: this error should be safe to ignore. You can also try adding -DREGRESSIONTEST_EXTRA_ARGS='-nt;2' to your CMake command and then running make check again; the tests should then run with a fixed configuration, avoiding this error.
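To see why 22 ranks trips this check while a small fixed count does not, here is a minimal sketch that computes the largest prime factor of a rank count by trial division. The function name `largest_prime_factor` is made up for illustration, and the sketch does not reproduce the exact criterion GROMACS applies when it decides a factor is "large".

```shell
# Print the largest prime factor of a positive integer (trial division).
# The body runs in a subshell, so its variables do not leak to the caller.
largest_prime_factor() (
    n=$1
    p=2
    largest=1
    while [ $((p * p)) -le "$n" ]; do
        while [ $((n % p)) -eq 0 ]; do
            largest=$p
            n=$((n / p))
        done
        p=$((p + 1))
    done
    # Whatever remains above 1 is itself prime and is the largest factor.
    if [ "$n" -gt 1 ]; then
        largest=$n
    fi
    echo "$largest"
)

for ranks in 22 16 8 2; do
    echo "ranks=$ranks largest_prime_factor=$(largest_prime_factor "$ranks")"
done
```

For 22 the result is 11, the factor named in the error message, whereas 16, 8 and 2 only have the factor 2.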

@hess: CPUs with 22 threads are fun. We have 6 P-cores (with 2 threads each), 8 E-cores and 2 LPE-cores (with 1 thread each). Yet GROMACS incorrectly reports them as 11 cores with 2 threads each:

Package 0: [ 0 1] [ 2 3] [ 4 5] [ 6 7] [ 8 9] [ 10 11] [ 12 13] [ 14 15] [ 16 17] [ 18 19] [ 20 21]
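The arithmetic behind that mismatch can be spelled out in a few lines. This is only an illustration using the core counts given in this post; the variable names are invented for the example.

```shell
# Core layout as described in the post above.
p_cores=6
threads_per_p_core=2
e_cores=8      # 1 thread each
lpe_cores=2    # 1 thread each

actual_cores=$((p_cores + e_cores + lpe_cores))
actual_threads=$((p_cores * threads_per_p_core + e_cores + lpe_cores))

# Layout as reported: 11 cores with 2 threads each.
reported_cores=11
reported_threads=$((reported_cores * 2))

echo "actual:   $actual_cores cores, $actual_threads threads"
echo "reported: $reported_cores cores, $reported_threads threads"
```

Both layouts add up to 22 hardware threads, so the thread total agrees, but the number of cores and the threads per core do not.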

@Marie: To help us improve hardware detection for new Intel CPUs, could you run the following command in your GROMACS source root directory?

./src/gromacs/hardware/tests/capture-topology.sh CoreUltra9185H

This script will create a directory named CoreUltra9185H containing hardware topology information for your CPU. Please archive this directory (e.g., using tar -czvf CoreUltra9185H.tar.gz CoreUltra9185H) and share the resulting file with us.

I checked the other files and they all contain the same fatal error.

Good to know that it shouldn’t affect the GROMACS installation.

Link for CoreUltra9185H.tar

Thank you for your help!
