GROMACS 2021.1 installation: make check fails at MdrunMpiCoordinationTests and regression tests

GROMACS version: 2021.1
GROMACS modification: No

Hi, could I get some advice on installation, please?

I am installing GROMACS 2021.1 on our local HPC. I have logged into an interactive node with GPUs to stay out of the way of other users.
The node info:
num_proc=20
gputype=TITAN-X
gpus=4

I am using (via loaded modules):

  • cmake 3.18.0
  • gcc 8.4.0
  • openmpi 4.0.0
  • cuda 10.1.105
  • python 3.7.3

My flags are:
cmake ../ -DCMAKE_C_COMPILER=gcc -DGMX_GPU=CUDA -DCMAKE_INSTALL_PREFIX=$prefix -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DBUILD_SHARED_LIBS=yes
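
For reference, a minimal sketch of the full configure/build/test sequence this corresponds to (the module names with versions, $prefix, and the explicit -DCMAKE_CXX_COMPILER=g++ line are assumptions added for completeness, not part of the original command):

module load cmake/3.18.0 gcc/8.4.0 openmpi/4.0.0 cuda/10.1.105 python/3.7.3
mkdir -p build && cd build
cmake .. \
  -DCMAKE_C_COMPILER=gcc \
  -DCMAKE_CXX_COMPILER=g++ \
  -DGMX_GPU=CUDA \
  -DCMAKE_INSTALL_PREFIX=$prefix \
  -DGMX_BUILD_OWN_FFTW=ON \
  -DREGRESSIONTEST_DOWNLOAD=ON \
  -DBUILD_SHARED_LIBS=yes
make -j 20            # 20 cores available on the node
make check            # runs the unit, integration and regression tests via CTest
make install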

When I run make check, I get the following failed/timed-out tests:

 63 - MdrunMpiCoordinationTestsTwoRanks (Timeout)
 68 - regressiontests/complex (Failed)
 69 - regressiontests/freeenergy (Timeout)
 71 - regressiontests/essentialdynamics (Failed)

I am not sure what these tests actually mean, so please advise:

  • Have I done something wrong?
  • Are these failures something I should expect when testing on a single node? Should I be testing some other way?

Thank you!

V

No (you have not done anything wrong).

No (these failures are not something you should expect when testing on a single node).

Can you please:

  • share the entire output (all errors from the complex and essentialdynamics tests)
  • try to run with a single GPU, since some of the test scripts can run into hiccups on multi-GPU nodes; e.g. run CUDA_VISIBLE_DEVICES=0 make check, as sketched below
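
A minimal sketch of that single-GPU rerun, plus a way to rerun only the suspect tests with their output shown (standard CTest options; the test-name pattern is just an example and should be adjusted to the names printed by make check):

# Make only one GPU visible to the whole test run:
CUDA_VISIBLE_DEVICES=0 make check

# Or rerun just the suspect tests from the build directory via CTest:
cd build
CUDA_VISIBLE_DEVICES=0 ctest -R 'MdrunMpiCoordinationTestsTwoRanks|regressiontests' --output-on-failure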

Cheers,
Szilárd

Thank you Szilárd.

Below is the last part of the make check output, from where it times out at the MdrunMpiCoordinationTestsTwoRanks test. It is fine for one rank.

As you suggested, I am also now trying CUDA_VISIBLE_DEVICES=0 make check

Thanks again!
V

Using 2 MPI threads

Non-default thread affinity set, disabling internal thread affinity

Using 1 OpenMP thread per tMPI thread

starting mdrun ‘Argon’
16 steps, 0.0 ps.
Generated 1 of the 1 non-bonded parameter combinations

Excluding 1 bonded neighbours molecule type ‘Argon’

Determining Verlet buffer for a tolerance of 1e-06 kJ/mol/ps at 80 K

Calculated rlist for 1x1 atom pair-list as 0.701 nm, buffer size 0.001 nm

Set rlist, assuming 4x4 atom pair-list, to 0.701 nm, buffer size 0.001 nm

Note that mdrun will redetermine rlist based on the actual pair-list setup

This run will generate roughly 0 Mb of data

Writing final coordinates.

NOTE: 48 % of the run time was spent communicating energies,
you might want to increase some nst* mdp options

           Core t (s)   Wall t (s)        (%)
   Time:        3.701        1.883      196.6
             (ns/day)    (hour/ns)

Performance: 0.780 30.767
Opened /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_21_reference.edr as single precision energy file
Opened /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_21.edr as single precision energy file

NOTE 1 [file /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_21_input.mdp]:
With Verlet lists the optimal nstlist is >= 10, with GPUs >= 20. Note
that with the Verlet scheme, nstlist has no effect on the accuracy of
your simulation.

NOTE 2 [file /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_21_input.mdp]:
Setting nstcalcenergy (100) equal to nstenergy (4)

Number of degrees of freedom in T-Coupling group System is 33.00

There were 2 notes
Reading file /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_21.tpr, VERSION 2021.1 (single precision)
Changing nstlist from 8 to 100, rlist from 0.701 to 0.734

Using 2 MPI threads

Non-default thread affinity set, disabling internal thread affinity

Using 1 OpenMP thread per tMPI thread

starting mdrun ‘Argon’
16 steps, 0.0 ps.
Generated 1 of the 1 non-bonded parameter combinations

Excluding 1 bonded neighbours molecule type ‘Argon’

Determining Verlet buffer for a tolerance of 1e-06 kJ/mol/ps at 80 K

Calculated rlist for 1x1 atom pair-list as 0.701 nm, buffer size 0.001 nm

Set rlist, assuming 4x4 atom pair-list, to 0.701 nm, buffer size 0.001 nm

Note that mdrun will redetermine rlist based on the actual pair-list setup

This run will generate roughly 0 Mb of data

Writing final coordinates.

NOTE: 48 % of the run time was spent communicating energies,
you might want to increase some nst* mdp options

           Core t (s)   Wall t (s)        (%)
   Time:        3.751        1.908      196.6
             (ns/day)    (hour/ns)

Performance: 0.770 31.176
Opened /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_21_reference.edr as single precision energy file
Opened /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_21.edr as single precision energy file

NOTE 1 [file /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_21_input.mdp]:
With Verlet lists the optimal nstlist is >= 10, with GPUs >= 20. Note
that with the Verlet scheme, nstlist has no effect on the accuracy of
your simulation.

NOTE 2 [file /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_21_input.mdp]:
Setting nstcalcenergy (100) equal to nstenergy (4)

Number of degrees of freedom in T-Coupling group System is 33.00

There were 2 notes
Reading file /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_21.tpr, VERSION 2021.1 (single precision)
Changing nstlist from 8 to 100, rlist from 0.701 to 0.734

Using 2 MPI threads

Non-default thread affinity set, disabling internal thread affinity

Using 1 OpenMP thread per tMPI thread

starting mdrun ‘Argon’
16 steps, 0.0 ps.
Generated 1 of the 1 non-bonded parameter combinations

Excluding 1 bonded neighbours molecule type ‘Argon’

Determining Verlet buffer for a tolerance of 1e-06 kJ/mol/ps at 80 K

Calculated rlist for 1x1 atom pair-list as 0.701 nm, buffer size 0.001 nm

Set rlist, assuming 4x4 atom pair-list, to 0.701 nm, buffer size 0.001 nm

Note that mdrun will redetermine rlist based on the actual pair-list setup

This run will generate roughly 0 Mb of data

Writing final coordinates.

NOTE: 48 % of the run time was spent communicating energies,
you might want to increase some nst* mdp options

           Core t (s)   Wall t (s)        (%)
   Time:        3.699        1.882      196.6
             (ns/day)    (hour/ns)

Performance: 0.780 30.751
Opened /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_21_reference.edr as single precision energy file
Opened /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_21.edr as single precision energy file
[ OK ] PropagatorsWithCoupling/PeriodicActionsTest.PeriodicActionsAgreeWithReference/21 (22091 ms)
[ RUN ] PropagatorsWithCoupling/PeriodicActionsTest.PeriodicActionsAgreeWithReference/22

NOTE 1 [file /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_22_input.mdp]:
With Verlet lists the optimal nstlist is >= 10, with GPUs >= 20. Note
that with the Verlet scheme, nstlist has no effect on the accuracy of
your simulation.

NOTE 2 [file /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_22_input.mdp]:
Setting nstcalcenergy (100) equal to nstenergy (4)

Number of degrees of freedom in T-Coupling group System is 33.00

There were 2 notes
Reading file /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_22.tpr, VERSION 2021.1 (single precision)
Changing nstlist from 8 to 100, rlist from 0.701 to 0.734

Using 2 MPI threads

Non-default thread affinity set, disabling internal thread affinity

Using 1 OpenMP thread per tMPI thread

starting mdrun ‘Argon’
16 steps, 0.0 ps.
Generated 1 of the 1 non-bonded parameter combinations

Excluding 1 bonded neighbours molecule type ‘Argon’

Determining Verlet buffer for a tolerance of 1e-06 kJ/mol/ps at 80 K

Calculated rlist for 1x1 atom pair-list as 0.701 nm, buffer size 0.001 nm

Set rlist, assuming 4x4 atom pair-list, to 0.701 nm, buffer size 0.001 nm

Note that mdrun will redetermine rlist based on the actual pair-list setup

This run will generate roughly 0 Mb of data

Writing final coordinates.

NOTE: 48 % of the run time was spent communicating energies,
you might want to increase some nst* mdp options

           Core t (s)   Wall t (s)        (%)
   Time:        3.715        1.899      195.6
             (ns/day)    (hour/ns)

Performance: 0.773 31.029

NOTE 1 [file /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_22_input.mdp]:
With Verlet lists the optimal nstlist is >= 10, with GPUs >= 20. Note
that with the Verlet scheme, nstlist has no effect on the accuracy of
your simulation.

NOTE 2 [file /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_22_input.mdp]:
Setting nstcalcenergy (100) equal to nstenergy (4)

Number of degrees of freedom in T-Coupling group System is 33.00

There were 2 notes
Reading file /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_22.tpr, VERSION 2021.1 (single precision)
Changing nstlist from 8 to 100, rlist from 0.701 to 0.734

Using 2 MPI threads

Non-default thread affinity set, disabling internal thread affinity

Using 1 OpenMP thread per tMPI thread

starting mdrun ‘Argon’
16 steps, 0.0 ps.
Generated 1 of the 1 non-bonded parameter combinations

Excluding 1 bonded neighbours molecule type ‘Argon’

Determining Verlet buffer for a tolerance of 1e-06 kJ/mol/ps at 80 K

Calculated rlist for 1x1 atom pair-list as 0.701 nm, buffer size 0.001 nm

Set rlist, assuming 4x4 atom pair-list, to 0.701 nm, buffer size 0.001 nm

Note that mdrun will redetermine rlist based on the actual pair-list setup

This run will generate roughly 0 Mb of data

Writing final coordinates.

NOTE: 47 % of the run time was spent communicating energies,
you might want to increase some nst* mdp options

           Core t (s)   Wall t (s)        (%)
   Time:        3.708        1.886      196.6
             (ns/day)    (hour/ns)

Performance: 0.779 30.824
Opened /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_22_reference.edr as single precision energy file
Opened /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_22.edr as single precision energy file
Last energy frame read 4 time 0.016
NOTE 1 [file /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_22_input.mdp]:
With Verlet lists the optimal nstlist is >= 10, with GPUs >= 20. Note
that with the Verlet scheme, nstlist has no effect on the accuracy of
your simulation.

NOTE 2 [file /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_22_input.mdp]:
Setting nstcalcenergy (100) equal to nstenergy (4)

Number of degrees of freedom in T-Coupling group System is 33.00

There were 2 notes
Reading file /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_22.tpr, VERSION 2021.1 (single precision)
Changing nstlist from 8 to 100, rlist from 0.701 to 0.734

Using 2 MPI threads

Non-default thread affinity set, disabling internal thread affinity

Using 1 OpenMP thread per tMPI thread

starting mdrun ‘Argon’
16 steps, 0.0 ps.
Generated 1 of the 1 non-bonded parameter combinations

Excluding 1 bonded neighbours molecule type ‘Argon’

Determining Verlet buffer for a tolerance of 1e-06 kJ/mol/ps at 80 K

Calculated rlist for 1x1 atom pair-list as 0.701 nm, buffer size 0.001 nm

Set rlist, assuming 4x4 atom pair-list, to 0.701 nm, buffer size 0.001 nm

Note that mdrun will redetermine rlist based on the actual pair-list setup

This run will generate roughly 0 Mb of data

Writing final coordinates.

NOTE: 48 % of the run time was spent communicating energies,
you might want to increase some nst* mdp options

           Core t (s)   Wall t (s)        (%)
   Time:        3.777        1.921      196.6
             (ns/day)    (hour/ns)

Performance: 0.765 31.388
Opened /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_22_reference.edr as single precision energy file
Opened /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_22.edr as single precision energy file
Last energy frame read 4 time 0.016
NOTE 1 [file /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_22_input.mdp]:
With Verlet lists the optimal nstlist is >= 10, with GPUs >= 20. Note
that with the Verlet scheme, nstlist has no effect on the accuracy of
your simulation.

NOTE 2 [file /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_22_input.mdp]:
Setting nstcalcenergy (100) equal to nstenergy (4)

Number of degrees of freedom in T-Coupling group System is 33.00

There were 2 notes
Reading file /home/verastov/group/SOFTWARE/gromacs-2021.1/build/src/programs/mdrun/tests/Testing/Temporary/PropagatorsWithCoupling_PeriodicActionsTest_PeriodicActionsAgreeWithReference_22.tpr, VERSION 2021.1 (single precision)
Changing nstlist from 8 to 100, rlist from 0.701 to 0.734

Using 2 MPI threads

Non-default thread affinity set, disabling internal thread affinity

Using 1 OpenMP thread per tMPI thread

      Start 64: MdrunFEPTests
64/71 Test #64: MdrunFEPTests .........................   Passed   14.10 sec
      Start 65: MdrunSimulatorComparison
65/71 Test #65: MdrunSimulatorComparison ..............   Passed    0.04 sec
      Start 66: GmxapiExternalInterfaceTests
66/71 Test #66: GmxapiExternalInterfaceTests ..........   Passed   14.25 sec
      Start 67: GmxapiInternalInterfaceTests
67/71 Test #67: GmxapiInternalInterfaceTests ..........   Passed   12.77 sec
      Start 68: regressiontests/complex
68/71 Test #68: regressiontests/complex ...............***Timeout 1500.09 sec
sh: line 1: 196107 Aborted (core dumped) gmx mdrun -notunepme > mdrun.out 2>&1

Abnormal return value for ’ gmx mdrun -notunepme >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in nbnxn-ljpme-geometric for nbnxn-ljpme-geometric
sh: line 1: 196198 Aborted (core dumped) gmx mdrun -notunepme > mdrun.out 2>&1

Abnormal return value for ’ gmx mdrun -notunepme >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in nbnxn-ljpme-LB for nbnxn-ljpme-LB
sh: line 1: 196326 Aborted (core dumped) gmx mdrun -notunepme > mdrun.out 2>&1

Abnormal return value for ’ gmx mdrun -notunepme >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in nbnxn-vdw-force-switch for nbnxn-vdw-force-switch
sh: line 1: 1000 Aborted (core dumped) gmx mdrun -notunepme > mdrun.out 2>&1

Abnormal return value for ’ gmx mdrun -notunepme >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in nbnxn_pme_order6 for nbnxn_pme_order6

Abnormal return value for ’ gmx mdrun -notunepme >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in octahedron for octahedron
sh: line 1: 2027 Aborted (core dumped) gmx mdrun -notunepme > mdrun.out 2>&1

Abnormal return value for ’ gmx mdrun -notunepme >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in pr-vrescale for pr-vrescale

Abnormal return value for ’ gmx mdrun -notunepme >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in swap_x for swap_x

Abnormal return value for ’ gmx mdrun -notunepme >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in swap_y for swap_y

Abnormal return value for ’ gmx mdrun -notunepme >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in swap_z for swap_z

      Start 69: regressiontests/freeenergy
69/71 Test #69: regressiontests/freeenergy ............***Failed  471.37 sec

Abnormal return value for ’ gmx mdrun -notunepme >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in coulandvdwsequential_coul for coulandvdwsequential_coul

Abnormal return value for ’ gmx mdrun -notunepme >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in coulandvdwsequential_vdw for coulandvdwsequential_vdw

Abnormal return value for ’ gmx mdrun -notunepme >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in coulandvdwtogether for coulandvdwtogether
Mdrun cannot use the requested (or automatic) number of ranks, retrying with 8.

Abnormal return value for ’ gmx mdrun -notunepme >mdrun.out 2>&1’ was 1
Retrying mdrun with better settings…
Mdrun cannot use the requested (or automatic) number of ranks, retrying with 8.

Abnormal return value for ’ gmx mdrun -notunepme >mdrun.out 2>&1’ was 1
Retrying mdrun with better settings…
Mdrun cannot use the requested (or automatic) number of ranks, retrying with 8.

Abnormal return value for ’ gmx mdrun -notunepme >mdrun.out 2>&1’ was 1
Retrying mdrun with better settings…

Abnormal return value for ’ gmx mdrun -notunepme >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in restraints for restraints

Abnormal return value for ’ gmx mdrun -notunepme >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in simtemp for simtemp

Abnormal return value for ’ gmx mdrun -notunepme >mdrun.out 2>&1’ was -1
FAILED. Check mdrun.out, md.log file(s) in transformAtoB for transformAtoB
Mdrun cannot use the requested (or automatic) number of ranks, retrying with 8.

Abnormal return value for ’ gmx mdrun -notunepme >mdrun.out 2>&1’ was 1
Retrying mdrun with better settings…
6 out of 10 freeenergy tests FAILED

      Start 70: regressiontests/rotation
70/71 Test #70: regressiontests/rotation ..............   Passed  152.30 sec
      Start 71: regressiontests/essentialdynamics
71/71 Test #71: regressiontests/essentialdynamics .....***Failed   90.87 sec

Abnormal return value for ’ gmx mdrun -ei /home/verastov/group/SOFTWARE/gromacs-2021.1/build/tests/regressiontests-2021.1/essentialdynamics/flooding1/sam.edi -eo flooding1.xvg >mdrun.out 2>&1’ was -1

Abnormal return value for ’ gmx mdrun -ei /home/verastov/group/SOFTWARE/gromacs-2021.1/build/tests/regressiontests-2021.1/essentialdynamics/flooding2/sam.edi -eo flooding2.xvg >mdrun.out 2>&1’ was -1
Essential dynamics tests FAILED with 2 errors!

94% tests passed, 4 tests failed out of 71

Label Time Summary:
GTest              = 856.49 sec*proc (65 tests)
IntegrationTest    = 242.72 sec*proc (18 tests)
MpiTest            = 633.83 sec*proc (8 tests)
SlowTest           = 540.68 sec*proc (8 tests)
UnitTest           =  73.09 sec*proc (39 tests)

Total Test time (real) = 3072.04 sec

The following tests FAILED:
	 63 - MdrunMpiCoordinationTestsTwoRanks (Timeout)
	 68 - regressiontests/complex (Timeout)
	 69 - regressiontests/freeenergy (Failed)
	 71 - regressiontests/essentialdynamics (Failed)
Errors while running CTest
make[3]: *** [CMakeFiles/run-ctest-nophys] Error 8
make[2]: *** [CMakeFiles/run-ctest-nophys.dir/all] Error 2
make[1]: *** [CMakeFiles/check.dir/rule] Error 2
make: *** [check] Error 2

Thank you Szilárd!

I have also run CUDA_VISIBLE_DEVICES=0 make check, and the result is the same as on multi-GPU (see below).

Do you have any insights into what is happening, or what I might be missing?

Thank you again
V

Abnormal return value for ' gmx mdrun        -notunepme >mdrun.out 2>&1' was -1
FAILED. Check mdrun.out, md.log file(s) in simtemp for simtemp
sh: line 1: 18452 Aborted                 (core dumped) gmx mdrun -notunepme > mdrun.out 2>&1

Abnormal return value for ' gmx mdrun        -notunepme >mdrun.out 2>&1' was -1
FAILED. Check mdrun.out, md.log file(s) in transformAtoB for transformAtoB
Mdrun cannot use the requested (or automatic) number of ranks, retrying with 8.

Abnormal return value for ' gmx mdrun        -notunepme >mdrun.out 2>&1' was 1
Retrying mdrun with better settings...
6 out of 10 freeenergy tests FAILED

      Start 70: regressiontests/rotation
70/71 Test #70: regressiontests/rotation ..............   Passed  149.20 sec
      Start 71: regressiontests/essentialdynamics
71/71 Test #71: regressiontests/essentialdynamics .....***Failed   89.07 sec

Abnormal return value for ' gmx mdrun       -ei /home/verastov/group/SOFTWARE/gromacs-2021.1/build/tests/regressiontests-2021.1/essentialdynamics/flooding1/sam.edi -eo flooding1.xvg >mdrun.out 2>&1' was -1

Abnormal return value for ' gmx mdrun       -ei /home/verastov/group/SOFTWARE/gromacs-2021.1/build/tests/regressiontests-2021.1/essentialdynamics/flooding2/sam.edi -eo flooding2.xvg >mdrun.out 2>&1' was -1
Essential dynamics tests FAILED with 2 errors!


94% tests passed, 4 tests failed out of 71

Label Time Summary:
GTest              = 855.12 sec*proc (65 tests)
IntegrationTest    = 241.89 sec*proc (18 tests)
MpiTest            = 633.67 sec*proc (8 tests)
SlowTest           = 542.76 sec*proc (8 tests)
UnitTest           =  70.46 sec*proc (39 tests)

Total Test time (real) = 3063.27 sec

The following tests FAILED:
	 63 - MdrunMpiCoordinationTestsTwoRanks (Timeout)
	 68 - regressiontests/complex (Timeout)
	 69 - regressiontests/freeenergy (Failed)
	 71 - regressiontests/essentialdynamics (Failed)
Errors while running CTest
make[3]: *** [CMakeFiles/run-ctest-nophys] Error 8
make[2]: *** [CMakeFiles/run-ctest-nophys.dir/all] Error 2
make[1]: *** [CMakeFiles/check.dir/rule] Error 2
make: *** [check] Error 2

Hi Valentina,

Have you tried compiling with MPI? On my computer it passed 100% of the tests. You would need the following cmake flags: -DGMX_MPI=ON -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx
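
A minimal sketch of such an MPI-enabled configure in a separate build directory (it carries over the GPU/FFTW/regression-test options from the earlier non-MPI build; $prefix is a placeholder):

mkdir -p build-mpi && cd build-mpi
cmake .. \
  -DGMX_MPI=ON \
  -DCMAKE_C_COMPILER=mpicc \
  -DCMAKE_CXX_COMPILER=mpicxx \
  -DGMX_GPU=CUDA \
  -DGMX_BUILD_OWN_FFTW=ON \
  -DREGRESSIONTEST_DOWNLOAD=ON \
  -DCMAKE_INSTALL_PREFIX=$prefix
make -j 20 && make check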

Hi,

Not yet; I am trying it, but at the moment I get this error:

CMake Error in /home/verastov/group/SOFTWARE/gromacs-2021.1/build-mpi/CMakeFiles/CMakeTmp/CMakeLists.txt:
  Target "cmTC_9d91c" requires the language dialect "CXX17" , but CMake does
  not know the compile flags to use to enable it.

I am using: parallel_studio_xe_2017_update4/compilers_and_libraries/linux/mpi/intel64/bin/mpicxx and mpicc
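
(For reference: this error typically means the compiler behind the mpicxx wrapper is too old for the C++17 standard that GROMACS 2021 requires. A quick, hedged way to check which compiler the wrapper actually invokes and whether it accepts -std=c++17; the test file name is arbitrary:)

# Show which underlying compiler the MPI wrapper calls
# (Intel MPI / MPICH-style wrappers; Open MPI uses 'mpicxx --showme'):
mpicxx -show

# Try a trivial C++17 compile through the wrapper:
echo 'int main() { return 0; }' > cxx17_check.cpp
mpicxx -std=c++17 cxx17_check.cpp -o cxx17_check && echo "C++17 OK"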

There are at least two issues here (timeouts and failing tests), neither of which is expected; I suggest we try to diagnose them.

First, please try to run GMX_DISABLE_GPU_DETECTION=1 make check, which will test your GROMACS build using only the CPU. I assume this will resolve at least the timeouts (or possibly both issues).

I am not sure why you are getting timeouts with the GPU-enabled tests; is there anything else running on the machine?

Last, can you please attach the output files of the failing tests?

Thank you,

Trying GMX_DISABLE_GPU_DETECTION=1 make check now.

Where are the errors written out to?

There may be stuff running on the node; to be honest, I am not sure I can get a whole node to myself, but I will check. Still, regardless of which node and resources I get, I see the same errors.

Hi,

Same result without GPUs.

...
Abnormal return value for ' gmx mdrun        -notunepme >mdrun.out 2>&1' was 1
Retrying mdrun with better settings...

Abnormal return value for ' gmx mdrun        -notunepme >mdrun.out 2>&1' was -1
FAILED. Check mdrun.out, md.log file(s) in restraints for restraints

Abnormal return value for ' gmx mdrun        -notunepme >mdrun.out 2>&1' was -1
FAILED. Check mdrun.out, md.log file(s) in simtemp for simtemp
sh: line 1: 173807 Aborted                 (core dumped) gmx mdrun -notunepme > mdrun.out 2>&1

Abnormal return value for ' gmx mdrun        -notunepme >mdrun.out 2>&1' was -1
FAILED. Check mdrun.out, md.log file(s) in transformAtoB for transformAtoB
Mdrun cannot use the requested (or automatic) number of ranks, retrying with 8.

Abnormal return value for ' gmx mdrun        -notunepme >mdrun.out 2>&1' was 1
Retrying mdrun with better settings...
6 out of 10 freeenergy tests FAILED

      Start 70: regressiontests/rotation
70/71 Test #70: regressiontests/rotation ..............   Passed   24.38 sec
      Start 71: regressiontests/essentialdynamics
71/71 Test #71: regressiontests/essentialdynamics .....***Failed   16.96 sec

Abnormal return value for ' gmx mdrun       -ei /home/verastov/group/SOFTWARE/gromacs-2021.1/build/tests/regressiontests-2021.1/essentialdynamics/flooding1/sam.edi -eo flooding1.xvg >mdrun.out 2>&1' was -1

Abnormal return value for ' gmx mdrun       -ei /home/verastov/group/SOFTWARE/gromacs-2021.1/build/tests/regressiontests-2021.1/essentialdynamics/flooding2/sam.edi -eo flooding2.xvg >mdrun.out 2>&1' was -1
Essential dynamics tests FAILED with 2 errors!


94% tests passed, 4 tests failed out of 71

Label Time Summary:
GTest              = 660.78 sec*proc (65 tests)
IntegrationTest    = 119.39 sec*proc (18 tests)
MpiTest            = 592.61 sec*proc (8 tests)
SlowTest           = 530.32 sec*proc (8 tests)
UnitTest           =  11.07 sec*proc (39 tests)

Total Test time (real) = 2179.85 sec

The following tests FAILED:
	 63 - MdrunMpiCoordinationTestsTwoRanks (Timeout)
	 68 - regressiontests/complex (Failed)
	 69 - regressiontests/freeenergy (Failed)
	 71 - regressiontests/essentialdynamics (Failed)
Errors while running CTest
make[3]: *** [CMakeFiles/run-ctest-nophys] Error 8
make[2]: *** [CMakeFiles/run-ctest-nophys.dir/all] Error 2
make[1]: *** [CMakeFiles/check.dir/rule] Error 2
make: *** [check] Error 2

They are in the build directory you run make check from, under tests/regressiontests-2021.1. From there, grab the freeenergy and essentialdynamics directories and please file an issue on Issues · GROMACS / GROMACS · GitLab

The timeouts are most likely related to the resources you are running the tests on being oversubscribed, e.g. the CPU cores or GPUs assigned to make check are already running something else, which makes some of the slower tests not complete in time. If your CPU-only tests are not timing out, it is most likely that you are running on GPUs which are already busy.
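
A hedged sketch of how to collect those outputs from the build tree before filing the issue (the directory layout follows the paths already printed in the failures above):

cd build/tests/regressiontests-2021.1

# Show the tail of every mdrun.out in the failing test groups:
for d in complex freeenergy essentialdynamics; do
  find "$d" -name mdrun.out -exec sh -c 'echo "== $1 =="; tail -n 40 "$1"' _ {} \;
done

# Bundle the relevant directories for attaching to the GitLab issue:
tar czf failing-regressiontests.tar.gz freeenergy essentialdynamics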

Hi, it seems I need a newer (than 2019) Intel compiler on the HPC, so I am getting IT to install it and will then retry. Thank you for replying. I will update when I try with the new compilers/libraries.

I suggest trying GNU gcc; typically there is no advantage to using the Intel compilers over GNU.

Hi! Thank you for the continued advice!

We had some updates on the HPC, so only now have I got back to the install.
I only have gcc working, which should be OK.

I have logged into a node to install; I have 6 GPUs and 6 CPUs on it.

I am running make check and the following tests fail:

97% tests passed, 2 tests failed out of 71

Label Time Summary:
GTest              = 1190.84 sec*proc (65 tests)
IntegrationTest    = 511.62 sec*proc (18 tests)
MpiTest            = 640.24 sec*proc (8 tests)
SlowTest           = 575.70 sec*proc (8 tests)
UnitTest           = 103.51 sec*proc (39 tests)

Total Test time (real) = 3050.20 sec

The following tests FAILED:
	 55 - MdrunIOTests (Timeout)
	 71 - regressiontests/essentialdynamics (Failed)
Errors while running CTest
make[3]: *** [CMakeFiles/run-ctest-nophys] Error 8
make[2]: *** [CMakeFiles/run-ctest-nophys.dir/all] Error 2
make[1]: *** [CMakeFiles/check.dir/rule] Error 2
make: *** [check] Error 2

I am not sure what is going on with the timeout one.

And the regression test failure is:

      Start 71: regressiontests/essentialdynamics
71/71 Test #71: regressiontests/essentialdynamics .....***Failed  185.82 sec
Mdrun cannot use the requested (or automatic) number of ranks, retrying with 8.

Abnormal return value for ’ gmx mdrun -ei /home/verastov/group/SOFTWARE/GROMACS-2021.1/install_gmx/gromacs-2021.1/build/tests/regressiontests-2021.1/essentialdynamics/flooding1/sam.edi -eo flooding1.xvg >mdrun.out 2>&1’ was 1
Retrying mdrun with better settings…

Abnormal return value for ’ gmx mdrun -ntmpi 8 -ei /home/verastov/group/SOFTWARE/GROMACS-2021.1/install_gmx/gromacs-2021.1/build/tests/regressiontests-2021.1/essentialdynamics/flooding1/sam.edi -eo flooding1.xvg >mdrun.out 2>&1’ was -1
Mdrun cannot use the requested (or automatic) number of ranks, retrying with 8.

Abnormal return value for ’ gmx mdrun -ei /home/verastov/group/SOFTWARE/GROMACS-2021.1/install_gmx/gromacs-2021.1/build/tests/regressiontests-2021.1/essentialdynamics/flooding2/sam.edi -eo flooding2.xvg >mdrun.out 2>&1’ was 1
Retrying mdrun with better settings…

Abnormal return value for ’ gmx mdrun -ntmpi 8 -ei /home/verastov/group/SOFTWARE/GROMACS-2021.1/install_gmx/gromacs-2021.1/build/tests/regressiontests-2021.1/essentialdynamics/flooding2/sam.edi -eo flooding2.xvg >mdrun.out 2>&1’ was -1
Essential dynamics tests FAILED with 2 errors!

How can I get more insights to what is going wrong?

Thank you
Valentina

This answers my question just posted above about the timeout. There shouldn't be any problem with the resources, but I am suspicious about the cluster setup…

I think the regression failure is also related to whether the resources are visible or not.

I am going back to the cluster management team :(

Thank you!

Hi Valentina, I am facing a similar problem with 2023.3, with Test #74: regressiontests/complex only; 99% of tests passed in my case. Were you able to fix the error? Please let me know. Thanks

Hi @pbhatta3,
I also encountered a failed test with GROMACS 2023.3. It was regressiontests/complex, but in my case this test is #84. My machine has 8 GPUs, and the automatic assignment of tasks to these 8 GPUs failed. Following @pszilard's suggestion, CUDA_VISIBLE_DEVICES=0 make check allowed all tests to pass.

But are the tests also failing due to timeouts or is some output not within the tolerance?

Hi @rcrehuet ,

Thank you for your suggestions. Our computer also has 8 GPUs. I will try your and @pszilard's suggestions and see if they work in my case too.

One more thing: I was not able to build GROMACS with MPI support, as many tests failed (71% tests passed, 17 tests failed out of 58). In contrast, when building GROMACS without MPI support, only one test (regressiontests/complex) failed. Do you or @pszilard know how to fix this issue and successfully build GROMACS with MPI support? I have attached the log files for all three steps (cmake, make, make check) and the CPU information (lscpu) of my computer below, if needed. Any suggestions will be helpful. Thanks
cmake_output2.log (16.0 KB)
make_check_output.log (1.2 MB)
make_output.log (6.1 MB)

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7352 24-Core Processor
Stepping: 0
CPU MHz: 3148.522
CPU max MHz: 2300.0000
CPU min MHz: 1500.0000
BogoMIPS: 4599.92
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 16384K
NUMA node0 CPU(s): 0-23,48-71
NUMA node1 CPU(s): 24-47,72-95

It fails with the following error:

-------------------------------------------------------
Program:     gmx mdrun, version 2023.3
Source file: src/gromacs/taskassignment/taskassignment.cpp (line 330)
Function:    static gmx::GpuTaskAssignments gmx::GpuTaskAssignmentsBuilder::build(gmx::ArrayRef<const int>, gmx::ArrayRef<const int>, const gmx_hw_info_t&, MPI_Comm, const gmx::PhysicalNodeCommunicator&, gmx::TaskTarget, gmx::TaskTarget, gmx::TaskTarget, gmx::TaskTarget, bool, bool, bool, bool)
MPI rank:    0 (out of 12)

Inconsistency in user input:
There were 12 GPU tasks found on node nodo01, but 8 GPUs were available. If
the GPUs are equivalent, then it is usually best to have a number of tasks
that is a multiple of the number of GPUs. You should reconsider your GPU task
assignment, number of ranks, or your use of the -nb, -pme, and -npme options,
perhaps after measuring the performance you can get.

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

So it is not a problem with tolerance, but with the default task assignment to GPUs.
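
A minimal sketch of the two workarounds that follow from that message (assuming the 8-GPU node described above; the .tpr name is a placeholder for one of the failing regression test cases):

# Option 1: when re-running a failing case by hand, make the rank count a
# multiple of the number of GPUs (8 here):
gmx mdrun -ntmpi 8 -notunepme -s topol.tpr

# Option 2: expose a single GPU to the whole test run, as suggested earlier:
CUDA_VISIBLE_DEVICES=0 make check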

@mabraham, is this related to the known GPU/task assignment issue?

It looks like a different issue.