Lysozyme tutorial floating point exception

GROMACS version: 2020.2
GROMACS modification: No
GROMACS: gmx mdrun, version 2020.2

Hi,
I have installed GROMACS and followed the lysozyme-in-water tutorial by J. Lemkul. Everything worked fine: all the plots looked as they should, and I did not get any warning messages. I then started the production run, and after a couple of seconds it stopped with the error message “Floating point exception (core dumped)”. I have copied the output below.

Is there any way to get a more descriptive error message? I have no idea what I should do to fix this…

I have also run gmx check on the trajectory, and its output is pasted below as well.

Thanks for your help!
Gregor

"

Command line:
gmx mdrun -v -deffnm md_0_1
Back Off! I just backed up md_0_1.log to ./#md_0_1.log.7#
Reading file md_0_1.tpr, VERSION 2020.2 (single precision)
Changing nstlist from 10 to 100, rlist from 1 to 1.167
On host hulk 4 GPUs selected for this run.
Mapping of GPU IDs to the 16 GPU tasks in the 16 ranks on this node:
PP:0,PP:0,PP:0,PP:0,PP:1,PP:1,PP:1,PP:1,PP:2,PP:2,PP:2,PP:2,PP:3,PP:3,PP:3,PP:3
PP tasks will do (non-perturbed) short-ranged and most bonded interactions on the GPU
PP task will update and constrain coordinates on the CPU
Using 16 MPI threads
Using 4 OpenMP threads per tMPI thread
Back Off! I just backed up md_0_1.xtc to ./#md_0_1.xtc.5#
Back Off! I just backed up md_0_1.edr to ./#md_0_1.edr.5#
NOTE: DLB will not turn on during the first phase of PME tuning
starting mdrun ‘LYSOZYME in water’
500000 steps, 1000.0 ps.

step 800: timed with pme grid 44 44 44, coulomb cutoff 1.000: 853.5 M-cycles

step 1000: timed with pme grid 40 40 40, coulomb cutoff 1.086: 861.2 M-cycles

step 1200: timed with pme grid 36 36 36, coulomb cutoff 1.207: 912.2 M-cycles

step 1400: timed with pme grid 32 32 32, coulomb cutoff 1.357: 885.5 M-cycles

step 1400: the domain decompostion limits the PME load balancing to a coulomb cut-off of 1.357

step 1600: timed with pme grid 32 32 32, coulomb cutoff 1.357: 849.6 M-cycles

step 1800: timed with pme grid 36 36 36, coulomb cutoff 1.207: 820.5 M-cycles

step 2000: timed with pme grid 40 40 40, coulomb cutoff 1.086: 835.5 M-cycles

step 2200: timed with pme grid 42 42 42, coulomb cutoff 1.034: 852.5 M-cycles

step 2400: timed with pme grid 44 44 44, coulomb cutoff 1.000: 824.4 M-cycles

step 2600: timed with pme grid 32 32 32, coulomb cutoff 1.357: 851.8 M-cycles

step 2800: timed with pme grid 36 36 36, coulomb cutoff 1.207: 845.5 M-cycles

step 3000: timed with pme grid 40 40 40, coulomb cutoff 1.086: 818.7 M-cycles

Floating point exception (core dumped)
"

"
Command line:
gmx check -f md_0_1.xtc

Checking file md_0_1.xtc
Reading frame 0 time 0.000

Atoms 33876

Precision 0.001 (nm)
Last frame 0 time 0.000

Item #frames Timestep (ps)
Step 1
Time 1
Lambda 0
Coords 1
Velocities 0
Forces 0
Box 1

"

This should not be happening, and it may indicate a bug in the code. However, I suggest trying a single GPU (e.g. mdrun -ntmpi 1 -ntomp 16 -gpu_id 0). The system you are simulating is too small to scale across several GPUs anyway, so there is no advantage to using all four.
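A minimal POSIX sh sketch of that single-GPU suggestion, assuming gmx is on the PATH. The function name single_gpu_mdrun and the DRY_RUN switch are hypothetical names introduced only for illustration. The mdrun options are the ones from the reply above, and 16 OpenMP threads is simply the value used there, so adjust it to the number of CPU cores you have.

```shell
# Hypothetical helper (not part of GROMACS): run the production MD on one GPU.
# Arguments: file prefix for -deffnm, GPU id (default 0),
#            OpenMP thread count (default 16, an assumption; match your cores).
# Set DRY_RUN=1 to print the command instead of executing it.
single_gpu_mdrun() {
  deffnm=$1
  gpu=${2:-0}
  nomp=${3:-16}
  # -ntmpi 1 : a single thread-MPI rank, so no domain decomposition
  # -ntomp N : give all CPU threads to that one rank
  # -gpu_id  : restrict mdrun to the chosen GPU
  set -- gmx mdrun -v -deffnm "$deffnm" -ntmpi 1 -ntomp "$nomp" -gpu_id "$gpu"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, single_gpu_mdrun md_0_1 runs the tutorial's production step on GPU 0. With one rank there is no domain decomposition, which also removes the DD limit on PME tuning seen in the log.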

Thanks for your reply! That appears to work. Today I rebuilt GROMACS and got the test failures pasted below during “make check”. Could this explain the problem?

Cheers,
Gregor

The following tests FAILED:
12 - GpuUtilsUnitTests (Timeout)
31 - GmxPreprocessTests (Failed)
43 - MdrunTests (Timeout)
53 - regressiontests/complex (Failed)
Errors while running CTest
make[3]: *** [CMakeFiles/run-ctest-nophys.dir/build.make:78: CMakeFiles/run-ctest-nophys] Error 8
make[2]: *** [CMakeFiles/Makefile2:3399: CMakeFiles/run-ctest-nophys.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:3213: CMakeFiles/check.dir/rule] Error 2
make: *** [Makefile:607: check] Error 2

That may be related, but the other timeouts most likely are not. They are more likely due to a peculiarity of the system configuration. Try restricting the set of GPUs available to mdrun (e.g. to a single GPU) using the CUDA_VISIBLE_DEVICES environment variable. Do the timeouts and errors still occur?
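A minimal sketch of that restriction. CUDA_VISIBLE_DEVICES is read by the CUDA runtime: "0" exposes only the first GPU, and "1,3" exposes GPUs 1 and 3, which the program then sees renumbered as 0 and 1. The wrapper name run_on_gpus is hypothetical and introduced only for illustration.

```shell
# Hypothetical helper: run any command with only the listed GPUs visible to CUDA.
# First argument: comma-separated GPU ids; the rest is the command to run.
run_on_gpus() {
  gpus=$1
  shift
  # The prefix assignment exports the variable to this one command only,
  # leaving the calling shell's environment unchanged.
  CUDA_VISIBLE_DEVICES=$gpus "$@"
}
```

From the build directory, run_on_gpus 0 make check reruns the test suite with a single GPU visible. This is equivalent to typing CUDA_VISIBLE_DEVICES=0 make check directly.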

Hi,

I tried what you suggested, and now only two tests fail (see below).

Cheers,
Gregor

96% tests passed, 2 tests failed out of 56

Label Time Summary:
GTest = 769.85 sec*proc (52 tests)
IntegrationTest = 297.64 sec*proc (9 tests)
MpiTest = 476.15 sec*proc (8 tests)
SlowTest = 419.60 sec*proc (2 tests)
UnitTest = 52.61 sec*proc (41 tests)

Total Test time (real) = 1306.18 sec

The following tests FAILED:
31 - GmxPreprocessTests (Failed)
43 - MdrunTests (Timeout)
Errors while running CTest
make[3]: *** [CMakeFiles/run-ctest-nophys.dir/build.make:78: CMakeFiles/run-ctest-nophys] Error 8
make[2]: *** [CMakeFiles/Makefile2:3399: CMakeFiles/run-ctest-nophys.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:3213: CMakeFiles/check.dir/rule] Error 2
make: *** [Makefile:607: check] Error 2
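The summary above only names the failing tests. To see their actual output, ctest can rerun just those tests. The sketch below extracts the names listed under "The following tests FAILED:" into a regex for ctest -R. The function name failed_test_regex is hypothetical, and it assumes the summary format shown in this thread ("31 - GmxPreprocessTests (Failed)").

```shell
# Hypothetical helper (not part of GROMACS or CMake): read a ctest summary on
# stdin and print an anchored regex matching every test listed under
# "The following tests FAILED:", for use with ctest -R.
failed_test_regex() {
  awk '
    # Start collecting after the FAILED header line.
    /The following tests FAILED:/ { in_list = 1; next }
    # Entries look like "31 - GmxPreprocessTests (Failed)"; field 3 is the name.
    in_list && /^[ \t]*[0-9]+ - / {
      names = names (names == "" ? "" : "|") $3
      next
    }
    # Any other line ends the list.
    { in_list = 0 }
    END { if (names != "") print "^(" names ")$" }
  '
}
```

For the summary above, this prints ^(GmxPreprocessTests|MdrunTests)$. From the build directory, ctest -R "$(failed_test_regex < ctest_summary.txt)" --output-on-failure then reruns only those tests and prints their output, where ctest_summary.txt is a hypothetical file holding the pasted summary. ctest's built-in ctest --rerun-failed --output-on-failure achieves the same thing using ctest's own record of the last run.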