Fatal error potential energy is -nan on Power9 system

GROMACS version: 2020.2
GROMACS modification: No


I’m trying to run some simulations on POWER9 machines, but they crash immediately during energy minimization with the following output.

Fatal error:
Step 0: The total potential energy is -nan, which is not finite. The LJ and
electrostatic contributions to the energy are 11489.9 and -25285.3,
respectively. A non-finite potential energy can be caused by overlapping
interactions in bonded interactions or very large or Nan coordinate values.
Usually this is caused by a badly- or non-equilibrated initial configuration,
incorrect interactions or parameters in the topology.

I must emphasise that there is nothing wrong with the topology, as the exact same input files run fine on other machines. I also tried two other systems: lysozyme in water from Justin’s introductory tutorial, using the exact same set of commands as in the tutorial, and benchMEM from the GROMACS benchmark set. Both of these also crash with the same error as above.

I’d appreciate any help or suggestion on how to debug this issue.

GROMACS version information:

GROMACS version:    2020.2
Verified release checksum is 3f718d436b1ac2d44ce97164df8a13322fc143498ba44eccfd567e20d8aaea1d
Precision:          single
Memory model:       64 bit
MPI library:        MPI
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:        CUDA
SIMD instructions:  IBM_VSX
FFT library:        fftw-3.3.8-altivec-vsx
RDTSCP usage:       disabled
TNG support:        enabled
Hwloc support:      hwloc-1.11.0
Tracing support:    disabled
C compiler:         /cineca/prod/opt/compilers/gnu/8.4.0/none/bin/gcc GNU 8.4.0
C compiler flags:   -mcpu=power9 -mtune=power9 -mvsx -pthread -fexcess-precision=fast -funroll-all-loops -O3 -DNDEBUG
C++ compiler:       /cineca/prod/opt/compilers/gnu/8.4.0/none/bin/g++ GNU 8.4.0
C++ compiler flags: -mcpu=power9 -mtune=power9 -mvsx -pthread -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA compiler:      /cineca/prod/opt/compilers/cuda/10.1/none/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2019 NVIDIA Corporation;Built on Fri_Feb__8_19:09:58_PST_2019;Cuda compilation tools, release 10.1, V10.1.105
CUDA compiler flags:-std=c++14;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_35,code=compute_35;-gencode;arch=compute_50,code=compute_50;-gencode;arch=compute_52,code=compute_52;-gencode;arch=compute_60,code=compute_60;-gencode;arch=compute_61,code=compute_61;-gencode;arch=compute_70,code=compute_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;;-mcpu=power9 -mtune=power9 -mvsx -pthread -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA driver:        10.10
CUDA runtime:       N/A

It is hard to predict what is going on. A quick check would be to recompile GROMACS in debug mode (-DCMAKE_BUILD_TYPE=Debug) to see whether the optimization flags are breaking something on the IBM machine.
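A minimal sketch of such a debug rebuild follows. It assumes an out-of-source CMake build, and the source path, build directory and `-j` value are hypothetical placeholders. The other options simply mirror the MPI + CUDA + bundled-FFTW build reported above, with `GMX_GPU=ON` being the option spelling used by GROMACS 2020. By default the script only prints the commands; set `DRY_RUN=0` to actually run them.

```shell
# Sketch only: all paths below are hypothetical placeholders.
GMX_SRC="${GMX_SRC:-$HOME/gromacs-2020.2}"      # unpacked GROMACS source tree
BUILD_DIR="${BUILD_DIR:-$GMX_SRC/build-debug}"  # separate build directory
DRY_RUN="${DRY_RUN:-1}"                         # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run mkdir -p "$BUILD_DIR"
run cd "$BUILD_DIR"
# Debug build type instead of the optimized release build; the remaining
# options are assumed to match the original installation.
run cmake "$GMX_SRC" \
    -DCMAKE_BUILD_TYPE=Debug \
    -DGMX_MPI=ON \
    -DGMX_GPU=ON \
    -DGMX_BUILD_OWN_FFTW=ON
run make -j 8
```

If the -nan persists with this build, compiler optimization of GROMACS itself is an unlikely culprit, and the external libraries are the next thing to look at.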

I tried a debug build of GROMACS, but that didn’t change anything. However, I’ve managed to stumble across the source of the problem.

I discovered that if I offload the PME part of the calculation to the GPU, the systems run normally without any crashes. This led me to suspect that something was wrong with the FFTW library, since the CPU PME path is the one that uses FFTW. The FFTW library used was built by the GROMACS installation (-DGMX_BUILD_OWN_FFTW=ON). So I built FFTW from source myself and experimented with configuration and build options.

The problem seems to stem from building FFTW with --enable-altivec and --enable-vsx, which is what the GROMACS build does. Compiling GROMACS against an FFTW library built without these options fixes the crashes I was getting before.
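For anyone who wants to reproduce the workaround, here is a rough sketch of the two steps, not the exact commands I ran. All paths and the `-j` value are hypothetical placeholders. `--enable-float` is there because a single-precision GROMACS needs the single-precision FFTW library, and `-DGMX_GPU=ON` / `-DGMX_MPI=ON` are assumed to match the original build. By default the script only prints the commands; set `DRY_RUN=0` to run them.

```shell
# Sketch only: all paths below are hypothetical placeholders.
FFTW_SRC="${FFTW_SRC:-$HOME/fftw-3.3.8}"                   # unpacked FFTW source
FFTW_PREFIX="${FFTW_PREFIX:-$HOME/opt/fftw-3.3.8-nosimd}"  # install location
GMX_SRC="${GMX_SRC:-$HOME/gromacs-2020.2}"                 # GROMACS source tree
GMX_BUILD="${GMX_BUILD:-$GMX_SRC/build-nosimd-fftw}"       # GROMACS build dir
DRY_RUN="${DRY_RUN:-1}"                                    # 1 = only print

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: single-precision FFTW, deliberately WITHOUT
# --enable-altivec and --enable-vsx.
run cd "$FFTW_SRC"
run ./configure --prefix="$FFTW_PREFIX" --enable-float --enable-shared
run make -j 8
run make check
run make install

# Step 2: configure GROMACS against that FFTW instead of letting
# the GROMACS build download and compile its own copy.
run mkdir -p "$GMX_BUILD"
run cd "$GMX_BUILD"
run cmake "$GMX_SRC" \
    -DGMX_BUILD_OWN_FFTW=OFF \
    -DGMX_FFT_LIBRARY=fftw3 \
    -DCMAKE_PREFIX_PATH="$FFTW_PREFIX" \
    -DGMX_MPI=ON \
    -DGMX_GPU=ON
run make -j 8
```

The only intended difference from the failing installation is the missing SIMD options in the FFTW configure step, so a successful run points at the SIMD-enabled FFTW code path rather than at the input files.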

However, no matter which options I build FFTW3 with, its own tests always pass. So I don’t really know whether this is a bug in GROMACS, the FFTW library, or the compiler (I’m using gcc version 8.4.0). Perhaps someone with more experience can shed light on this matter.