Segfault with ntmpi >2 with 2 GPUs

GROMACS version: 2025.3
GROMACS modification: No

Hello world,

I’m running into issues with GROMACS 2025.3 on a simulation (a simple lysozyme-in-water system) that worked with 2024.5.

If I run

gmx mdrun -deffnm md_0_1 -nb gpu -pme gpu -npme 1 -ntmpi 4 -pin on -nsteps 1000

I get a segfault. If I run

gmx mdrun -deffnm md_0_1 -nb gpu -pme gpu -npme 1 -ntmpi 2 -pin on -nsteps 1000

the simulation runs to completion (eventually). With GROMACS 2024.5 I don’t have this issue. I’ve attached the log files for these two commands.
md_0_1_failed.log (23.5 KB)
md_0_1_success.log (31.3 KB)
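
To narrow down the rank count at which the failure starts, the two commands above can be wrapped in a small loop. This is only a sketch: the helper names `mdrun_cmd` and `try_ranks`, the `DRY_RUN` variable, and the `ntmpi_N.out` file names are made up for illustration, while the mdrun flags are exactly those from the post. It prints the commands by default, and some rank counts (such as 3) may be rejected by mdrun for reasons unrelated to the crash.

```shell
#!/bin/sh
# Print the mdrun command line from the post for a given thread-MPI rank count.
mdrun_cmd() {
    printf 'gmx mdrun -deffnm md_0_1 -nb gpu -pme gpu -npme 1 -ntmpi %s -pin on -nsteps 1000\n' "$1"
}

# Run one rank count and report its exit status.
# With DRY_RUN=1 (the default in this sketch) the command is only printed.
# In common shells a process killed by SIGSEGV usually reports status 139 (128 + 11).
try_ranks() {
    cmd=$(mdrun_cmd "$1")
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $cmd"
        return 0
    fi
    # Hypothetical output file name; stdout and stderr are kept together.
    sh -c "$cmd" > "ntmpi_$1.out" 2>&1
    status=$?
    echo "ntmpi=$1 exit=$status"
    return 0
}

# Rank counts to probe: 2 and 4 are from the post, 3 is an extra data point.
for n in 2 3 4; do
    try_ranks "$n"
done
```

Setting `DRY_RUN=0` in the environment makes the loop actually launch each run, so it should only be done in a directory where the existing `md_0_1` files may be touched.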

My cmake invocation was:

cmake .. -DGMX_GPU=CUDA -DCMAKE_INSTALL_PREFIX=/usr/local/gromacs-2025.3 -DGMX_FFT_LIBRARY=mkl -DMKLROOT=/usr/ -DGMX_BUILD_UNITTESTS=ON -DGMX_HWLOC=ON -DGMX_EXTERNAL_ZLIB=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13/bin/nvcc

Here’s the cmake output:

-- The C compiler identification is GNU 14.2.0
-- The CXX compiler identification is GNU 14.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Python3: /usr/bin/python3 (found suitable version "3.13.5", minimum required is "3.9") found components: Interpreter Development Development.Module Development.Embed
-- Selected GPU FFT library - cuFFT
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Performing Test CFLAGS_WARN_NO_MISSING_FIELD_INITIALIZERS
-- Performing Test CFLAGS_WARN_NO_MISSING_FIELD_INITIALIZERS - Success
-- Performing Test CFLAGS_EXCESS_PREC
-- Performing Test CFLAGS_EXCESS_PREC - Success
-- Performing Test CFLAGS_COPT
-- Performing Test CFLAGS_COPT - Success
-- Performing Test CFLAGS_NOINLINE
-- Performing Test CFLAGS_NOINLINE - Success
-- Performing Test CXXFLAGS_WARN_NO_MISSING_FIELD_INITIALIZERS
-- Performing Test CXXFLAGS_WARN_NO_MISSING_FIELD_INITIALIZERS - Success
-- Performing Test CXXFLAGS_EXCESS_PREC
-- Performing Test CXXFLAGS_EXCESS_PREC - Success
-- Performing Test CXXFLAGS_COPT
-- Performing Test CXXFLAGS_COPT - Success
-- Performing Test CXXFLAGS_NOINLINE
-- Performing Test CXXFLAGS_NOINLINE - Success
-- Looking for include file unistd.h
-- Looking for include file unistd.h - found
-- Looking for include file pwd.h
-- Looking for include file pwd.h - found
-- Looking for include file dirent.h
-- Looking for include file dirent.h - found
-- Looking for include file time.h
-- Looking for include file time.h - found
-- Looking for include file sys/time.h
-- Looking for include file sys/time.h - found
-- Looking for include file io.h
-- Looking for include file io.h - not found
-- Looking for include file sched.h
-- Looking for include file sched.h - found
-- Looking for include file xmmintrin.h
-- Looking for include file xmmintrin.h - found
-- Looking for gettimeofday
-- Looking for gettimeofday - found
-- Looking for sysconf
-- Looking for sysconf - found
-- Looking for nice
-- Looking for nice - found
-- Looking for fsync
-- Looking for fsync - found
-- Looking for _fileno
-- Looking for _fileno - not found
-- Looking for fileno
-- Looking for fileno - found
-- Looking for _commit
-- Looking for _commit - not found
-- Looking for sigaction
-- Looking for sigaction - found
-- Performing Test HAVE_BUILTIN_CLZ
-- Performing Test HAVE_BUILTIN_CLZ - Success
-- Performing Test HAVE_BUILTIN_CLZLL
-- Performing Test HAVE_BUILTIN_CLZLL - Success
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - found
-- Looking for feenableexcept in m
-- Looking for feenableexcept in m - found
-- Looking for fedisableexcept in m
-- Looking for fedisableexcept in m - found
-- Checking for sched.h GNU affinity API
-- Performing Test sched_affinity_compile
-- Performing Test sched_affinity_compile - Success
-- Looking for include file mm_malloc.h
-- Looking for include file mm_malloc.h - found
-- Looking for include file malloc.h
-- Looking for include file malloc.h - found
-- Checking for _mm_malloc()
-- Checking for _mm_malloc() - supported
-- Looking for posix_memalign
-- Looking for posix_memalign - found
-- Looking for memalign
-- Looking for memalign - not found
-- Torch not found. Neural network potential support will be disabled.
-- Using default binary suffix: ""
-- Using default library suffix: ""
-- Looking for HWLOC
-- Looking for hwloc_topology_init
-- Looking for hwloc_topology_init - found
-- hwloc version: 
-- Found HWLOC: /usr/lib/x86_64-linux-gnu/libhwloc.so (found suitable version "2.12.0", minimum required is "1.5")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test TEST_ATOMICS
-- Performing Test TEST_ATOMICS - Success
-- Atomic operations found
-- Performing Test PTHREAD_SETAFFINITY
-- Performing Test PTHREAD_SETAFFINITY - Success
-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.3.1")
-- Looking for zlibVersion in /usr/lib/x86_64-linux-gnu/libz.so
-- Looking for zlibVersion in /usr/lib/x86_64-linux-gnu/libz.so - found
-- Detecting best SIMD instructions for this CPU
-- Checking for GCC x86 inline asm
-- Checking for GCC x86 inline asm - supported
-- Detected build CPU features - aes apic avx avx2 avx512f avx512cd avx512bw avx512vl avx512bf16 avx512secondFMA clfsh cmov cx8 cx16 f16c fma htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sha sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
-- Detected build CPU brand - INTEL(R) XEON(R) PLATINUM 8568Y+
-- Detected best SIMD instructions for this CPU - AVX_512
-- Performing Test C_march_skylake_avx512_FLAG_ACCEPTED
-- Performing Test C_march_skylake_avx512_FLAG_ACCEPTED - Success
-- Performing Test C_march_skylake_avx512_COMPILE_WORKS
-- Performing Test C_march_skylake_avx512_COMPILE_WORKS - Success
-- Performing Test CXX_march_skylake_avx512_FLAG_ACCEPTED
-- Performing Test CXX_march_skylake_avx512_FLAG_ACCEPTED - Success
-- Performing Test CXX_march_skylake_avx512_COMPILE_WORKS
-- Performing Test CXX_march_skylake_avx512_COMPILE_WORKS - Success
-- Enabling 512-bit AVX-512 SIMD instructions using CXX flags:  -march=skylake-avx512
-- Performing Test _callconv___vectorcall
-- Performing Test _callconv___vectorcall - Failed
-- Performing Test _callconv___regcall
-- Performing Test _callconv___regcall - Failed
-- Performing Test _callconv_ 
-- Performing Test _callconv_  - Success
-- Found CUDAToolkit: /usr/local/cuda-13/targets/x86_64-linux/include (found suitable version "13.0.88", minimum required is "12.1")
-- Adding work-around for issue compiling CUDA code with glibc 2.23 string.h
-- Check for working NVCC/C++ compiler combination with nvcc ''
-- Check for working NVCC/C++ compiler combination - works
-- Checking if nvcc accepts flags --generate-code=arch=compute_50,code=sm_50
-- Checking if nvcc accepts flags --generate-code=arch=compute_50,code=sm_50 - No
-- Checking if nvcc accepts flags --generate-code=arch=compute_52,code=sm_52
-- Checking if nvcc accepts flags --generate-code=arch=compute_52,code=sm_52 - No
-- Checking if nvcc accepts flags --generate-code=arch=compute_60,code=sm_60
-- Checking if nvcc accepts flags --generate-code=arch=compute_60,code=sm_60 - No
-- Checking if nvcc accepts flags --generate-code=arch=compute_61,code=sm_61
-- Checking if nvcc accepts flags --generate-code=arch=compute_61,code=sm_61 - No
-- Checking if nvcc accepts flags --generate-code=arch=compute_70,code=sm_70
-- Checking if nvcc accepts flags --generate-code=arch=compute_70,code=sm_70 - No
-- Checking if nvcc accepts flags --generate-code=arch=compute_75,code=sm_75
-- Checking if nvcc accepts flags --generate-code=arch=compute_75,code=sm_75 - Success
-- Checking if nvcc accepts flags --generate-code=arch=compute_80,code=sm_80
-- Checking if nvcc accepts flags --generate-code=arch=compute_80,code=sm_80 - Success
-- Checking if nvcc accepts flags --generate-code=arch=compute_86,code=sm_86
-- Checking if nvcc accepts flags --generate-code=arch=compute_86,code=sm_86 - Success
-- Checking if nvcc accepts flags --generate-code=arch=compute_89,code=sm_89
-- Checking if nvcc accepts flags --generate-code=arch=compute_89,code=sm_89 - Success
-- Checking if nvcc accepts flags --generate-code=arch=compute_90,code=sm_90
-- Checking if nvcc accepts flags --generate-code=arch=compute_90,code=sm_90 - Success
-- Checking if nvcc accepts flags --generate-code=arch=compute_100,code=sm_100
-- Checking if nvcc accepts flags --generate-code=arch=compute_100,code=sm_100 - Success
-- Checking if nvcc accepts flags --generate-code=arch=compute_120,code=sm_120
-- Checking if nvcc accepts flags --generate-code=arch=compute_120,code=sm_120 - Success
-- Checking if nvcc accepts flags -Wno-deprecated-gpu-targets
-- Checking if nvcc accepts flags -Wno-deprecated-gpu-targets - Success
-- Checking if nvcc accepts flags --generate-code=arch=compute_53,code=compute_53
-- Checking if nvcc accepts flags --generate-code=arch=compute_53,code=compute_53 - No
-- Checking if nvcc accepts flags --generate-code=arch=compute_90,code=compute_90
-- Checking if nvcc accepts flags --generate-code=arch=compute_90,code=compute_90 - Success
-- Checking if nvcc accepts flags -use_fast_math
-- Checking if nvcc accepts flags -use_fast_math - Success
-- Checking if nvcc accepts flags -static-global-template-stub=false
-- Checking if nvcc accepts flags -static-global-template-stub=false - Success
-- Checking if nvcc accepts flags -Xptxas=-warn-double-usage
-- Checking if nvcc accepts flags -Xptxas=-warn-double-usage - Success
-- Checking if nvcc accepts flags -Xptxas=-Werror
-- Checking if nvcc accepts flags -Xptxas=-Werror - Success
-- Checking if nvcc accepts flags -diag-suppress=177
-- Checking if nvcc accepts flags -diag-suppress=177 - Success
-- The CUDA compiler identification is NVIDIA 13.0.88 with host compiler GNU 14.2.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-13/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Detected build CPU vendor - Intel
-- Detected build CPU family - 6
-- Detected build CPU model - 207
-- Detected build CPU stepping - 2
-- Checking for 64-bit off_t
-- Checking for 64-bit off_t - present
-- Checking for fseeko/ftello
-- Checking for fseeko/ftello - present
-- Checking for SIGUSR1
-- Checking for SIGUSR1 - found
-- Checking for pipe support
-- Checking for system XDR support
-- Checking for system XDR support - not present
-- Found MKL at /usr
-- Looking for DftiCreateDescriptor
-- Looking for DftiCreateDescriptor - found
-- Using external FFT library - Intel MKL
-- Looking for dgemm_
-- Looking for dgemm_ - found
-- Looking for cheev_
-- Looking for cheev_ - found
-- No image conversion possible without ImageMagick
-- Performing Test HAS_WARNING_EVERYTHING
-- Performing Test HAS_WARNING_EVERYTHING - Failed
-- Found Python: /usr/bin/python3 (found version "3.13.5") found components: Interpreter
-- Performing Test HAVE_NO_DEPRECATED_COPY
-- Performing Test HAVE_NO_DEPRECATED_COPY - Success
-- Looking for dlopen
-- Looking for dlopen - found
-- Performing Test HAS_NO_STRINGOP_TRUNCATION
-- Performing Test HAS_NO_STRINGOP_TRUNCATION - Success
-- Performing Test HAS_WARNING_NO_CAST_FUNCTION_TYPE_STRICT
-- Performing Test HAS_WARNING_NO_CAST_FUNCTION_TYPE_STRICT - Success
-- Performing Test HAS_NO_UNUSED
-- Performing Test HAS_NO_UNUSED - Success
-- Performing Test HAS_NO_UNUSED_PARAMETER
-- Performing Test HAS_NO_UNUSED_PARAMETER - Success
-- Performing Test HAS_NO_MISSING_DECLARATIONS
-- Performing Test HAS_NO_MISSING_DECLARATIONS - Success
-- Performing Test HAS_NO_NULL_CONVERSIONS
-- Performing Test HAS_NO_NULL_CONVERSIONS - Success
-- Looking for inttypes.h
-- Looking for inttypes.h - found
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY - Success
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY - Success
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Success
-- Found Sphinx: /usr/bin/sphinx-build (found suitable version "8.1.3", minimum required is "4.0.0") found components: pygments
-- Found LATEX: /usr/bin/latex
-- Configuring done (51.7s)
-- Generating done (0.6s)
-- Build files have been written to: /root/gromacs/gromacs-2025.3/build

Note that I had to modify three MKL include statements to use <mkl/mkl.h> rather than just <mkl.h>.
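
For anyone reproducing this build, the kind of include change described above can be expressed as a small text filter. This is a hedged sketch, not the exact edit that was made: the post does not say which files were changed, and the function name `fix_mkl_include` is invented here. It rewrites an include of an MKL header given without a directory into the `mkl/` form and leaves already-prefixed includes alone.

```shell
#!/bin/sh
# Rewrite '#include <mkl*.h>' to '#include <mkl/mkl*.h>' on stdin.
# Lines that already contain a directory part (e.g. <mkl/mkl.h>) are not matched.
fix_mkl_include() {
    sed 's|^\([[:space:]]*#[[:space:]]*include[[:space:]]*\)<\(mkl[^/>]*\.h\)>|\1<mkl/\2>|'
}
```

A quick check on a single line would be `printf '#include <mkl.h>\n' | fix_mkl_include`; applying it to real source files is left to the reader, since the affected files are not named in the thread.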

Please let me know what I’m missing, or if you need any more information :)

PS. make check complains about 4 failing tests:

96% tests passed, 4 tests failed out of 92

Label Time Summary:
GTest              = 1220.42 sec*proc (90 tests)
IntegrationTest    = 535.70 sec*proc (29 tests)
MpiTest            = 1033.21 sec*proc (21 tests)
QuickGpuTest       = 262.53 sec*proc (23 tests)
SlowGpuTest        = 938.55 sec*proc (14 tests)
SlowTest           = 665.49 sec*proc (14 tests)
UnitTest           =  19.23 sec*proc (47 tests)

Total Test time (real) = 577.91 sec

The following tests FAILED:
         70 - MdrunTestsTwoRanks (Failed)                       GTest IntegrationTest MpiTest SlowGpuTest
         81 - MdrunMpi2RankPmeTests (Failed)                    GTest IntegrationTest MpiTest SlowGpuTest
         87 - MdrunCoordinationConstraintsTests2Ranks (Failed)  GTest MpiTest SlowGpuTest SlowTest
         92 - MdrunVirtualSiteTests (SEGFAULT)                  GTest IntegrationTest MpiTest QuickGpuTest
Errors while running CTest
make[3]: *** [CMakeFiles/run-ctest-nophys.dir/build.make:71: CMakeFiles/run-ctest-nophys] Error 8
make[2]: *** [CMakeFiles/Makefile2:4296: CMakeFiles/run-ctest-nophys.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:4336: CMakeFiles/check.dir/rule] Error 2
make: *** [Makefile:628: check] Error 2

That looks like some bug in GROMACS. Could you provide the detailed output for the failed tests with make check?

I’d be happy to. What output do you need exactly, and where can I find it?

If you run make check, the failing tests should print a lot of output, which you should be able to find higher up in the log.
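
One way to collect that output is to pull the failing test names out of a saved log and rerun only those tests through ctest with their output shown. The sketch below assumes the log was saved to a file and that the summary lines look like the ones quoted earlier in this thread; the function names `failed_tests` and `rerun_cmds` are invented for illustration. It only prints the ctest commands, using the standard `-R` and `--output-on-failure` options.

```shell
#!/bin/sh
# Read a saved `make check` / ctest log on stdin and print the names of
# failed tests, taken from summary lines such as
#   "  70 - MdrunTestsTwoRanks (Failed)   GTest IntegrationTest ..."
failed_tests() {
    awk '$1 ~ /^[0-9]+$/ && $2 == "-" && $4 ~ /^\(/ { print $3 }'
}

# Turn test names on stdin into ctest commands that rerun exactly that
# test and print its full output if it fails again.
rerun_cmds() {
    while IFS= read -r name; do
        printf "ctest -R '^%s\$' --output-on-failure\n" "$name"
    done
}
```

From the build directory, something like `failed_tests < check_out.txt | rerun_cmds` would list the commands to run. ctest also has a `--rerun-failed` option that reruns whatever failed in the previous ctest invocation.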

Thanks. They’re quite noisy indeed:

check_out.txt (576.1 KB)

HTH

Those are very large deviations. Could you file an issue on GitLab and report both the segv and these test failures, with the output?

Done! Filed as issue #5486 on the GROMACS GitLab: “Gromacs 2025.3 segfaults with ntmpi>2 and 2 GPUs”.