Build issue on Cray system (GROMACS 2025.0): multiple static assertion failures

GROMACS version: 2025.0
GROMACS modification: No

Hello folks,

I am trying to get the newest GROMACS 2025.0 working on a Cray system and have run into what appear to be build issues that don’t lend themselves to an obvious fix (at least to me). I am hoping another pair of eyes might be able to help me pin this down.

A few points:

  • This is on a Cray system with AMD GPUs (gfx90a).
  • The build uses the ROCm-provided clang (to match the AdaptiveCpp build).
  • AdaptiveCpp was built using ROCm 6.2.4 (acpp-info output is provided below).
  • We are using the Cray-provided MPICH.
  • CMake doesn’t throw any warnings.

The gromacs cmake input is the following:

cmake -DGMX_MPI=ON -DGMX_HWLOC:BOOL=ON -DGMX_GPU=SYCL \
-DGMX_SYCL=ACPP -DGMX_OPENMP:BOOL=ON \
-DGMX_C_COMPILER=${ROCM_PATH}/llvm/bin/clang \
-DGMX_CXX_COMPILER=${ROCM_PATH}/llvm/bin/clang++ \
-DGMX_GPLUSPLUS=/opt/cray/pe/gcc-native/13/bin/g++ \
-DCMAKE_PREFIX_PATH=${LOCAL}/AdaptiveCpp-24.10.0/ \
-DACPP_TARGETS=hip:gfx90a \
-DGMX_USE_HEFFTE:BOOL=ON -DGMX_GPU_FFT_LIBRARY=rocFFT ..

When using make I get 20 errors, but all of them look like the following:

/usr/lib64/gcc/x86_64-suse-linux/13/../../../../include/c++/13/type_traits:2388:21: error: static assertion failed due to requirement '__declval_protector<mu::ParserCallback *>::__stop': declval() must not be used!
 2388 |       static_assert(__declval_protector<_Tp>::__stop,

The AdaptiveCpp (acpp-info) output looks like the following:

=================Backend information===================
Loaded backend 0: OpenMP
  Found device: AdaptiveCpp OpenMP host device
Loaded backend 1: HIP
  Found device: AMD Instinct MI210

=================Device information===================
***************** Devices for backend OpenMP *****************
Device 0:
 General device information:
  Name: AdaptiveCpp OpenMP host device
  Backend: OpenMP
  Platform: Backend 4 / Platform 0
  Vendor: the AdaptiveCpp project
  Arch: <native-cpu>
  Driver version: 1.2
  Is CPU: 1
  Is GPU: 0
 Default executor information:
  Is in-order queue: 0
  Is out-of-order queue: 1
  Is task graph: 0
 Device support queries:
  images: 0
  error_correction: 0
  host_unified_memory: 1
  little_endian: 1
  global_mem_cache: 1
  global_mem_cache_read_only: 0
  global_mem_cache_read_write: 1
  emulated_local_memory: 1
  sub_group_independent_forward_progress: 0
  usm_device_allocations: 1
  usm_host_allocations: 1
  usm_atomic_host_allocations: 1
  usm_shared_allocations: 1
  usm_atomic_shared_allocations: 1
  usm_system_allocations: 1
  execution_timestamps: 1
  sscp_kernels: 0
 Device properties:
  max_compute_units: 128
  max_global_size0: 18446744073709551615
  max_global_size1: 18446744073709551615
  max_global_size2: 18446744073709551615
  max_group_size: 1024
  max_num_sub_groups: 18446744073709551615
  preferred_vector_width_char: 4
  preferred_vector_width_double: 1
  preferred_vector_width_float: 1
  preferred_vector_width_half: 2
  preferred_vector_width_int: 1
  preferred_vector_width_long: 1
  preferred_vector_width_short: 2
  native_vector_width_char: 4
  native_vector_width_double: 1
  native_vector_width_float: 1
  native_vector_width_half: 2
  native_vector_width_int: 1
  native_vector_width_long: 1
  native_vector_width_short: 2
  max_clock_speed: 0
  max_malloc_size: 18446744073709551615
  address_bits: 64
  max_read_image_args: 0
  max_write_image_args: 0
  image2d_max_width: 0
  image2d_max_height: 0
  image3d_max_width: 0
  image3d_max_height: 0
  image3d_max_depth: 0
  image_max_buffer_size: 0
  image_max_array_size: 0
  max_samplers: 0
  max_parameter_size: 18446744073709551615
  mem_base_addr_align: 8
  global_mem_cache_line_size: 64
  global_mem_cache_size: 1
  global_mem_size: 18446744073709551615
  max_constant_buffer_size: 18446744073709551615
  max_constant_args: 18446744073709551615
  local_mem_size: 18446744073709551615
  printf_buffer_size: 18446744073709551615
  partition_max_sub_devices: 0
  vendor_id: 18446744073709551615
  sub_group_sizes: 1


***************** Devices for backend HIP *****************
Device 0:
 General device information:
  Name: AMD Instinct MI210
  Backend: HIP
  Platform: Backend 1 / Platform 0
  Vendor: AMD
  Arch: gfx90a:sramecc+:xnack-
  Driver version: 60241134
  Is CPU: 0
  Is GPU: 1
 Default executor information:
  Is in-order queue: 0
  Is out-of-order queue: 1
  Is task graph: 0
 Device support queries:
  images: 0
  error_correction: 0
  host_unified_memory: 0
  little_endian: 1
  global_mem_cache: 1
  global_mem_cache_read_only: 0
  global_mem_cache_read_write: 1
  emulated_local_memory: 0
  sub_group_independent_forward_progress: 1
  usm_device_allocations: 1
  usm_host_allocations: 1
  usm_atomic_host_allocations: 0
  usm_shared_allocations: 1
  usm_atomic_shared_allocations: 0
  usm_system_allocations: 0
  execution_timestamps: 1
  sscp_kernels: 0
 Device properties:
  max_compute_units: 104
  max_global_size0: 2199023254528
  max_global_size1: 67108864
  max_global_size2: 67108864
  max_group_size: 1024
  max_num_sub_groups: 16
  preferred_vector_width_char: 4
  preferred_vector_width_double: 1
  preferred_vector_width_float: 1
  preferred_vector_width_half: 2
  preferred_vector_width_int: 1
  preferred_vector_width_long: 1
  preferred_vector_width_short: 2
  native_vector_width_char: 4
  native_vector_width_double: 1
  native_vector_width_float: 1
  native_vector_width_half: 2
  native_vector_width_int: 1
  native_vector_width_long: 1
  native_vector_width_short: 2
  max_clock_speed: 1700
  max_malloc_size: 68702699520
  address_bits: 64
  max_read_image_args: 0
  max_write_image_args: 0
  image2d_max_width: 0
  image2d_max_height: 0
  image3d_max_width: 0
  image3d_max_height: 0
  image3d_max_depth: 0
  image_max_buffer_size: 0
  image_max_array_size: 0
  max_samplers: 0
  max_parameter_size: 18446744073709551615
  mem_base_addr_align: 8
  global_mem_cache_line_size: 128
  global_mem_cache_size: 8388608
  global_mem_size: 68702699520
  max_constant_buffer_size: 2147483647
  max_constant_args: 18446744073709551615
  local_mem_size: 65536
  printf_buffer_size: 18446744073709551615
  partition_max_sub_devices: 0
  vendor_id: 1022
  sub_group_sizes: 64

Any thoughts? Does this look like an issue with clang, or am I just blind?

Hi!

From what I can see, this looks like the same error as in GROMACS GitLab issue #5301, "GROMACS 2025.0 fails to build with AMD Clang (ROCm 6.3.2) with libstdc++ from GCC 11 on RHEL 9".

Should not be an issue with Clang per se (ROCm 6.2.4 works on a Cray system we have here), but something indeed goes wrong with the way the compiler is invoked. Could you also show the output of module list, and perhaps run the build as VERBOSE=1 make and share the command line printed right before it barfs out the stream of errors?

Nitpick: GMX_GPLUSPLUS is not a valid option; it should be GMX_GPLUSPLUS_PATH. But it picks up the correct headers anyway, so that should not be relevant to the problem.

@al42and

I think you are right; the error on GitLab looks very similar. Below is the output of module list and make VERBOSE=1:

  1) craype-x86-trento                7) Core/24.07         13) cray-dsmml/0.3.0           19) cray-fftw/3.3.10.9
  2) libfabric/1.22.0                 8) tmux/3.4           14) cray-mpich/8.1.31          20) craype-accel-amd-gfx90a
  3) craype-network-ofi               9) hsi/default        15) cray-libsci/24.11.0        21) heffte/2.4.1-mpi-fftw
  4) perftools-base/24.11.0          10) lfs-wrapper/0.0.1  16) PrgEnv-amd/8.6.0
  5) xpmem/2.10.6-1.2_gfaa90a94be64  11) DefApps            17) amd/6.2.4
  6) cray-pmi/6.1.15                 12) craype/2.7.33      18) darshan-runtime/3.4.6-mpi

And for make VERBOSE=1:

[  0%] Built target gmx_objlib
[  0%] Built target scanner
[  0%] Generating release version information
[  0%] Built target release-version-info
[  0%] Built target internal_rpc_xdr
[  0%] Built target thread_mpi
[  1%] Built target tng_io_obj
[  3%] Built target tng_io_zlib
[  3%] Built target lmfit_objlib
[  6%] Built target colvars_objlib
[  6%] Building CXX object _deps/muparser-build/CMakeFiles/muparser.dir/src/muParserBase.cpp.o
In file included from /lustre/orion/proj-shared/bie123/GromacsInstalls/gromacs-2025.0/src/external/muparser/src/muParserBase.cpp:29:
In file included from /lustre/orion/proj-shared/bie123/GromacsInstalls/gromacs-2025.0/src/external/muparser/include/muParserBase.h:33:
In file included from /opt/rocm-6.2.4/lib/llvm/lib/clang/18/include/openmp_wrappers/cmath:86:
In file included from /opt/rocm-6.2.4/lib/llvm/lib/clang/18/include/__clang_hip_cmath.h:20:
/usr/lib64/gcc/x86_64-suse-linux/13/../../../../include/c++/13/type_traits:2388:21: error: static assertion failed due to requirement '__declval_protector<mu::ParserCallback *>::__stop': declval() must not be used!
 2388 |       static_assert(__declval_protector<_Tp>::__stop,
      |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib64/gcc/x86_64-suse-linux/13/../../../../include/c++/13/type_traits:903:10: note: in instantiation of function template specialization 'std::declval[device={arch(amdgcn)}, implementation={extension(match_any, allow_templates)}]<mu::ParserCallback *>' requested here
  903 |     auto declval() noexcept -> decltype(__declval<_Tp>(0));
      |          ^
/usr/lib64/gcc/x86_64-suse-linux/13/../../../../include/c++/13/type_traits:1255:31: note: in instantiation of function template specialization 'std::declval<mu::ParserCallback *>' requested here

fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
make[2]: *** [_deps/muparser-build/CMakeFiles/muparser.dir/build.make:90: _deps/muparser-build/CMakeFiles/muparser.dir/src/muParserBase.cpp.o] Error 1
make[2]: Leaving directory '/lustre/orion/bie123/proj-shared/GromacsInstalls/gromacs-2025.0/build'
make[1]: *** [CMakeFiles/Makefile2:4616: _deps/muparser-build/CMakeFiles/muparser.dir/all] Error 2
make[1]: Leaving directory '/lustre/orion/bie123/proj-shared/GromacsInstalls/gromacs-2025.0/build'
make: *** [Makefile:166: all] Error 2
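
For context on the message itself: libstdc++ deliberately makes the body of std::declval fail to compile, because the function is only meant to appear in unevaluated contexts such as decltype. The assertion therefore fires whenever something forces that body to be instantiated. The notes in the log above point at a std::declval[device={arch(amdgcn)}, ...] specialization reached through openmp_wrappers/cmath, which hints that the OpenMP offload wrapper headers are involved; that is a reading of the log, not a confirmed diagnosis. A minimal sketch of the intended use, where Callback is a hypothetical stand-in for mu::ParserCallback:

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for mu::ParserCallback; only the declaration matters.
struct Callback
{
    int priority() const; // declared, never defined or called
};

// Unevaluated use: declval only supplies a type for decltype, so its body
// is never instantiated and the static assertion stays silent.
using PriorityType = decltype(std::declval<Callback*>()->priority());

constexpr bool priorityIsInt = std::is_same_v<PriorityType, int>;
static_assert(priorityIsInt, "decltype should deduce int");

// By contrast, an evaluated call such as
//     Callback* p = std::declval<Callback*>();
// instantiates the body and triggers "declval() must not be used!".
```

Since muParser itself only needs the ordinary unevaluated form, the failure is more likely to come from how the translation unit is compiled than from the muParser sources.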

Another note: when compiling with VkFFT instead of rocFFT and without HeFFTe, it seems to build (at least my colleague has told me this), but I’ve not had a chance to test it yet.

EDIT: I checked with my colleague and they also didn’t have the craype-accel-amd-gfx90a module loaded. Not sure if that is relevant.

That sounds relevant. Could you try unloading it and then, in a cleared directory, re-running CMake and build?

I unloaded the craype-accel-amd-gfx90a module and was still running into issues. If I drop HeFFTe and rocFFT and just use VkFFT, everything does seem to compile nicely and run.

If I don’t use HeFFTe but still try to use rocFFT, I end up with the following errors:

[ 30%] Building CXX object src/gromacs/CMakeFiles/libgromacs.dir/fft/gpu_3dfft_sycl_rocfft.cpp.o
acpp warning: No optimization flag was given, optimizations are disabled by default. Performance may be degraded. Compile with e.g. -O2/-O3 to enable optimizations.
/lustre/orion/proj-shared/bie123/GromacsInstalls/gromacs-2025.0/src/gromacs/fft/gpu_3dfft_sycl_rocfft.cpp:326:50: error: expected expression
  326 |     impl_->queue_.submit(GMX_SYCL_DISCARD_EVENT[&](sycl::handler & cgh) {
      |                                                  ^
/lustre/orion/proj-shared/bie123/GromacsInstalls/gromacs-2025.0/src/gromacs/fft/gpu_3dfft_sycl_rocfft.cpp:326:26: error: use of undeclared identifier 'GMX_SYCL_DISCARD_EVENT'
  326 |     impl_->queue_.submit(GMX_SYCL_DISCARD_EVENT[&](sycl::handler & cgh) {
      |                          ^
/lustre/orion/proj-shared/bie123/GromacsInstalls/gromacs-2025.0/src/gromacs/fft/gpu_3dfft_sycl_rocfft.cpp:326:66: error: expected '(' for function-style cast or type construction
  326 |     impl_->queue_.submit(GMX_SYCL_DISCARD_EVENT[&](sycl::handler & cgh) {
      |                                                    ~~~~~~~~~~~~~ ^
/lustre/orion/proj-shared/bie123/GromacsInstalls/gromacs-2025.0/src/gromacs/fft/gpu_3dfft_sycl_rocfft.cpp:326:68: error: use of undeclared identifier 'cgh'
  326 |     impl_->queue_.submit(GMX_SYCL_DISCARD_EVENT[&](sycl::handler & cgh) {
      |                                                                    ^
4 errors generated when compiling for gfx90a.
make[2]: *** [src/gromacs/CMakeFiles/libgromacs.dir/build.make:12023: src/gromacs/CMakeFiles/libgromacs.dir/fft/gpu_3dfft_sycl_rocfft.cpp.o] Error 1

EDIT: I have also been able to get it to build now with the craype-accel-amd-gfx90a module loaded and using VkFFT. rocFFT builds still end up with the error listed above.

Good catch. Will be fixed in 2025.1, due to be released tomorrow.

@al42and

I noticed the new release and was able to get what appeared to be a functional build working (at least for my initial tests); however, now when I try to run any production-length simulations, i.e. ones that need to checkpoint, I am running into the following error:

step 324900, remaining wall clock time:    24 s
-------------------------------------------------------
Program:     gmx mdrun, version 2025.1
Source file: src/gromacs/mdlib/mdoutf.cpp (line 475)
Function:    void write_checkpoint(const char *, gmx_bool, FILE *, const t_commrec *, int *, int, IntegrationAlgorithm, int, gmx_bool, LambdaWeightCalculation, int64_t, double, t_state *, ObservablesHistory *, const gmx::MDModulesNotifiers &, gmx::WriteCheckpointDataHolder *, bool, MPI_Comm)

System I/O error:
Cannot rename checkpoint file from state.cpt to state_prev.cpt; maybe you are
out of disk space?

For more information and tips for troubleshooting, please check the GROMACS
website at https://manual.gromacs.org/current/user-guide/run-time-errors.html
-------------------------------------------------------
MPICH ERROR [Rank 0] [job id ] [Wed Mar 12 23:06:20 2025] [frontier09097] - Abort(1) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

What’s strange is that I have plenty of disk space where I am running. I’ve tested builds with and without HeFFTe support (using the same ROCm/AMD settings as in my initial post) and one using just VkFFT. All of them give the same problem. If I do not checkpoint, my test systems seem to run, but that isn’t ideal on a shared system with a queuing system. Thoughts?

{Edit: I did need to set the muParser option to none for the build to work}
{Edit 2: I also tried without any GPUs and I get the same issue, building with the AMD clang and using my local resource’s cray-mpich}

If you have time, could you elaborate a bit here: with the same set of loaded modules (and with muParser enabled), VkFFT was building fine, but the HeFFTe/rocFFT build was failing (due to the muParser error)? If you load the HeFFTe module but still try to build with VkFFT, does it work?

Does the issue happen at the very first/second checkpoint write or randomly later? What type of filesystem are you running on?

We use the standard file copying function, so it is not immediately obvious what can go wrong there. GPU offloading type should not matter here, but it’s nice that you checked thoroughly.

To clarify regarding your first question:

For building the new 2025.1 code with VkFFT or HeFFTe/ROCm, I had done my initial tests with muParser disabled (set to NONE), since I needed to disable it in my prior 2025.0 build attempts with HeFFTe/ROCm.

Doing a fresh build this morning with muParser enabled worked when using VkFFT. I also did a fresh build with muParser enabled but without HeFFTe, and it also built. And this morning I was able to get a build with muParser enabled with HeFFTe and rocFFT, but only if the craype-accel-amd-gfx90a module is not loaded.

Once the craype-accel-amd-gfx90a module is loaded, the builds fail unless I disable muParser. Not using the craype-accel-amd-gfx90a module is a bit problematic, as the user guide for the system I am on indicates that this module needs to be loaded (at build time and runtime) in order to enable GPU-aware MPICH.


Regarding the checkpoint/IO issue: it occurs whether or not the craype-accel-amd-gfx90a module is loaded at build time or runtime. I’ve seen it both with a single rank using 1 GPU and 7 threads, and with multiple ranks (up to 128 ranks, 8 GPUs per node, so 16 nodes) with 7 CPU threads per rank.

The runs seem to work fine until the second checkpoint, when GROMACS tries to rename state.cpt to state_prev.cpt. If I do a restart from a job I have previously run (copied from a different computer that has a working GROMACS install), the error appears the moment it tries to make a new checkpoint.

Interestingly, I did find a workaround (of a sort). If I use the -cpnum 1 option, so that a unique checkpoint file is saved every time a checkpoint is made instead of the standard [name-here].cpt and [name-here]_prev.cpt pairing, I do not get the out-of-space I/O error.

This is running on a Lustre filesystem.

Just an update. The checkpoint issue (from 2025.1) also occurs with a build that uses HIP instead of SYCL. I’m digging into this a bit more to see if I can reproduce the issue on a system with a non-Lustre filesystem, and I’ll edit this post if/when I can reproduce the error.

Edit:
I dug into the gmx_file_copy utility function and modified it to print the error code:

#include <filesystem>
#include <iostream>
#include <system_error>

int gmx_file_copy(const std::filesystem::path& oldname, const std::filesystem::path& newname, bool copy_if_empty)
{
    if (!std::filesystem::exists(oldname))
    {
        return 1;
    }

    std::error_code errorCode;
    if (!std::filesystem::is_empty(oldname) || copy_if_empty)
    {
        std::filesystem::copy_file(
                oldname, newname, std::filesystem::copy_options::overwrite_existing, errorCode);
        if (errorCode)
        {
            // Added for debugging: report why the copy failed.
            std::cerr << "Error copying file: " << errorCode.message() << std::endl;
            return errorCode.value();
        }
    }
    // Reached after a successful copy or when an empty file is skipped.
    return 0;
}

This now results in the following additional information prior to the checkpoint error:

  Error copying file: No data available

-------------------------------------------------------
Program:     gmx mdrun, version 2025.1
Source file: src/gromacs/mdlib/mdoutf.cpp (line 475)
Function:    void write_checkpoint(const char *, gmx_bool, FILE *, const t_commrec *, int *, int, IntegrationAlgorithm, int, gmx_bool, LambdaWeightCalculation, int64_t, double, t_state *, ObservablesHistory *, const gmx::MDModulesNotifiers &, gmx::WriteCheckpointDataHolder *, bool, MPI_Comm)

System I/O error:
Cannot rename checkpoint file from state.cpt to state_prev.cpt; maybe you are
out of disk space?

For more information and tips for troubleshooting, please check the GROMACS
website at https://manual.gromacs.org/current/user-guide/run-time-errors.html
-------------------------------------------------------
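
Worth noting: on Linux with glibc, "No data available" is the text for the errno value ENODATA, which is a different condition from running out of disk space (ENOSPC), so the "maybe you are out of disk space?" hint in the GROMACS message looks like a guess here. A minimal sketch of telling the two apart from the std::error_code, assuming the code carries a POSIX errno value; the CopyFailure enum and classifyCopyError function are hypothetical names for illustration, not GROMACS code:

```cpp
#include <cerrno>
#include <system_error>

// Hypothetical classification, for illustration only (not GROMACS code).
enum class CopyFailure
{
    None,       // no error was reported
    OutOfSpace, // ENOSPC: the case the GROMACS message guesses at
    NoData,     // ENODATA: the value behind "No data available"
    Other       // anything else
};

// Maps the errno value carried by an error code onto the categories above.
// Assumes the code holds a POSIX errno value, as on Linux.
CopyFailure classifyCopyError(const std::error_code& ec)
{
    if (!ec)
    {
        return CopyFailure::None;
    }
    switch (ec.value())
    {
        case ENOSPC: return CopyFailure::OutOfSpace;
        case ENODATA: return CopyFailure::NoData;
        default: return CopyFailure::Other;
    }
}
```

Passing the errorCode from the modified gmx_file_copy above through such a check would confirm whether the failure is really ENODATA rather than a disk-space problem.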

Still working on testing this on a non-Lustre filesystem setup.
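
One way to separate GROMACS from the filesystem would be a standalone test that performs the same kind of copy outside mdrun. The following is only a sketch under assumptions: the function name copyOverExistingRepeatedly and the file names state_test.cpt and state_test_prev.cpt are made up for illustration, and the function mimics the state.cpt to state_prev.cpt step with std::filesystem::copy_file and overwrite_existing, as in the function shown above.

```cpp
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical reproducer, not GROMACS code: writes `rounds` generations of a
// small file and copies each one over the previous backup, mirroring the
// state.cpt -> state_prev.cpt step. Returns the first error encountered,
// or a default-constructed (success) error code.
std::error_code copyOverExistingRepeatedly(const fs::path& dir, int rounds)
{
    const fs::path current  = dir / "state_test.cpt";
    const fs::path previous = dir / "state_test_prev.cpt";
    std::error_code ec;
    for (int i = 0; i < rounds; ++i)
    {
        {
            std::ofstream out(current, std::ios::binary | std::ios::trunc);
            out << "checkpoint generation " << i << '\n';
        } // stream is closed here, so the file is complete before copying
        fs::copy_file(current, previous, fs::copy_options::overwrite_existing, ec);
        if (ec)
        {
            return ec; // stop at the first failing copy
        }
    }
    return ec;
}
```

Built with the same compiler and libstdc++ as the GROMACS build and pointed at a directory on the Lustre filesystem with two or more rounds, this covers both creating the backup and overwriting it. If it fails with the same "No data available" message, the problem would sit in the copy_file/filesystem layer rather than in GROMACS itself.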