Gromacs with SYCL support on AdaptiveCpp

GROMACS version: 2023.3
GROMACS modification: No

Hey,

I’m trying to build Gromacs with SYCL support on AdaptiveCpp. My goal is to use the AdaptiveCpp OpenCL backend to feed SPIR-V to PoCL and then run on whatever PoCL supports. I already have a functional AdaptiveCpp build running on PoCL through the OpenCL backend.

I tried to build Gromacs with:

OCL_ICD_VENDORS=$POCL_WITH_SPIRV_ICD_PATH POCL_BUILDING=1 cmake … -DGMX_GPU=SYCL -DACPP_TARGETS='generic' -DAdaptiveCpp_DIR=/home/tapio/PROJECT/Software/AdaptiveCpp-17.0.6/lib/cmake/AdaptiveCpp -DGMX_BUILD_OWN_FFTW=ON -DGMX_SYCL=ACPP -DLLVM_DIR=/home/tapio/PROJECT/Software/llvm-17.0.6/lib/cmake/llvm -DCMAKE_C_COMPILER=/home/tapio/PROJECT/Software/llvm-17.0.6/bin/clang -DCMAKE_CPP_COMPILER=/home/tapio/PROJECT/Software/llvm-17.0.6/bin/clang++ -DCMAKE_INSTALL_PREFIX=/home/tapio/PROJECT/Software/Gromacs_2023.3_SYCL

I’m getting this error:

CMake Error at cmake/gmxManageSYCL.cmake:384 (message):
Cannot compile a SYCL program with -fsycl. Try a different compiler or
disable SYCL.
Call Stack (most recent call first):
CMakeLists.txt:667 (include)

which is strange, because AdaptiveCpp does not use the -fsycl flag (but DPC++ does?).
So I am wondering what is happening in the background. I guess it assumes that the provided compilers are SYCL-capable and tests whether they can compile SYCL? The C/C++ compilers that I provided are the standard clang/clang++ that I used to build AdaptiveCpp.

The instructions mention that the compilers should be those from ROCm, but I have no intention of using it.

Any ideas on this?

Thanks,

T

Edit: I found the ACPP-specific build flags in a forum post, as the latest documentation only mentions the hipSYCL flags.

Hi!

For GROMACS 2023.x, you should use the -DGMX_SYCL_HIPSYCL=ON flag (it works for AdaptiveCpp too); otherwise, GROMACS tries to use DPC++, and that’s why you see the -fsycl error. Only the upcoming GROMACS 2024.x supports the -DGMX_SYCL=... flags.

However, a bigger problem with your plan is that GROMACS only supports HIP and CUDA targets with AdaptiveCpp. The generic target is not supported because group algorithms (reduce_over_group etc.) are not available with SSCP yet. The second problem would be the FFT library: VkFFT’s OpenCL backend could be used, but it would require some new boilerplate code in GROMACS, mostly in the build system. Alternatively, you can use -DGMX_GPU_FFT_LIBRARY=none to build without FFT offload support (CPU FFT is still available if you want to run simulations with PME electrostatics). Additionally, the GROMACS code might contain some hardcoded assumptions that either the CUDA or the HIP backend is used, but those should be straightforward to fix.
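As a rough sketch of what such a configure line could look like for GROMACS 2023.x, the snippet below only assembles and prints the command. The `hip:gfx90a` target, the `HIPSYCL_TARGETS` variable, the `/opt/rocm` default and the `gmx_configure_cmd` helper are assumptions chosen for illustration and should be checked against the 2023 install guide. Note also that CMake's variable for the C++ compiler is `CMAKE_CXX_COMPILER`.

```shell
#!/bin/sh
# Sketch only: print a configure command for GROMACS 2023.x with AdaptiveCpp.
# ROCM_PATH, the gfx90a target and HIPSYCL_TARGETS are illustrative assumptions.
ROCM_PATH="${ROCM_PATH:-/opt/rocm}"

gmx_configure_cmd() {
    # $1: AdaptiveCpp/hipSYCL target string, e.g. hip:gfx90a
    printf '%s ' \
        cmake .. \
        "-DCMAKE_C_COMPILER=${ROCM_PATH}/llvm/bin/clang" \
        "-DCMAKE_CXX_COMPILER=${ROCM_PATH}/llvm/bin/clang++" \
        -DGMX_GPU=SYCL \
        -DGMX_SYCL_HIPSYCL=ON \
        "-DHIPSYCL_TARGETS=$1" \
        -DGMX_GPU_FFT_LIBRARY=none
    printf '\n'
}

# Print the command instead of running it, so it can be reviewed first.
gmx_configure_cmd "hip:gfx90a"
```

The FFT flag is the "no GPU FFT offload" option mentioned above; drop it if a supported GPU FFT library is available for the chosen target.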

I see, that’s unfortunate. Would the generic target work through DPC++? I am still trying to figure out how the SYCL compilers operate as a part of Gromacs.

Would the generic target work through DPC++?

I don’t think DPC++ has a generic target. It can emit SPIR-V bytecode, but neither AMD nor NVIDIA backends support SPIR-V, so it’s Intel-only.

Technically, it is possible to build Mesa/Rusticl and use it as a DPC++ backend; that could allow targeting AMD GPUs with SPIR-V. It was not really usable last time I checked, but it could run simple examples, and there is some ongoing work on making this combination less broken on both the DPC++ and Rusticl sides.

It is also possible to build for different vendors with DPC++. That will not be truly “generic” (you still need to list all targets at compile time; also, see caveats below) but will be portable across vendors:

-DGMX_GPU_NB_CLUSTER_SIZE=8 -DSYCL_CXX_FLAGS_EXTRA='-fsycl-targets=nvptx64-nvidia-cuda,spir64,amdgcn-amd-amdhsa;-Xsycl-target-backend=nvptx64-nvidia-cuda;--offload-arch=sm_86;-Xsycl-target-backend=amdgcn-amd-amdhsa;--offload-arch=gfx1034'
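Since the semicolon-separated flag list is easy to mistype, here is a small sketch that composes it from the two GPU architectures. `sycl_flags_extra` is a hypothetical helper written for this post, not part of GROMACS or DPC++; it only reproduces the flag layout shown above.

```shell
#!/bin/sh
# Sketch: build the -DSYCL_CXX_FLAGS_EXTRA value for a multi-vendor DPC++ build.
# sycl_flags_extra is a hypothetical helper, not part of GROMACS or DPC++.
sycl_flags_extra() {
    # $1: NVIDIA arch (e.g. sm_86), $2: AMD arch (e.g. gfx1034)
    printf '%s;%s;%s;%s;%s\n' \
        "-fsycl-targets=nvptx64-nvidia-cuda,spir64,amdgcn-amd-amdhsa" \
        "-Xsycl-target-backend=nvptx64-nvidia-cuda" \
        "--offload-arch=$1" \
        "-Xsycl-target-backend=amdgcn-amd-amdhsa" \
        "--offload-arch=$2"
}

# Print the two CMake options for the architectures used in the example above.
flags=$(sycl_flags_extra sm_86 gfx1034)
echo "-DGMX_GPU_NB_CLUSTER_SIZE=8 -DSYCL_CXX_FLAGS_EXTRA='${flags}'"
```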

I am still trying to figure out how the SYCL compilers operate as a part of Gromacs.

  • With DPC++, GROMACS can be compiled for Intel GPUs (SPIR-V), NVIDIA (nvptx64-nvidia-cuda), AMD (amdgcn-amd-amdhsa). Multiple vendors and multiple targets of the same vendor can be targeted in the same binary (see CMake snippet above), with caveats:
    • The FFT library (either VkFFT or MKL or BBFFT) will only work for a single vendor. oneMKL seems promising as a portable backend, but, at the moment, it also must be compiled for a single specific backend.
    • The NBNXM cluster size is fixed at compile time to either 4 or 8. NVIDIA, AMD, and Intel Xe-HPC GPUs work best with 8, while the rest of Intel GPUs only support 4. That’s the same limitation OpenCL has, and we’re looking at lifting it by refactoring GROMACS code.
  • With AdaptiveCpp, you can use either CUDA or HIP in multipass mode, but not both at the same time (so, either --acpp-targets=cuda:sm_80,sm_86,sm_90 or --acpp-targets=hip:gfx908,gfx90a). Other targets (level_zero, omp, generic) are not supported.
    • Supporting both CUDA and HIP together is not impossible; it is only blocked by some assumptions we have made in our code. Grepping the source code for GMX_HIPSYCL_HAVE_ should point at the relevant bits of code; nothing major there, just not a priority to deal with right now.
    • Supporting the generic target would require dealing with the same preprocessor bits in our code as in the previous point, plus some parts of the kernels where we’re using macros set by the multipass compiler. It also requires waiting for ACpp to implement sub-group shifts in generic.
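The cluster-size rule from the list above can be written down as a tiny lookup. This is only a sketch: `nb_cluster_size` and the family labels (`nvidia`, `amd`, `intel-xe-hpc`, `intel`) are made up for this example and are not GROMACS options. The returned value is what would be passed as -DGMX_GPU_NB_CLUSTER_SIZE.

```shell
#!/bin/sh
# Sketch: pick a value for -DGMX_GPU_NB_CLUSTER_SIZE from a GPU family label.
# The labels below are hypothetical names used only in this example.
nb_cluster_size() {
    case "$1" in
        nvidia|amd|intel-xe-hpc) echo 8 ;;   # work best with cluster size 8
        intel) echo 4 ;;                     # other Intel GPUs only support 4
        *) echo "unknown GPU family: $1" >&2; return 1 ;;
    esac
}

echo "-DGMX_GPU_NB_CLUSTER_SIZE=$(nb_cluster_size amd)"
```

Because the size is fixed at compile time, a single binary built with 8 would not cover the Intel GPUs that only support 4.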

Hi,

bumping this up.

ACpp generic target is still not supported, right? So I cannot get SPIR-V out of Gromacs that way.

But with oneAPI DPC++ Gromacs can produce SPIR-V, and I should be able to pass it to OpenCL devices through PoCL that supports SPIR-V?

Did I get it right?

Thanks,

Tapio

Hi!

ACpp generic target is still not supported, right? So I cannot get SPIR-V out of Gromacs that way.

Right. There are still a few things missing in ACpp so the kernels will not compile.

But with oneAPI DPC++ Gromacs can produce SPIR-V, and I should be able to pass it to OpenCL devices through PoCL that supports SPIR-V?

Yes, you can build GROMACS with DPC++ and then dump SPIR-V.

Whether PoCL would consume it, however, is unclear: DPC++ relies on Intel extensions and, furthermore, sometimes deviates from the SPIR-V standard (e.g., intel/llvm issue #11531 on GitHub, “Using `OpTypeBool` for kernel parameters is invalid according to OpenCL SPIR-V Env specification”), so the generated SPIR-V will not necessarily work with PoCL out of the box.

Note that you can perhaps use PoCL directly as a backend for DPC++, instead of dumping and loading the kernels manually. DPC++ can use pretty much any OpenCL backend as long as it consumes the produced SPIR-V. The backend must also support USM, but that can be emulated using the OpenCL Intercept Layer:

export LD_PRELOAD=/opt/opencl-intercept-layer/lib/libOpenCL.so
export CLI_Emulate_cl_intel_unified_shared_memory=1
export CLI_SuppressLogging=1
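To avoid preloading the library into every command in the shell, the same three settings can be applied to a single command only. The sketch below is one way to do that: `usm_emulation_env` is a hypothetical helper, and the library path is the one from the snippet above, which depends on where the Intercept Layer is installed.

```shell
#!/bin/sh
# Sketch: print the USM-emulation settings for use with env(1).
# usm_emulation_env is a hypothetical helper; adjust the library path as needed.
usm_emulation_env() {
    # $1: path to the Intercept Layer's libOpenCL.so
    if [ ! -r "$1" ]; then
        echo "intercept layer library not found: $1" >&2
        return 1
    fi
    printf 'LD_PRELOAD=%s\n' "$1"
    printf 'CLI_Emulate_cl_intel_unified_shared_memory=1\n'
    printf 'CLI_SuppressLogging=1\n'
}

# Show the settings that would be applied (prints an error if the file is missing).
usm_emulation_env /opt/opencl-intercept-layer/lib/libOpenCL.so || true
```

A possible use is `env $(usm_emulation_env /opt/opencl-intercept-layer/lib/libOpenCL.so) sycl-ls`, which should list the devices DPC++ sees with emulation enabled. This relies on word splitting, so it assumes the path contains no spaces.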

Thanks!

I will look into it.

Tapio