Support for NVIDIA GPUs on SYCL

GROMACS version: 2025.0
GROMACS modification: No

Hi!

We have been successfully running GROMACS with CUDA on an RTX 4070 Ti. Now we are also trying to test the SYCL backend on NVIDIA GPUs. We are using oneAPI 2024.2, and we have already tested the SYCL backend successfully on Intel GPUs.

In addition to oneAPI, we installed its CUDA plugin and verified that simple SYCL samples compile and run correctly on this device (CUDA 12.2).

When we try to execute GROMACS with the NVIDIA GPU, we are running into issues.

First, we got this output:

    Number of GPUs detected: 1
    #0: name: NVIDIA GeForce RTX 4070 Ti, architecture 8.9, vendor: NVIDIA Corporation, device version: 8.9, driver version: CUDA 12.2, status: incompatible (please recompile with correct GMX_GPU_NB_CLUSTER_SIZE of 4)

Which was strange, because I thought 4 was the default; from --version:

    super-cluster 2x2x2 / cluster 4 (cluster-pair splitting on)

I looked at the GROMACS code, and since sycl-ls shows that the subgroup size for my GPU is 32, the correct value for the device to be compatible should be 8, so the message is simply wrong. I recompiled with -DGMX_GPU_NB_CLUSTER_SIZE=8, but that build fails with the device reported as non-functional:

  GPU info:
    Number of GPUs detected: 1
    #0: N/A, status: non-functional

Is there a way to debug this issue further and see why GROMACS fails? This is all happening inside a container, so there’s one CUDA and one oneAPI version available.

sycl-ls

[opencl:cpu][opencl:0] Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i7-12700K OpenCL 3.0 (Build 0) [2024.18.7.0.11_160000]
[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 4070 Ti 8.9 [CUDA 12.2]

nvidia-smi
      
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.247.01             Driver Version: 535.247.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4070 Ti     Off | 00000000:01:00.0  On |                  N/A |
|  0%   54C    P3              35W / 285W |   3310MiB / 12282MiB |     11%      Default |
|                                         |                      |                  N/A |

Hi!

Which was strange, because I thought 4 was the default

Yes, the error message is partially wrong. It’s right that you’re using the wrong GMX_GPU_NB_CLUSTER_SIZE, but for NVIDIA GPUs it should be 8, not 4. We fixed it in GROMACS 2025.2 :)
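
For readers wondering why a subgroup size of 32 goes with a cluster size of 8: here is a minimal sketch of the arithmetic, assuming that with cluster-pair splitting on (as the --version output above reports) one subgroup handles half of a cluster pair. This is not copied from the GROMACS source. The function name `requiredSubGroupSize` and the split factor of 2 are assumptions made for illustration, chosen so that the result matches the values mentioned in this thread.

```cpp
#include <cassert>

// Hypothetical helper, not a GROMACS function.
// Assumption: a cluster pair has clusterSize * clusterSize interactions and,
// with cluster-pair splitting on, it is divided into clusterPairSplit parts,
// each of which is processed by one subgroup.
constexpr int requiredSubGroupSize(int clusterSize, int clusterPairSplit = 2)
{
    return clusterSize * clusterSize / clusterPairSplit;
}

// A cluster size of 8 matches the subgroup size of 32 reported for the RTX 4070 Ti.
static_assert(requiredSubGroupSize(8) == 32, "cluster size 8 -> subgroup size 32");
// A cluster size of 4 would need a subgroup size of 8, which this device does not report.
static_assert(requiredSubGroupSize(4) == 8, "cluster size 4 -> subgroup size 8");
```

Under this assumption, a build with GMX_GPU_NB_CLUSTER_SIZE=4 expects a subgroup size of 8, so a device that only offers 32 would be flagged as incompatible, which is consistent with the status line in the first output.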

Is there a way to debug this issue further and see why GROMACS fails?

It would help to know how you’re building GROMACS. Here’s the recommended way: Installation guide for exotic configurations - GROMACS 2025.2 documentation