GROMACS and OpenCL - no GPU is detected

GROMACS version: 2022.1
GROMACS modification: No

Dear all,

I have been struggling to compile GROMACS with OpenCL (Radeon 6800 XT GPU) for some time. I think it’s about time to ask the experts for some help.

I installed opencl-headers and ocl-icd-libopencl1 from apt, as suggested in the manual, but cmake complained that it couldn’t find the OpenCL libraries. Therefore, I installed the amdgpu drivers from AMD (following the "Using the amdgpu-install Script" page of the amdgpu graphics and compute stack documentation) and included the --opencl=rocr option.

With that, I was able to run cmake:

cmake … -DGMX_GPU=OpenCL -DGMX_BUILD_OWN_FFTW=ON -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-2022.1-opencl/ -DCMAKE_PREFIX_PATH=/opt/rocm-5.1.2

Followed by smooth make and make install:

gmx -version

GROMACS version: 2022.1
Precision: mixed
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 128)
GPU support: OpenCL
SIMD instructions: AVX2_256
CPU FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128
GPU FFT library: clFFT
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /usr/bin/cc GNU 9.4.0
C compiler flags: -mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -O3 -DNDEBUG
C++ compiler: /usr/bin/c++ GNU 9.4.0
C++ compiler flags: -mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
OpenCL include dir: /usr/include
OpenCL library: /opt/rocm-5.1.2/lib/libOpenCL.so
OpenCL version: 2.2

However, when I start mdrun (gmx mdrun -s bench.tpr -deffnm tst -nb gpu), gromacs complains:

Cannot run short-ranged nonbonded interactions on a GPU because no GPU is
detected.

clinfo (sudo /opt/amdgpu-pro/bin/clinfo) recognizes the card:

Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (3423.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback

Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: AMD Radeon RX 6800 XT
Device Topology: PCI[ B#12, D#0, F#0 ]
Max compute units: 36
Max work items dimensions: 3
(…)

Any hints where I made the mistake?

Thanks!

Does clinfo recognize the card without sudo? If not, there are some issues with user permissions on your machine.

Thanks a lot for the reply. I have added the user to the video group and now clinfo returns information about the GPU without sudo.
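
For anyone hitting the same symptom, here is a minimal sketch of the permission check that does not need clinfo. It assumes the ROCm OpenCL runtime opens /dev/kfd and the /dev/dri/renderD* nodes; that is my assumption and may differ on other driver stacks. The function name check_node is just for illustration. Note that a new group membership only takes effect after logging in again.

```shell
#!/usr/bin/env bash
# Report whether the current user can open the GPU device nodes without sudo.
# Assumption: the ROCm OpenCL runtime uses /dev/kfd and /dev/dri/renderD*.

check_node() {
    # Prints one status line for the given path.
    # Returns 0 only if the path exists and is readable and writable.
    local node="$1"
    if [ ! -e "$node" ]; then
        echo "missing:      $node"
        return 1
    elif [ -r "$node" ] && [ -w "$node" ]; then
        echo "accessible:   $node"
        return 0
    else
        echo "no access:    $node (owner/group: $(stat -c '%U:%G' "$node"))"
        return 1
    fi
}

# Show the groups the current login session actually has.
echo "groups of $(id -un): $(id -nG)"

# If the glob matches nothing, the literal pattern is reported as missing.
for node in /dev/kfd /dev/dri/renderD*; do
    check_node "$node" || true
done
```

If a node shows "no access", the owner/group printed next to it indicates which group the user would need to join.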

Now Gromacs recognizes the GPU and allocates the GPU tasks:

1 GPU selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
PP:0,PME:0
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
PP task will update and constrain coordinates on the CPU
PME tasks will do all aspects on the GPU
Using 1 MPI thread
Using 12 OpenMP threads

However, it crashes when the run starts, with the following message:
Memory access fault by GPU node-1 (Agent handle: 0x56294777ee20) on address 0x1480d9bb3000. Reason: Page not present or supervisor privilege.

I presume there are still some issues with user permissions. Do you know what I might be missing?

Before installing the Radeon card, I had an NVIDIA card in this machine and did not encounter any issues.

Hi,

I suspect it may be a GROMACS issue. RDNA support is untested and, due to the significant differences between GCN/CDNA and RDNA, the compute kernels will likely need some tweaks / fixes.

Can you please run with gmx mdrun [...] -debug 1 -nsteps 0 and share the gmx.debug file (or at least the last 50 lines)? If the crash happens before a line with “Pruning GPU kernel launch configuration:” appears in the debug file, that suggests the issue is likely the above.

If so, please open an issue on the GROMACS GitLab issue tracker (Issues · GROMACS / GROMACS · GitLab).
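
The check described above can be sketched as follows. This is only a sketch: it assumes the debug file is called gmx.debug and sits in the run directory (or is passed as the first argument), and the function name reached_pruning_launch is made up for illustration.

```shell
#!/usr/bin/env bash
# Check whether the debug log got as far as the pruning kernel launch configuration.
# Assumption: the debug file is gmx.debug in the current directory unless a path is given.

marker="Pruning GPU kernel launch configuration:"

reached_pruning_launch() {
    # Returns 0 if the marker string occurs anywhere in the given file.
    grep -qF "$marker" "$1"
}

debug_file="${1:-gmx.debug}"
if [ ! -f "$debug_file" ]; then
    echo "no such file: $debug_file"
else
    if reached_pruning_launch "$debug_file"; then
        echo "marker found: the crash came after the pruning launch configuration was logged"
    else
        echo "marker not found: the crash came before the pruning launch configuration was logged"
    fi
    # The last 50 lines are the part worth sharing.
    tail -n 50 "$debug_file"
fi
```

The marker string is matched literally (grep -F), so the colon and spaces need no escaping.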

Cheers,
Szilárd

Hi Szilard,

Thanks for the prompt reply. Here are the last 50 lines from gmx.debug:

nbl j-list #i-subcell 8    2578 32.1
number of distance checks 228112
nbl nsci 264 ncj4 1907 nsi 39684 excl4 305
nbl na_c 8 rl 1.346 ncp 39684 per cell 15.0 atoms 119.9 ratio 0.23
nbl #cluster-pairs: av 150.3 stddev 65.8 max 220
nbl j-list #i-subcell 0     180  2.4
nbl j-list #i-subcell 1     134  1.8
nbl j-list #i-subcell 2    1181 15.5
nbl j-list #i-subcell 3     668  8.8
nbl j-list #i-subcell 4    1342 17.6
nbl j-list #i-subcell 5     367  4.8
nbl j-list #i-subcell 6     785 10.3
nbl j-list #i-subcell 7     497  6.5
nbl j-list #i-subcell 8    2474 32.4
nbl nsci 2324 ncj4 21806 nsi 446542 excl4 3431
nbl na_c 8 rl 1.346 ncp 446542 per cell 168.7 atoms 1349.6 ratio 2.59
nbl #cluster-pairs: av 192.1 stddev 85.9 max 372
nbl j-list #i-subcell 0    1357  1.6
nbl j-list #i-subcell 1    3429  3.9
nbl j-list #i-subcell 2   14392 16.5
nbl j-list #i-subcell 3    7668  8.8
nbl j-list #i-subcell 4   15534 17.8
nbl j-list #i-subcell 5    4086  4.7
nbl j-list #i-subcell 6    6150  7.1
nbl j-list #i-subcell 7    5005  5.7
nbl j-list #i-subcell 8   29603 33.9
Pruning GPU kernel launch configuration:
        Local work size: 8x8x4
        Global work size: 18592x8
        #Super-clusters/clusters: 18592/8 (8)
        ShMem: 1184
Non-bonded GPU launch configuration:
        Local work size: 8x8x1
        Global work size : 18592x8
        #Super-clusters/clusters: 18592/8 (8)
 Pair Search distance check    2638320.
 NxN Ewald Elec. + LJ [V&F]   28578688.
 NxN LJ add F-switch [V&F]    28578688.
 1,4 nonbonded interactions       1337.
 Shift-X                         21050.
 Bonds                             279.
 Propers                          1785.
 Impropers                          19.
 Stop-CM                         21050.
 Calc-Ekin                       21050.
 Lincs                             450.
 Lincs-Mat                        1568.
 Constraint-V                    21015.
 Settle                          13710.
 Urey-Bradley                      915.

I don’t see anything suspicious here, but the job terminated with the same error.

Thanks again for the help,
Mateusz

Thanks for the feedback. That suggests the kernels do run, but probably produce incorrect results. Can you try making the changes at https://termbin.com/zd42 to the src/gromacs/nbnxm/opencl/nbnxm_ocl_kernel_utils.clh file and see if this makes the code run?

Hi Szilard,

Thanks for working with me. I have made the changes and recompiled GROMACS.

My benchmark job now initializes; however, it immediately crashes with the message:

Internal error (bug):
Step 1: The total potential energy is nan, which is not finite. The LJ and
electrostatic contributions to the energy are 21934.4 and -448047,
respectively. A non-finite potential energy can be caused by overlapping
interactions in bonded interactions or very large or Nan coordinate values.
Usually this is caused by a badly- or non-equilibrated initial configuration,
incorrect interactions or parameters in the topology.

Clearly, the card has a problem handling these calculations. The job runs smoothly using only the CPU.

Best,
Mateusz

Hi Mateusz,

Indeed, some work is needed for AMD Navi. Can you please record your findings by opening an issue on our GitLab?

Thanks,
Szilárd