GROMACS can't detect GPU (AMD)

GROMACS version: 2025.1
GROMACS modification: No

Hi, everyone!
I've got a strange problem: my GROMACS build can't detect my AMD GPU.

l0vepe0ple@l0vepe0ple-Bravo-15-B7ED:~/labor/ChA/wthtEditing/SLC22A12/9B1L$ gmx mdrun -v -deffnm md_0_10_Ref -nb gpu

                      :-) GROMACS - gmx mdrun, 2025.1 (-:

Executable:   /usr/local/gromacs/bin/gmx
Data prefix:  /usr/local/gromacs
Working dir:  /home/l0vepe0ple/labor/ChA/wthtEditing/SLC22A12/9B1L
Command line:
  gmx mdrun -v -deffnm md_0_10_Ref -nb gpu


Back Off! I just backed up md_0_10_Ref.log to ./#md_0_10_Ref.log.19#
Reading file md_0_10_Ref.tpr, VERSION 2025.1 (single precision)
Changing nstlist from 20 to 100, rlist from 1.321 to 1.493


-------------------------------------------------------
Program:     gmx mdrun, version 2025.1
Source file: src/gromacs/taskassignment/findallgputasks.cpp (line 88)

Fatal error:
Cannot run short-ranged nonbonded interactions on a GPU because no GPU is
detected.

For more information and tips for troubleshooting, please check the GROMACS
website at https://manual.gromacs.org/current/user-guide/run-time-errors.html

Full gmx --version output:
l0vepe0ple@l0vepe0ple-Bravo-15-B7ED:~/labor/ChA/wthtEditing/SLC22A12/9B1L$ gmx --version

                         :-) GROMACS - gmx, 2025.1 (-:

Executable:   /usr/local/gromacs/bin/gmx
Data prefix:  /usr/local/gromacs
Working dir:  /home/l0vepe0ple/labor/ChA/wthtEditing/SLC22A12/9B1L
Command line:
  gmx --version

GROMACS version:     2025.1
Precision:           mixed
Memory model:        64 bit
MPI library:         thread_mpi
OpenMP support:      enabled (GMX_OPENMP_MAX_THREADS = 128)
GPU support:         SYCL (AdaptiveCpp)
NBNxM GPU setup:     super-cluster 2x2x2 / cluster 8 (cluster-pair splitting off)
SIMD instructions:   AVX2_256
CPU FFT library:     fftw-3.3.10-sse2-avx-avx2-avx2_128
GPU FFT library:     VkFFT internal (1.3.1) with HIP backend
Multi-GPU FFT:       none
RDTSCP usage:        enabled
TNG support:         enabled
Hwloc support:       disabled
Tracing support:     disabled
C compiler:          /usr/bin/clang Clang 18.1.3
C compiler flags:    -mavx2 -mfma -Wno-missing-field-initializers -O3 -DNDEBUG
C++ compiler:        /usr/bin/clang++ Clang 18.1.3
C++ compiler flags:  -mavx2 -mfma -Wno-reserved-identifier -Wno-missing-field-initializers -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-source-uses-openmp -Wno-c++17-extensions -Wno-documentation-unknown-command -Wno-covered-switch-default -Wno-switch-enum -Wno-switch-default -Wno-extra-semi-stmt -Wno-weak-vtables -Wno-shadow -Wno-padded -Wno-reserved-id-macro -Wno-double-promotion -Wno-exit-time-destructors -Wno-global-constructors -Wno-documentation -Wno-format-nonliteral -Wno-used-but-marked-unused -Wno-float-equal -Wno-cuda-compat -Wno-conditional-uninitialized -Wno-conversion -Wno-disabled-macro-expansion -Wno-unused-macros -Wno-unsafe-buffer-usage -Wno-unused-parameter -Wno-unused-variable -Wno-newline-eof -Wno-old-style-cast -Wno-zero-as-null-pointer-constant -Wno-unused-but-set-variable -Wno-sign-compare -Wno-unused-result -Wno-cast-function-type-strict SHELL:-fopenmp=libomp -O3 -DNDEBUG
BLAS library:        External - detected on the system
LAPACK library:      External - detected on the system
SYCL version:        AdaptiveCpp 24.10.0+git.29fe4c1f.20250319.branch.develop.dirty
SYCL compiler:       /usr/local/adaptivecpp24_10/lib/cmake/AdaptiveCpp/syclcc-launcher
SYCL compiler flags: -Wno-unknown-cuda-version -Wno-unknown-attributes  --acpp-clang=/usr/bin/clang++
SYCL GPU flags:      -ffast-math -DHIPSYCL_ALLOW_INSTANT_SUBMISSION=1 -DACPP_ALLOW_INSTANT_SUBMISSION=1 -fgpu-inline-threshold=99999 -Wno-deprecated-declarations
SYCL targets:
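In hindsight, the giveaway in this output is the empty "SYCL targets:" line: the build that eventually worked was configured with ACPP_TARGETS (see the resolution at the end of this post). Below is a small shell sketch that flags that case. It only assumes the "SYCL targets:" label printed above; has_sycl_targets is a hypothetical helper name, not a GROMACS command. A non-empty line does not prove the listed ISA matches your card, so treat it as a first check only.

```shell
# Sketch: succeed (exit 0) only if the `gmx --version` text on stdin
# has a non-empty "SYCL targets:" line. has_sycl_targets is a
# hypothetical helper, not part of GROMACS.
has_sycl_targets() {
    awk '
        /^SYCL targets:/ {
            value = substr($0, index($0, ":") + 1)  # text after the label
            gsub(/[ \t\r]/, "", value)              # ignore padding
            if (value != "") found = 1
        }
        END { if (found) exit 0; exit 1 }
    '
}
```

Example: gmx --version 2>&1 | has_sycl_targets || echo "gmx was built without SYCL targets"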

l0vepe0ple@l0vepe0ple-Bravo-15-B7ED:~$ lspci -k | grep -EA3 'VGA|3D|Display'

03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 24 [Radeon RX 6400/6500 XT/6500M] (rev c8)
	Subsystem: Micro-Star International Co., Ltd. [MSI] Navi 24 [Radeon RX 6400/6500 XT/6500M]
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu

08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt [Radeon 680M] (rev 0b)
	Subsystem: Micro-Star International Co., Ltd. [MSI] Rembrandt [Radeon 680M]
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu

Relevant source file, src/gromacs/taskassignment/findallgputasks.cpp (the error above is raised in findGpuTasksOnThisRank):

/*
 * This file is part of the GROMACS molecular simulation package.
 *
 * Copyright 2017- The GROMACS Authors
 * and the project initiators Erik Lindahl, Berk Hess and David van der Spoel.
 * Consult the AUTHORS/COPYING files and https://www.gromacs.org for details.
 *
 * GROMACS is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public License
 * as published by the Free Software Foundation; either version 2.1
 * of the License, or (at your option) any later version.
 *
 * GROMACS is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with GROMACS; if not, see
 * https://www.gnu.org/licenses, or write to the Free Software Foundation,
 * Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA.
 *
 * If you want to redistribute modifications to GROMACS, please
 * consider that scientific software is very special. Version
 * control is crucial - bugs must be traceable. We will be happy to
 * consider code for inclusion in the official distribution, but
 * derived work must not be called official GROMACS. Details are found
 * in the README & COPYING files - if they are missing, get the
 * official version at https://www.gromacs.org.
 *
 * To help us fund GROMACS development, we humbly ask that you cite
 * the research papers on the package. Check out https://www.gromacs.org.
 */
/*! \internal \file
 * \brief
 * Defines routine for collecting all GPU tasks found on ranks of a node.
 *
 * \author Mark Abraham <mark.j.abraham@gmail.com>
 * \ingroup module_taskassignment
 */
#include "gmxpre.h"

#include "findallgputasks.h"

#include "config.h"

#include <filesystem>
#include <iterator>
#include <numeric>
#include <type_traits>
#include <vector>

#include "gromacs/taskassignment/decidegpuusage.h"
#include "gromacs/taskassignment/taskassignment.h"
#include "gromacs/utility/arrayref.h"
#include "gromacs/utility/exceptions.h"
#include "gromacs/utility/fatalerror.h"
#include "gromacs/utility/gmxassert.h"
#include "gromacs/utility/gmxmpi.h"
#include "gromacs/utility/physicalnodecommunicator.h"

namespace gmx
{

std::vector<GpuTask> findGpuTasksOnThisRank(const bool       haveGpusOnThisPhysicalNode,
                                            const TaskTarget nonbondedTarget,
                                            const TaskTarget pmeTarget,
                                            const TaskTarget bondedTarget,
                                            const TaskTarget updateTarget,
                                            const bool       useGpuForNonbonded,
                                            const bool       useGpuForPme,
                                            const bool       rankHasPpTask,
                                            const bool       rankHasPmeTask)
{

    std::vector<GpuTask> gpuTasksOnThisRank;
    if (rankHasPpTask)
    {
        if (useGpuForNonbonded)
        {

            // Note that any bonded tasks on a GPU always accompany a
            // non-bonded task.
            if (haveGpusOnThisPhysicalNode)
            {
                gpuTasksOnThisRank.push_back(GpuTask::Nonbonded);
            }
            else if (nonbondedTarget == TaskTarget::Gpu)
            {
                gmx_fatal(FARGS,
                          "Cannot run short-ranged nonbonded interactions on a GPU because no GPU "
                          "is detected.");
            }
            else if (bondedTarget == TaskTarget::Gpu)
            {
                gmx_fatal(FARGS,
                          "Cannot run bonded interactions on a GPU because no GPU is detected.");
            }
            else if (updateTarget == TaskTarget::Gpu)
            {
                gmx_fatal(FARGS,
                          "Cannot run coordinate update on a GPU because no GPU is detected.");
            }
        }
    }
    if (rankHasPmeTask)
    {
        if (useGpuForPme)
        {
            if (haveGpusOnThisPhysicalNode)
            {
                gpuTasksOnThisRank.push_back(GpuTask::Pme);
            }
            else if (pmeTarget == TaskTarget::Gpu)
            {
                gmx_fatal(FARGS, "Cannot run PME on a GPU because no GPU is detected.");
            }
        }
    }
    return gpuTasksOnThisRank;
}

namespace
{

//! Constant used to help minimize preprocessing of code.
constexpr bool g_usingMpi = GMX_MPI;

//! Helper function to prepare to all-gather the vector of non-bonded tasks on this node.
std::vector<int> allgather(const int& input, int numRanks, MPI_Comm communicator)
{

    std::vector<int> result(numRanks);
    if (g_usingMpi && numRanks > 1)
    {
        // TODO This works as an MPI_Allgather, but thread-MPI does
        // not implement that. It's only intra-node communication, and
        // happens rarely, so not worth optimizing (yet). Also
        // thread-MPI segfaults with 1 rank.
#if GMX_MPI
        int root = 0;
        // Calling a C API with the const T * from data() doesn't seem
        // to compile warning-free with all versions of MPI headers.
        //
        // TODO Make an allgather template to deal with this nonsense.
        MPI_Gather(const_cast<int*>(&input), 1, MPI_INT, const_cast<int*>(result.data()), 1, MPI_INT, root, communicator);
        MPI_Bcast(const_cast<int*>(result.data()), result.size(), MPI_INT, root, communicator);
#else
        GMX_UNUSED_VALUE(communicator);
#endif
    }
    else
    {
        result[0] = input;
    }
    return result;
}

//! Helper function to compute allgatherv displacements.
std::vector<int> computeDisplacements(ArrayRef<const int> extentOnEachRank, int numRanks)
{
    std::vector<int> displacements(numRanks + 1);
    displacements[0] = 0;
    std::partial_sum(
            std::begin(extentOnEachRank), std::end(extentOnEachRank), std::begin(displacements) + 1);
    return displacements;
}

//! Helper function to all-gather the vector of all GPU tasks on ranks of this node.
std::vector<GpuTask> allgatherv(ArrayRef<const GpuTask> input,
                                ArrayRef<const int>     extentOnEachRank,
                                ArrayRef<const int>     displacementForEachRank,
                                MPI_Comm                communicator)
{
    // Now allocate the vector and do the allgatherv
    int totalExtent = displacementForEachRank.back();

    std::vector<GpuTask> result;
    result.reserve(totalExtent);
    if (g_usingMpi && extentOnEachRank.size() > 1 && totalExtent > 0)
    {
        result.resize(totalExtent);
        // TODO This works as an MPI_Allgatherv, but thread-MPI does
        // not implement that. It's only intra-node communication, and
        // happens rarely, so not worth optimizing (yet). Also
        // thread-MPI segfaults with 1 rank and with zero totalExtent.
#if GMX_MPI
        int root = 0;
        MPI_Gatherv(reinterpret_cast<std::underlying_type_t<GpuTask>*>(const_cast<GpuTask*>(input.data())),
                    input.size(),
                    MPI_INT,
                    reinterpret_cast<std::underlying_type_t<GpuTask>*>(result.data()),
                    const_cast<int*>(extentOnEachRank.data()),
                    const_cast<int*>(displacementForEachRank.data()),
                    MPI_INT,
                    root,
                    communicator);
        MPI_Bcast(reinterpret_cast<std::underlying_type_t<GpuTask>*>(result.data()),
                  result.size(),
                  MPI_INT,
                  root,
                  communicator);
#else
        GMX_UNUSED_VALUE(communicator);
#endif
    }
    else
    {
        for (const auto& gpuTask : input)
        {
            result.push_back(gpuTask);
        }
    }
    return result;
}

} // namespace

/*! \brief Returns container of all tasks on all ranks of this node
 * that are eligible for GPU execution.
 *
 * Perform all necessary communication for preparing for task
 * assignment. Separating this aspect makes it possible to unit test
 * the logic of task assignment. */
GpuTasksOnRanks findAllGpuTasksOnThisNode(ArrayRef<const GpuTask>         gpuTasksOnThisRank,
                                          const PhysicalNodeCommunicator& physicalNodeComm)
{
    int      numRanksOnThisNode = physicalNodeComm.size_;
    MPI_Comm communicator       = physicalNodeComm.comm_;
    // Find out how many GPU tasks are on each rank on this node.
    auto numGpuTasksOnEachRankOfThisNode =
            allgather(gpuTasksOnThisRank.size(), numRanksOnThisNode, communicator);

    /* Collect on each rank of this node a vector describing all
     * GPU tasks on this node, in ascending order of rank. This
     * requires a vector allgather. The displacements indicate where
     * the GPU tasks on each rank of this node start and end within
     * the vector. */
    auto displacementsForEachRank =
            computeDisplacements(numGpuTasksOnEachRankOfThisNode, numRanksOnThisNode);
    auto gpuTasksOnThisNode = allgatherv(
            gpuTasksOnThisRank, numGpuTasksOnEachRankOfThisNode, displacementsForEachRank, communicator);

    /* Next, we re-use the displacements to break up the vector
     * of GPU tasks into something that can be indexed like
     * gpuTasks[rankIndex][taskIndex]. */
    GpuTasksOnRanks gpuTasksOnRanksOfThisNode;
    // TODO This would be nicer if we had a good abstraction for "pair
    // of iterators that point to adjacent container elements" or
    // "iterator that points to the first of a pair of valid adjacent
    // container elements, or end".
    GMX_ASSERT(displacementsForEachRank.size() > 1,
               "Even with one rank, there's always both a start and end displacement");
    auto currentDisplacementIt = displacementsForEachRank.begin();
    auto nextDisplacementIt    = currentDisplacementIt + 1;
    do
    {
        gpuTasksOnRanksOfThisNode.emplace_back();
        for (auto taskOnThisRankIndex = *currentDisplacementIt; taskOnThisRankIndex != *nextDisplacementIt;
             ++taskOnThisRankIndex)
        {
            gpuTasksOnRanksOfThisNode.back().push_back(gpuTasksOnThisNode[taskOnThisRankIndex]);
        }

        currentDisplacementIt = nextDisplacementIt;
        ++nextDisplacementIt;
    } while (nextDisplacementIt != displacementsForEachRank.end());

    return gpuTasksOnRanksOfThisNode;
}

} // namespace gmx

rocminfo output:
l0vepe0ple@l0vepe0ple-Bravo-15-B7ED:~/labor/ChA/wthtEditing/SLC22A12/9B1L$ rocminfo

ROCk module version 6.10.5 is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.14
Runtime Ext Version:     1.6
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 5 7535HS with Radeon Graphics
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 5 7535HS with Radeon Graphics
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   4603                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            12                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Memory Properties:       
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    32039884(0x1e8e3cc) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    32039884(0x1e8e3cc) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    32039884(0x1e8e3cc) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 4                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    32039884(0x1e8e3cc) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1035                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon Graphics                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      1024(0x400) KB                     
    L3:                      16384(0x4000) KB                   
  Chip ID:                 29759(0x743f)                      
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          128(0x80)                          
  Max Clock Freq. (MHz):   2770                               
  BDFID:                   768                                
  Internal Node ID:        1                                  
  Compute Unit:            16                                 
  SIMDs per CU:            2                                  
  Shader Engines:          1                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 120                                
  SDMA engine uCode::      34                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    4177920(0x3fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    4177920(0x3fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1035         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*******                  
Agent 3                  
*******                  
  Name:                    gfx1035                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon Graphics                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    2                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      2048(0x800) KB                     
  Chip ID:                 5761(0x1681)                       
  ASIC Revision:           2(0x2)                             
  Cacheline Size:          128(0x80)                          
  Max Clock Freq. (MHz):   1899                               
  BDFID:                   2048                               
  Internal Node ID:        2                                  
  Compute Unit:            6                                  
  SIMDs per CU:            2                                  
  Shader Engines:          1                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       APU
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 118                                
  SDMA engine uCode::      47                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16019940(0xf471e4) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    16019940(0xf471e4) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1035         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***

I've resolved the problem.

The main point is that when you configure GROMACS (by running cmake …), the ACPP_TARGETS option must be set (in my case -DACPP_TARGETS='hip:gfx1034,gfx1035'). In the failing build above, the "SYCL targets:" line of gmx --version is empty.
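To pick the values for your own card, one option is to read the ISA names that ROCm reports. This is a minimal sketch, assuming the rocminfo layout shown above (each GPU agent has a "Name: gfxNNNN" line); acpp_targets_from_rocminfo is a hypothetical helper, not part of ROCm, AdaptiveCpp or GROMACS. Note that in this post rocminfo reports gfx1035 for both GPUs, while the working build also listed gfx1034 (the Navi 24 ISA), so compare the result against your hardware before relying on it.

```shell
# Sketch: read `rocminfo` text on stdin and print an AdaptiveCpp target
# string such as "hip:gfx1034,gfx1035". Assumes GPU agents appear as
# "Name: gfxNNNN" lines; acpp_targets_from_rocminfo is hypothetical.
acpp_targets_from_rocminfo() {
    local isas
    # Keep only agent "Name: gfx..." lines (this skips "Marketing Name:"
    # and the "amdgcn-amd-amdhsa--gfx..." ISA lines), pull out the ISA
    # token, de-duplicate it, and join the results with commas.
    isas=$(grep -E '^[[:space:]]*Name:[[:space:]]+gfx[0-9a-f]+' \
        | grep -oE 'gfx[0-9a-f]+' \
        | sort -u \
        | paste -sd, -)
    if [ -z "$isas" ]; then
        echo "no gfx ISA found in input" >&2
        return 1
    fi
    printf 'hip:%s\n' "$isas"
}
```

Usage, combined with the SYCL options from the command below: cmake .. -DGMX_GPU=SYCL -DGMX_SYCL=ACPP -DACPP_TARGETS="$(rocminfo | acpp_targets_from_rocminfo)"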

l0vepe0ple@l0vepe0ple-Bravo-15-B7ED:~/gromacs-2025.0/build$ cmake .. -DGMX_BUILD_OWN_FFTW=ON \
        -DREGRESSIONTEST_DOWNLOAD=ON \
        -DGMX_GPU=SYCL \
        -DGMX_SYCL=ACPP \
        -DGMX_GPU_FFT_LIBRARY=VKFFT \
        -DGMX_SIMD=AVX2_256 \
        -DCMAKE_PREFIX_PATH=/usr/local/acpp24_10 \
        -DCMAKE_C_COMPILER=/usr/bin/clang \
        -DCMAKE_CXX_COMPILER=/usr/bin/clang++ \
        -DCMAKE_INSTALL_PREFIX=/usr/local/gromacs \
        -DACPP_TARGETS='hip:gfx1034,gfx1035'
-- The C compiler identification is Clang 18.1.3
-- The CXX compiler identification is Clang 18.1.3
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test HAVE_CRAY_MACRO
-- Performing Test HAVE_CRAY_MACRO - Failed
-- Performing Test CXX17_COMPILES_SIMPLY
-- Performing Test CXX17_COMPILES_SIMPLY - Success
-- Found Python3: /usr/bin/python3 (found suitable version "3.12.3", minimum required is "3.9") found components: Interpreter Development Development.Module Development.Embed 
-- Selected GPU FFT library - VkFFT
-- Found OpenMP_C: -fopenmp=libomp (found version "5.1") 
-- Found OpenMP_CXX: -fopenmp=libomp (found version "5.1") 
-- Found OpenMP: TRUE (found version "5.1")  
-- Performing Test CFLAGS_WARN_NO_MISSING_FIELD_INITIALIZERS
-- Performing Test CFLAGS_WARN_NO_MISSING_FIELD_INITIALIZERS - Success
-- Performing Test CXXFLAGS_WARN_NO_RESERVED_IDENTIFIER
-- Performing Test CXXFLAGS_WARN_NO_RESERVED_IDENTIFIER - Success
-- Performing Test CXXFLAGS_WARN_NO_MISSING_FIELD_INITIALIZERS
-- Performing Test CXXFLAGS_WARN_NO_MISSING_FIELD_INITIALIZERS - Success
-- Looking for include file unistd.h
-- Looking for include file unistd.h - found
-- Looking for include file pwd.h
-- Looking for include file pwd.h - found
-- Looking for include file dirent.h
-- Looking for include file dirent.h - found
-- Looking for include file time.h
-- Looking for include file time.h - found
-- Looking for include file sys/time.h
-- Looking for include file sys/time.h - found
-- Looking for include file io.h
-- Looking for include file io.h - not found
-- Looking for include file sched.h
-- Looking for include file sched.h - found
-- Looking for include file xmmintrin.h
-- Looking for include file xmmintrin.h - found
-- Looking for gettimeofday
-- Looking for gettimeofday - found
-- Looking for sysconf
-- Looking for sysconf - found
-- Looking for nice
-- Looking for nice - found
-- Looking for fsync
-- Looking for fsync - found
-- Looking for _fileno
-- Looking for _fileno - not found
-- Looking for fileno
-- Looking for fileno - found
-- Looking for _commit
-- Looking for _commit - not found
-- Looking for sigaction
-- Looking for sigaction - found
-- Performing Test HAVE_BUILTIN_CLZ
-- Performing Test HAVE_BUILTIN_CLZ - Success
-- Performing Test HAVE_BUILTIN_CLZLL
-- Performing Test HAVE_BUILTIN_CLZLL - Success
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - found
-- Looking for feenableexcept in m
-- Looking for feenableexcept in m - found
-- Looking for fedisableexcept in m
-- Looking for fedisableexcept in m - found
-- Checking for sched.h GNU affinity API
-- Performing Test sched_affinity_compile
-- Performing Test sched_affinity_compile - Success
-- Looking for include file mm_malloc.h
-- Looking for include file mm_malloc.h - found
-- Looking for include file malloc.h
-- Looking for include file malloc.h - found
-- Checking for _mm_malloc()
-- Checking for _mm_malloc() - supported
-- Looking for posix_memalign
-- Looking for posix_memalign - found
-- Looking for memalign
-- Looking for memalign - not found
-- Torch not found. Neural network potential support will be disabled.
-- Using default binary suffix: ""
-- Using default library suffix: ""
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test TEST_ATOMICS
-- Performing Test TEST_ATOMICS - Success
-- Atomic operations found
-- Performing Test PTHREAD_SETAFFINITY
-- Performing Test PTHREAD_SETAFFINITY - Success
-- Performing Test C_mavx2_mfma_FLAG_ACCEPTED
-- Performing Test C_mavx2_mfma_FLAG_ACCEPTED - Success
-- Performing Test C_mavx2_mfma_COMPILE_WORKS
-- Performing Test C_mavx2_mfma_COMPILE_WORKS - Success
-- Performing Test CXX_mavx2_mfma_FLAG_ACCEPTED
-- Performing Test CXX_mavx2_mfma_FLAG_ACCEPTED - Success
-- Performing Test CXX_mavx2_mfma_COMPILE_WORKS
-- Performing Test CXX_mavx2_mfma_COMPILE_WORKS - Success
-- Enabling 256-bit AVX2 SIMD instructions using CXX flags:  -mavx2 -mfma
-- Detecting flags to enable runtime detection of AVX-512 units on newer CPUs
-- Performing Test C_march_skylake_avx512_FLAG_ACCEPTED
-- Performing Test C_march_skylake_avx512_FLAG_ACCEPTED - Success
-- Performing Test C_march_skylake_avx512_COMPILE_WORKS
-- Performing Test C_march_skylake_avx512_COMPILE_WORKS - Success
-- Performing Test CXX_march_skylake_avx512_FLAG_ACCEPTED
-- Performing Test CXX_march_skylake_avx512_FLAG_ACCEPTED - Success
-- Performing Test CXX_march_skylake_avx512_COMPILE_WORKS
-- Performing Test CXX_march_skylake_avx512_COMPILE_WORKS - Success
-- Detecting flags to enable runtime detection of AVX-512 units on newer CPUs -  -march=skylake-avx512
-- Performing Test _Wno_unused_command_line_argument_FLAG_ACCEPTED
-- Performing Test _Wno_unused_command_line_argument_FLAG_ACCEPTED - Success
-- Performing Test _callconv___vectorcall
-- Performing Test _callconv___vectorcall - Success
-- Performing Test HAS_GPU_INLINE_THRESHOLD
-- Performing Test HAS_GPU_INLINE_THRESHOLD - Success
-- Performing Test CXXFLAGS_NO_DEPRECATED_DECLARATIONS
-- Performing Test CXXFLAGS_NO_DEPRECATED_DECLARATIONS - Success
-- Checking for valid AdaptiveCpp/hipSYCL compiler
-- Checking for valid AdaptiveCpp/hipSYCL compiler - Success
-- AdaptiveCpp has CUDA target enabled: OFF
-- AdaptiveCpp has HIP target enabled: ON
-- AdaptiveCpp has HIP_WAVE32 target enabled: ON
-- AdaptiveCpp has HIP_WAVE64 target enabled: OFF
-- AdaptiveCpp has SPIRV target enabled: OFF
-- AdaptiveCpp has GENERIC target enabled: OFF
-- Performing Test HAS_WARNING_NO_UNUSED_PARAMETER
-- Performing Test HAS_WARNING_NO_UNUSED_PARAMETER - Success
-- Performing Test HAS_WARNING_NO_UNUSED_VARIABLE
-- Performing Test HAS_WARNING_NO_UNUSED_VARIABLE - Success
-- Performing Test HAS_WARNING_NO_NEWLINE_EOF
-- Performing Test HAS_WARNING_NO_NEWLINE_EOF - Success
-- Performing Test HAS_WARNING_NO_OLD_STYLE_CAST
-- Performing Test HAS_WARNING_NO_OLD_STYLE_CAST - Success
-- Performing Test HAS_WARNING_NO_ZERO_AS_NULL_POINTER_CONSTANT
-- Performing Test HAS_WARNING_NO_ZERO_AS_NULL_POINTER_CONSTANT - Success
-- Performing Test HAS_WARNING_NO_UNUSED_BUT_SET_VARIABLE
-- Performing Test HAS_WARNING_NO_UNUSED_BUT_SET_VARIABLE - Success
-- Performing Test HAS_WARNING_NO_SIGN_COMPARE
-- Performing Test HAS_WARNING_NO_SIGN_COMPARE - Success
-- Performing Test HAS_WARNING_NO_UNUSED_RESULT
-- Performing Test HAS_WARNING_NO_UNUSED_RESULT - Success
-- Checking for GCC x86 inline asm
-- Checking for GCC x86 inline asm - supported
-- Detected build CPU vendor - AMD
-- Detected build CPU brand - AMD Ryzen 5 7535HS with Radeon Graphics
-- Detected build CPU family - 25
-- Detected build CPU model - 68
-- Detected build CPU stepping - 1
-- Detected build CPU features - aes amd apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt lahf misalignsse mmx msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdrnd rdtscp sha sse2 sse3 sse4a sse4.1 sse4.2 ssse3 x2apic
-- Checking for 64-bit off_t
-- Checking for 64-bit off_t - present
-- Checking for fseeko/ftello
-- Checking for fseeko/ftello - present
-- Checking for SIGUSR1
-- Checking for SIGUSR1 - found
-- Checking for pipe support
-- Checking for system XDR support
-- Checking for system XDR support - not present
-- The GROMACS-managed build of FFTW 3 will configure with the following optimizations: --enable-sse2;--enable-avx;--enable-avx2
-- Using external FFT library - FFTW3 build managed by GROMACS
-- Looking for sgemm_
-- Looking for sgemm_ - not found
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- Found BLAS: /usr/lib/x86_64-linux-gnu/libblas.so  
-- Looking for cheev_
-- Looking for cheev_ - not found
-- Looking for cheev_
-- Looking for cheev_ - found
-- Found LAPACK: /usr/lib/x86_64-linux-gnu/liblapack.so;/usr/lib/x86_64-linux-gnu/libblas.so  
-- No image conversion possible without ImageMagick
-- Performing Test HAS_WARNING_EVERYTHING
-- Performing Test HAS_WARNING_EVERYTHING - Success
-- Performing Test HAS_WARNING_NO_CPLUSPLUS98_COMPAT
-- Performing Test HAS_WARNING_NO_CPLUSPLUS98_COMPAT - Success
-- Performing Test HAS_WARNING_NO_CPLUSPLUS98_COMPAT_PEDANTIC
-- Performing Test HAS_WARNING_NO_CPLUSPLUS98_COMPAT_PEDANTIC - Success
-- Performing Test HAS_WARNING_NO_RETURN_STD_MOVE_IN_CPLUSPLUS11
-- Performing Test HAS_WARNING_NO_RETURN_STD_MOVE_IN_CPLUSPLUS11 - Failed
-- Performing Test HAS_WARNING_NO_SOURCE_USED_OPENMP
-- Performing Test HAS_WARNING_NO_SOURCE_USED_OPENMP - Success
-- Performing Test HAS_WARNING_NO_CPLUSPLUS17_EXTENSIONS
-- Performing Test HAS_WARNING_NO_CPLUSPLUS17_EXTENSIONS - Success
-- Performing Test HAS_WARNING_NO_DOCUMENTATION_UNKNOWN_COMMAND
-- Performing Test HAS_WARNING_NO_DOCUMENTATION_UNKNOWN_COMMAND - Success
-- Performing Test HAS_WARNING_NO_COVERED_SWITCH_DEFAULT
-- Performing Test HAS_WARNING_NO_COVERED_SWITCH_DEFAULT - Success
-- Performing Test HAS_WARNING_NO_SWITCH_ENUM
-- Performing Test HAS_WARNING_NO_SWITCH_ENUM - Success
-- Performing Test HAS_WARNING_NO_SWITCH_DEFAULT
-- Performing Test HAS_WARNING_NO_SWITCH_DEFAULT - Success
-- Performing Test HAS_WARNING_NO_EXTRA_SEMI_STMT
-- Performing Test HAS_WARNING_NO_EXTRA_SEMI_STMT - Success
-- Performing Test HAS_WARNING_NO_WEAK_VTABLES
-- Performing Test HAS_WARNING_NO_WEAK_VTABLES - Success
-- Performing Test HAS_WARNING_NO_SHADOW
-- Performing Test HAS_WARNING_NO_SHADOW - Success
-- Performing Test HAS_WARNING_NO_PADDED
-- Performing Test HAS_WARNING_NO_PADDED - Success
-- Performing Test HAS_WARNING_NO_RESERVED_ID_MACRO
-- Performing Test HAS_WARNING_NO_RESERVED_ID_MACRO - Success
-- Performing Test HAS_WARNING_NO_DOUBLE_PROMOTION
-- Performing Test HAS_WARNING_NO_DOUBLE_PROMOTION - Success
-- Performing Test HAS_WARNING_NO_EXIT_TIME_DESTRUCTORS
-- Performing Test HAS_WARNING_NO_EXIT_TIME_DESTRUCTORS - Success
-- Performing Test HAS_WARNING_NO_GLOBAL_CONSTRUCTORS
-- Performing Test HAS_WARNING_NO_GLOBAL_CONSTRUCTORS - Success
-- Performing Test HAS_WARNING_NO_DOCUMENTATION
-- Performing Test HAS_WARNING_NO_DOCUMENTATION - Success
-- Performing Test HAS_WARNING_NO_FORMAT_NONLITERAL
-- Performing Test HAS_WARNING_NO_FORMAT_NONLITERAL - Success
-- Performing Test HAS_WARNING_NO_USED_BUT_MARKED_UNUSED
-- Performing Test HAS_WARNING_NO_USED_BUT_MARKED_UNUSED - Success
-- Performing Test HAS_WARNING_NO_FLOAT_EQUAL
-- Performing Test HAS_WARNING_NO_FLOAT_EQUAL - Success
-- Performing Test HAS_WARNING_NO_CUDA_COMPAT
-- Performing Test HAS_WARNING_NO_CUDA_COMPAT - Success
-- Performing Test HAS_WARNING_CONDITIONAL_UNINITIALIZED
-- Performing Test HAS_WARNING_CONDITIONAL_UNINITIALIZED - Success
-- Performing Test HAS_WARNING_NO_CONVERSION
-- Performing Test HAS_WARNING_NO_CONVERSION - Success
-- Performing Test HAS_WARNING_NO_DISABLED_MACRO_EXPANSION
-- Performing Test HAS_WARNING_NO_DISABLED_MACRO_EXPANSION - Success
-- Performing Test HAS_WARNING_NO_UNUSED_MACROS
-- Performing Test HAS_WARNING_NO_UNUSED_MACROS - Success
-- Performing Test HAS_WARNING_NO_UNSAFE_BUFFER_USAGE
-- Performing Test HAS_WARNING_NO_UNSAFE_BUFFER_USAGE - Success
-- Performing Test HAS_WARNING_NO_GNU_ZERO_VARIADIC_MACRO_ARGUMENTS
-- Performing Test HAS_WARNING_NO_GNU_ZERO_VARIADIC_MACRO_ARGUMENTS - Success
-- Performing Test HAS_WARNING_NO_UNUSED_MEMBER_FUNCTION
-- Performing Test HAS_WARNING_NO_UNUSED_MEMBER_FUNCTION - Success
-- Found Python: /usr/bin/python3 (found version "3.12.3") found components: Interpreter 
-- Performing Test HAVE_NO_DEPRECATED_COPY
-- Performing Test HAVE_NO_DEPRECATED_COPY - Success
-- Performing Test HAVE_NO_IMPLICIT_INT_FLOAT_CONVERSION
-- Performing Test HAVE_NO_IMPLICIT_INT_FLOAT_CONVERSION - Success
-- Performing Test HAS_NO_UNUSED_PARAMETER
-- Performing Test HAS_NO_UNUSED_PARAMETER - Success
-- Performing Test HAS_NO_STRINGOP_TRUNCATION
-- Performing Test HAS_NO_STRINGOP_TRUNCATION - Failed
-- Performing Test HAS_WARNING_NO_CAST_FUNCTION_TYPE_STRICT
-- Performing Test HAS_WARNING_NO_CAST_FUNCTION_TYPE_STRICT - Success
-- Performing Test HAS_NO_UNUSED
-- Performing Test HAS_NO_UNUSED - Success
-- Performing Test HAS_NO_MISSING_DECLARATIONS
-- Performing Test HAS_NO_MISSING_DECLARATIONS - Success
-- Performing Test HAS_NO_NULL_CONVERSIONS
-- Performing Test HAS_NO_NULL_CONVERSIONS - Success
-- Looking for inttypes.h
-- Looking for inttypes.h - found
-- Performing Test HAS_WARNING_NO_DEPRECATED_NON_PROTOTYPE
-- Performing Test HAS_WARNING_NO_DEPRECATED_NON_PROTOTYPE - Success
-- Looking for dlopen
-- Looking for dlopen - found
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY - Success
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY - Success
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Success
Downloading: https://ftp.gromacs.org/regressiontests/regressiontests-2025.0.tar.gz
-- [download 100% complete]
-- Could NOT find Sphinx (missing: SPHINX_EXECUTABLE) (Required is at least version "4.0.0")
-- Could NOT find LATEX (missing: LATEX_COMPILER) 
-- Configuring done (384.3s)
-- Generating done (0.4s)
-- Build files have been written to: /home/l0vepe0ple/gromacs-2025.0/build
l0vepe0ple@l0vepe0ple-Bravo-15-B7ED:~/gromacs-2025.0/build$

You can find the device ID (for -gpu_id in mdrun) and the architecture (for ACPP_TARGETS) using acpp-info or rocminfo (if you're using an NVIDIA GPU, use nvidia-smi). As a result, gmx --version now shows the SYCL targets.

l0vepe0ple@l0vepe0ple-Bravo-15-B7ED:~$ gmx --version
                         :-) GROMACS - gmx, 2025.0 (-:

Executable:   /usr/local/gromacs/bin/gmx
Data prefix:  /usr/local/gromacs
Working dir:  /home/l0vepe0ple
Command line:
  gmx --version

GROMACS version:     2025.0
Precision:           mixed
Memory model:        64 bit
MPI library:         thread_mpi
OpenMP support:      enabled (GMX_OPENMP_MAX_THREADS = 128)
GPU support:         SYCL (AdaptiveCpp)
NBNxM GPU setup:     super-cluster 2x2x2 / cluster 8 (cluster-pair splitting on)
SIMD instructions:   AVX2_256
CPU FFT library:     fftw-3.3.10-sse2-avx-avx2-avx2_128
GPU FFT library:     VkFFT internal (1.3.1) with HIP backend
Multi-GPU FFT:       none
RDTSCP usage:        enabled
TNG support:         enabled
Hwloc support:       disabled
Tracing support:     disabled
C compiler:          /usr/bin/clang Clang 18.1.3
C compiler flags:    -mavx2 -mfma -Wno-missing-field-initializers -O3 -DNDEBUG
C++ compiler:        /usr/bin/clang++ Clang 18.1.3
C++ compiler flags:  -mavx2 -mfma -Wno-reserved-identifier -Wno-missing-field-initializers -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-source-uses-openmp -Wno-c++17-extensions -Wno-documentation-unknown-command -Wno-covered-switch-default -Wno-switch-enum -Wno-switch-default -Wno-extra-semi-stmt -Wno-weak-vtables -Wno-shadow -Wno-padded -Wno-reserved-id-macro -Wno-double-promotion -Wno-exit-time-destructors -Wno-global-constructors -Wno-documentation -Wno-format-nonliteral -Wno-used-but-marked-unused -Wno-float-equal -Wno-cuda-compat -Wno-conditional-uninitialized -Wno-conversion -Wno-disabled-macro-expansion -Wno-unused-macros -Wno-unsafe-buffer-usage -Wno-unused-parameter -Wno-unused-variable -Wno-newline-eof -Wno-old-style-cast -Wno-zero-as-null-pointer-constant -Wno-unused-but-set-variable -Wno-sign-compare -Wno-unused-result -Wno-cast-function-type-strict SHELL:-fopenmp=libomp -O3 -DNDEBUG
BLAS library:        External - detected on the system
LAPACK library:      External - detected on the system
SYCL version:        AdaptiveCpp 24.10.0+git.11669686.20250321.branch.develop.dirty
SYCL compiler:       /usr/local/acpp24_10/lib/cmake/AdaptiveCpp/syclcc-launcher
SYCL compiler flags: -Wno-unknown-cuda-version -Wno-unknown-attributes  --acpp-targets="hip:gfx1034,gfx1035" --acpp-clang=/usr/bin/clang++
SYCL GPU flags:      -ffast-math -DHIPSYCL_ALLOW_INSTANT_SUBMISSION=1 -DACPP_ALLOW_INSTANT_SUBMISSION=1 -fgpu-inline-threshold=99999 -Wno-deprecated-declarations
SYCL targets:        hip:gfx1034,gfx1035

I hope this info will help someone in the future.
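To check from a script whether an installed build carries explicit GPU targets, the "SYCL targets:" line that gmx --version prints (as in the output above) can be extracted. A minimal sketch; sycl_targets is a hypothetical helper name, and treating a missing line as "no explicit targets" is an assumption based on this thread, not documented GROMACS behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "gmx --version" output on stdin and print the
# value of the "SYCL targets:" line (prints nothing if the line is absent).
sycl_targets() {
  awk '/^SYCL targets:/ { sub(/^SYCL targets:[[:space:]]*/, ""); print }'
}

# Only inspect the installed binary if gmx is on the PATH (e.g. after sourcing GMXRC).
if command -v gmx >/dev/null 2>&1; then
  targets="$(gmx --version 2>/dev/null | sycl_targets)"
  if [ -n "$targets" ]; then
    echo "GROMACS was built for SYCL targets: $targets"
  else
    echo "No 'SYCL targets:' line found; the build may lack explicit GPU targets" >&2
  fi
fi
```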


Hi!

Yes, that’s the right way to do things.

If you don’t mind, could you explain how you installed and configured AdaptiveCpp and GROMACS in the setup that triggered the error?

We have checks that should prevent you from installing GROMACS without specifying the architecture, so I wonder how they failed in your case :)

Hi!
To be honest, I haven't done anything special. Here is my pipeline:

  1. Updating the C++ compiler
sudo apt update
sudo apt upgrade
sudo apt install build-essential
  1. Installing/updating CMake
sudo snap install cmake
cmake --version (must be at least 3.28)
  1. Installing ROCm drivers (following the official instructions: "ROCm installation overview", ROCm installation (Linux))
sudo apt update
wget https://repo.radeon.com/amdgpu-install/6.3.3/ubuntu/noble/amdgpu-install_6.3.60303-1_all.deb
sudo apt install ./amdgpu-install_6.3.60303-1_all.deb
sudo apt update
sudo amdgpu-install --list-usecase
amdgpu-install --usecase=dkms,graphics,rocm,rocmdev,rocmdevtools,lrt,opencl,openclsdk,hip,hiplibsdk,openmpsdk,mllib,mlsdk,asan
  1. Installing AdaptiveCpp
git clone https://github.com/AdaptiveCpp/AdaptiveCpp
cd AdaptiveCpp
mkdir build && cd build

But here I had to make a change in CMakeLists.txt (in the AdaptiveCpp folder), because without it AdaptiveCpp always chose the generic compilation flow:

Find the file CMakeLists.txt and replace the line set(DEFAULT_TARGETS "" CACHE STRING "Default targets to compile for") with set(DEFAULT_TARGETS "hip" CACHE STRING "Default targets to compile for")

and then an ordinary build:

cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local/FOLDER_NAME \
-DWITH_ROCM_BACKEND=ON \
-DCMAKE_C_COMPILER=/usr/lib/llvm-18/bin/clang \
-DCMAKE_CXX_COMPILER=/usr/lib/llvm-18/bin/clang++ \
-DLLVM_DIR=/usr/lib/llvm-18/cmake

make -j $(nproc)
sudo make install
  1. Installing libraries for the fast Fourier transform
sudo apt update
sudo apt upgrade
sudo apt install fftw-dev
sudo apt install libvkfft-dev
  1. Downloading GROMACS from the official link:

https://ftp.gromacs.org/gromacs/gromacs-2025.0.tar.gz

  1. Installing GROMACS
cd ~/gromacs-2025.0
mkdir build && cd build

cmake .. -DGMX_BUILD_OWN_FFTW=ON \
-DREGRESSIONTEST_DOWNLOAD=ON \
-DGMX_GPU=SYCL \
-DGMX_SYCL=ACPP \
-DGMX_GPU_FFT_LIBRARY=VKFFT \
-DGMX_SIMD=AVX2_256 \
-DCMAKE_PREFIX_PATH=/usr/local/acpp24_10 \
-DCMAKE_C_COMPILER=/usr/bin/clang \
-DCMAKE_CXX_COMPILER=/usr/bin/clang++ \
-DCMAKE_INSTALL_PREFIX=/usr/local/gromacs \
-DACPP_TARGETS='hip:gfx1034,gfx1035'   # without this line, GROMACS didn't detect any GPUs

make
make check
sudo make install
source /usr/local/gromacs/bin/GMXRC
gmx --version
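The manual CMakeLists.txt edit in the "Installing AdaptiveCpp" step can be scripted with sed. A minimal sketch, assuming the line is spelled exactly as quoted in that step; patch_acpp_default_targets is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: in AdaptiveCpp's top-level CMakeLists.txt, change the
# default target list from "" to "hip", as described in the AdaptiveCpp step.
# Assumes the line is spelled exactly as quoted in that step.
patch_acpp_default_targets() {
  local file="${1:-CMakeLists.txt}"
  # Edit in place, keeping a backup copy as "<file>.bak".
  sed -i.bak \
    's/set(DEFAULT_TARGETS "" CACHE STRING/set(DEFAULT_TARGETS "hip" CACHE STRING/' \
    "$file"
  # Return success only if the patched line is now present.
  grep -q 'set(DEFAULT_TARGETS "hip" CACHE STRING' "$file"
}

# Usage, from the AdaptiveCpp source folder before running cmake:
#   patch_acpp_default_targets CMakeLists.txt
```

Since DEFAULT_TARGETS is declared as a CMake cache variable, and set(... CACHE ...) does not overwrite an entry that already exists in the cache, passing -DDEFAULT_TARGETS=hip to AdaptiveCpp's cmake command should have the same effect without editing the file; that alternative is untested here.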

Hello,
Just wanted to mention that, if you are willing to experiment, you can also try building the native HIP version to see whether you can get it running with that one.
I’m happy to help with any questions regarding this one!
Cheers
Paul

I am currently having the same problem.

CPU i9-10900
GPU Radeon RX 7800 XT 16 GB
OS Ubuntu 24.04
Installed ROCm and AdaptiveCpp, then GROMACS

My GROMACS installation pipeline is failing:

wget ftp://ftp.gromacs.org/gromacs/gromacs-2025.2.tar.gz
tar xfz gromacs-2025.2.tar.gz
cd gromacs-2025.2
mkdir build
cd build

cmake .. \
-DGMX_BUILD_OWN_FFTW=ON \
-DREGRESSIONTEST_DOWNLOAD=ON \
-DCMAKE_C_COMPILER=/opt/rocm-6.3.4/lib/llvm/bin/clang \
-DCMAKE_CXX_COMPILER=/opt/rocm-6.3.4/lib/llvm/bin/clang++ \
-DGMX_GPU=SYCL \
-DGMX_SYCL=ACPP \
-DHIPSYCL_TARGETS='hip:gfx1101,gfx1101' \
-DGMX_GPU_FFT_LIBRARY=VkFFT \
-DGMX_ENABLE_AMD_RDNA_SUPPORT=ON \
-DGMX_GPU_NB_CLUSTER_SIZE=4 \
-DGMX_SIMD=AVX2_256 \
-DGMX_HWLOC=ON \
-DCMAKE_BUILD_TYPE=Release \
-DGMX_OPENMP=ON

make -j$(nproc)
make -j$(nproc) check
sudo make -j$(nproc) install
source /usr/local/gromacs/bin/GMXRC
source ~/.bashrc

Then, when I use the GPU, I get the same error:

Command line:
  gmx mdrun -deffnm md_protein_50 -nb gpu -pme gpu -bonded gpu --v -nsteps 0

Back Off! I just backed up md_protein_50.log to ./#md_protein_50.log.17#
Reading file md_protein_50.tpr, VERSION 2025.2 (single precision)
Overriding nsteps with value passed on the command line: 0 steps, 0 ps
Changing nstlist from 20 to 100, rlist from 1.224 to 1.349

-------------------------------------------------------
Program:     gmx mdrun, version 2025.2
Source file: src/gromacs/taskassignment/findallgputasks.cpp (line 88)

Fatal error:
Cannot run short-ranged nonbonded interactions on a GPU because no GPU is
detected.

Output of rocminfo:
ROCk module version 6.10.5 is loaded

HSA System Attributes

Runtime Version: 1.14
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES

==========
HSA Agents


Agent 1


Name: 12th Gen Intel(R) Core™ i9-12900F
Uuid: CPU-XX
Marketing Name: 12th Gen Intel(R) Core™ i9-12900F
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 49152(0xc000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 5000
BDFID: 0
Internal Node ID: 0
Compute Unit: 24
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 32686012(0x1f2bfbc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 32686012(0x1f2bfbc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32686012(0x1f2bfbc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 4
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32686012(0x1f2bfbc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:


Agent 2


Name: gfx1101
Uuid: GPU-fc34603b92a2054c
Marketing Name: AMD Radeon RX 7800 XT
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 4096(0x1000) KB
L3: 65536(0x10000) KB
Chip ID: 29822(0x747e)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2169
BDFID: 768
Internal Node ID: 1
Compute Unit: 60
SIMDs per CU: 2
Shader Engines: 3
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 412
SDMA engine uCode:: 25
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1101
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***

Output of acpp-info:
=================Backend information===================
Loaded backend 0: OpenMP
Found device: AdaptiveCpp OpenMP host device
Loaded backend 1: HIP
Found device: AMD Radeon RX 7800 XT

=================Device information===================
***************** Devices for backend OpenMP *****************
Device 0:
General device information:
Name: AdaptiveCpp OpenMP host device
Backend: OpenMP
Platform: Backend 4 / Platform 0
Vendor: the AdaptiveCpp project
Arch:
Driver version: 1.2
Is CPU: 1
Is GPU: 0
Default executor information:
Is in-order queue: 0
Is out-of-order queue: 1
Is task graph: 0
Device support queries:
images: 0
error_correction: 0
host_unified_memory: 1
little_endian: 1
global_mem_cache: 1
global_mem_cache_read_only: 0
global_mem_cache_read_write: 1
emulated_local_memory: 1
sub_group_independent_forward_progress: 0
usm_device_allocations: 1
usm_host_allocations: 1
usm_atomic_host_allocations: 1
usm_shared_allocations: 1
usm_atomic_shared_allocations: 1
usm_system_allocations: 1
execution_timestamps: 1
sscp_kernels: 0
Device properties:
max_compute_units: 24
max_global_size0: 18446744073709551615
max_global_size1: 18446744073709551615
max_global_size2: 18446744073709551615
max_group_size: 1024
max_num_sub_groups: 18446744073709551615
preferred_vector_width_char: 4
preferred_vector_width_double: 1
preferred_vector_width_float: 1
preferred_vector_width_half: 2
preferred_vector_width_int: 1
preferred_vector_width_long: 1
preferred_vector_width_short: 2
native_vector_width_char: 4
native_vector_width_double: 1
native_vector_width_float: 1
native_vector_width_half: 2
native_vector_width_int: 1
native_vector_width_long: 1
native_vector_width_short: 2
max_clock_speed: 0
max_malloc_size: 18446744073709551615
address_bits: 64
max_read_image_args: 0
max_write_image_args: 0
image2d_max_width: 0
image2d_max_height: 0
image3d_max_width: 0
image3d_max_height: 0
image3d_max_depth: 0
image_max_buffer_size: 0
image_max_array_size: 0
max_samplers: 0
max_parameter_size: 18446744073709551615
mem_base_addr_align: 8
global_mem_cache_line_size: 64
global_mem_cache_size: 1
global_mem_size: 33470476288
max_constant_buffer_size: 18446744073709551615
max_constant_args: 18446744073709551615
local_mem_size: 18446744073709551615
printf_buffer_size: 18446744073709551615
partition_max_sub_devices: 0
vendor_id: 18446744073709551615
sub_group_sizes: 1

***************** Devices for backend HIP *****************
Device 0:
General device information:
Name: AMD Radeon RX 7800 XT
Backend: HIP
Platform: Backend 1 / Platform 0
Vendor: AMD
Arch: gfx1101
Driver version: 60342134
Is CPU: 0
Is GPU: 1
Default executor information:
Is in-order queue: 0
Is out-of-order queue: 1
Is task graph: 0
Device support queries:
images: 0
error_correction: 0
host_unified_memory: 0
little_endian: 1
global_mem_cache: 1
global_mem_cache_read_only: 0
global_mem_cache_read_write: 1
emulated_local_memory: 0
sub_group_independent_forward_progress: 1
usm_device_allocations: 1
usm_host_allocations: 1
usm_atomic_host_allocations: 0
usm_shared_allocations: 1
usm_atomic_shared_allocations: 0
usm_system_allocations: 0
execution_timestamps: 1
sscp_kernels: 0
Device properties:
max_compute_units: 30
max_global_size0: 2199023254528
max_global_size1: 67108864
max_global_size2: 67108864
max_group_size: 1024
max_num_sub_groups: 32
preferred_vector_width_char: 4
preferred_vector_width_double: 1
preferred_vector_width_float: 1
preferred_vector_width_half: 2
preferred_vector_width_int: 1
preferred_vector_width_long: 1
preferred_vector_width_short: 2
native_vector_width_char: 4
native_vector_width_double: 1
native_vector_width_float: 1
native_vector_width_half: 2
native_vector_width_int: 1
native_vector_width_long: 1
native_vector_width_short: 2
max_clock_speed: 2169
max_malloc_size: 17163091968
address_bits: 64
max_read_image_args: 0
max_write_image_args: 0
image2d_max_width: 0
image2d_max_height: 0
image3d_max_width: 0
image3d_max_height: 0
image3d_max_depth: 0
image_max_buffer_size: 0
image_max_array_size: 0
max_samplers: 0
max_parameter_size: 18446744073709551615
mem_base_addr_align: 8
global_mem_cache_line_size: 128
global_mem_cache_size: 4194304
global_mem_size: 17163091968
max_constant_buffer_size: 2147483647
max_constant_args: 18446744073709551615
local_mem_size: 65536
printf_buffer_size: 18446744073709551615
partition_max_sub_devices: 0
vendor_id: 1022
sub_group_sizes: 32
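To pull just the HIP devices and their architectures out of a long acpp-info dump like the one above, a short awk filter is enough. This is a sketch that assumes the plain-text layout shown here (a "Devices for backend HIP" banner followed by "Device N:", "Name:" and "Arch:" lines); list_hip_devices is a hypothetical name. The index it prints is AdaptiveCpp's numbering within the HIP backend, which is not guaranteed to match the GPU IDs GROMACS uses for -gpu_id, so cross-check against the GPUs that mdrun reports as detected in its log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read acpp-info output on stdin and print one line per
# HIP device as "<index><TAB><name><TAB><arch>".
list_hip_devices() {
  awk '
    # Track which backend section we are in.
    /Devices for backend/ { in_hip = ($0 ~ /backend HIP/) }
    # "Device 0:" -> index "0" (skips "Device support queries:" etc.).
    in_hip && $1 == "Device" && $2 ~ /^[0-9]+:$/ { idx = $2; sub(/:$/, "", idx) }
    # Keep the full device name, which may contain spaces.
    in_hip && $1 == "Name:" { name = $0; sub(/^[[:space:]]*Name:[[:space:]]*/, "", name) }
    # The Arch line completes the record for this device.
    in_hip && $1 == "Arch:" { print idx "\t" name "\t" $2 }
  '
}

if command -v acpp-info >/dev/null 2>&1; then
  acpp-info | list_hip_devices
fi
```

For the output above this would print the index 0, "AMD Radeon RX 7800 XT" and gfx1101, separated by tabs.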

Output of rocm-smi:

======================================== ROCm System Management Interface ========================================
================================================== Concise Info ==================================================
Device  Node  IDs              Temp    Power  Partitions          SCLK     MCLK   Fan  Perf  PwrCap  VRAM%  GPU%
              (DID,     GUID)  (Edge)  (Avg)  (Mem, Compute, ID)
0       1     0x747e,   43170  32.0°C  6.0W   N/A, N/A, 0         1627Mhz  96Mhz  0%   auto  220.0W  7%     32%

============================================== End of ROCm SMI Log ===============================================

Output of gmx --version:

                     :-) GROMACS - gmx, 2025.2 (-:

Executable: /usr/local/gromacs/bin/gmx
Data prefix: /usr/local/gromacs
Working dir: /home/laqmedsom/Documentos/Daniel/colaboracao_joao_jean_leishmania_nitroreductase/MD
Command line:
gmx --version

GROMACS version: 2025.2
Precision: mixed
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 128)
GPU support: SYCL (AdaptiveCpp)
NBNxM GPU setup: super-cluster 2x2x2 / cluster 4 (cluster-pair splitting on)
SIMD instructions: AVX2_256
CPU FFT library: fftw-3.3.10-sse2-avx-avx2-avx2_128
GPU FFT library: VkFFT internal (1.3.1) with HIP backend
Multi-GPU FFT: none
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: hwloc-2.8.0
Tracing support: disabled
C compiler: /opt/rocm-6.3.4/lib/llvm/bin/clang Clang 18.0.0
C compiler flags: -mavx2 -mfma -Wno-missing-field-initializers -O3 -DNDEBUG
C++ compiler: /opt/rocm-6.3.4/lib/llvm/bin/clang++ Clang 18.0.0
C++ compiler flags: -mavx2 -mfma -Wno-reserved-identifier -Wno-missing-field-initializers -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-source-uses-openmp -Wno-c++17-extensions -Wno-documentation-unknown-command -Wno-covered-switch-default -Wno-switch-enum -Wno-switch-default -Wno-extra-semi-stmt -Wno-weak-vtables -Wno-shadow -Wno-padded -Wno-reserved-id-macro -Wno-double-promotion -Wno-exit-time-destructors -Wno-global-constructors -Wno-documentation -Wno-format-nonliteral -Wno-used-but-marked-unused -Wno-float-equal -Wno-cuda-compat -Wno-conditional-uninitialized -Wno-conversion -Wno-disabled-macro-expansion -Wno-unused-macros -Wno-unsafe-buffer-usage -Wno-unused-parameter -Wno-unused-variable -Wno-newline-eof -Wno-old-style-cast -Wno-zero-as-null-pointer-constant -Wno-unused-but-set-variable -Wno-sign-compare -Wno-unused-result -Wno-cast-function-type-strict SHELL:-fopenmp=libomp -O3 -DNDEBUG
BLAS library: External - detected on the system
LAPACK library: External - detected on the system
SYCL version: AdaptiveCpp 25.02.0+git.db23e2b0.20250524.branch.develop
SYCL compiler: /usr/local/lib/cmake/AdaptiveCpp/syclcc-launcher
SYCL compiler flags: -Wno-unknown-cuda-version -Wno-unknown-attributes --acpp-targets="hip:gfx1101,gfx1101" --acpp-clang=/opt/rocm-6.3.4/lib/llvm/bin/clang++
SYCL GPU flags: -ffast-math -DHIPSYCL_ALLOW_INSTANT_SUBMISSION=1 -DACPP_ALLOW_INSTANT_SUBMISSION=1 -fgpu-inline-threshold=99999 -Wno-deprecated-declarations
SYCL targets: hip:gfx1101,gfx1101

I don't know what else I can do to get it to run on the GPU.

Hi!

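-DGMX_GPU_NB_CLUSTER_SIZE=8 is the correct setting for your GPU, but your current build uses 4 (see "cluster 4" on the NBNxM GPU setup line of your gmx --version output). Only Arc Alchemist and older Intel GPUs need 4. Please rebuild GROMACS with -DGMX_GPU_NB_CLUSTER_SIZE=8 and try again.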
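A minimal sketch for checking which cluster size an installed build uses, before and after rebuilding. It assumes the gmx --version layout shown in this thread (the number after "cluster" on the "NBNxM GPU setup:" line). The function names are hypothetical helpers, not part of GROMACS, and the build directory in the comment is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of GROMACS) for checking the NBNxM GPU
# cluster size of an installed build. Source this file, then pipe
# gmx --version output into check_cluster_size.

# Print the number after "cluster " on the "NBNxM GPU setup:" line read from
# stdin, e.g. "4" for "super-cluster 2x2x2 / cluster 4 (cluster-pair splitting on)".
# Prints nothing if that line is missing.
nbnxm_cluster_size() {
  sed -n 's/.*NBNxM GPU setup:.*cluster \([0-9][0-9]*\).*/\1/p'
}

# Return 0 if the build uses cluster size 8 (the value recommended above for
# this AMD GPU); otherwise print the CMake option to rebuild with and return 1.
check_cluster_size() {
  local size
  size="$(nbnxm_cluster_size)"
  if [ "$size" = "8" ]; then
    echo "NBNxM cluster size is 8"
    return 0
  fi
  echo "NBNxM cluster size is '${size:-unknown}'; reconfigure with -DGMX_GPU_NB_CLUSTER_SIZE=8 and rebuild"
  return 1
}

# Re-running cmake in an existing build directory (placeholder path) keeps the
# other cached CMake options, such as the SYCL targets:
#   cd /path/to/gromacs-2025.2/build
#   cmake .. -DGMX_GPU_NB_CLUSTER_SIZE=8
#   make -j"$(nproc)" && sudo make install
```

Usage: gmx --version 2>&1 | check_cluster_size. Stderr is merged only as a precaution, since it is not certain which stream each part of the version report goes to.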

A quick observation: from @l0vepe0ple's post, it looks like both the integrated GPU and the discrete GPU (graphics card) are being used. @l0vepe0ple, can you do a short mdrun and confirm that your dGPU is preferred over the iGPU? I had this problem while installing drivers for the latest 9070 XT: I had to go into the BIOS and disable the iGPU so that Linux would use the dGPU and install the required drivers for it.
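One way to do the short test run suggested above, sketched under assumptions. The input is the md_0_10_Ref.tpr from the first post. Output goes to a separate, hypothetical gpu_test.* set of files so the production run is not overwritten. GPU ID 0 is a placeholder for whichever ID the discrete GPU gets, and 5000 steps is an arbitrary short length. The log check simply lists lines that mention a GPU, rather than assuming a particular log layout.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a short GPU test run; source this file first.

# Run a short mdrun pinned to one GPU ID. -nsteps overrides the step count
# stored in the .tpr, and -deffnm gives the outputs a separate name
# (gpu_test.log, gpu_test.edr, ...) while -s still reads the original input.
short_gpu_test() {
  local tpr="$1" gpu_id="${2:-0}" nsteps="${3:-5000}"
  gmx mdrun -v -s "$tpr" -deffnm gpu_test -nb gpu -gpu_id "$gpu_id" -nsteps "$nsteps"
}

# Print, with line numbers, up to max_lines lines of a log that mention a GPU
# (case-insensitive). These should include the hardware-detection and
# GPU-assignment lines, but that is an assumption about the log contents.
gpu_log_summary() {
  local log="$1" max_lines="${2:-40}"
  grep -n -i 'gpu' "$log" | head -n "$max_lines"
}
```

For example, run short_gpu_test md_0_10_Ref.tpr 0 and then gpu_log_summary gpu_test.log. After that, repeat with the other GPU ID and compare the ns/day reported at the end of each log. The faster run should be the one on the discrete card.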