GROMACS version: 2025.1
GROMACS modification: No
Hi, everyone!
I've got a strange problem: my GROMACS build can't detect my AMD GPU.
l0vepe0ple@l0vepe0ple-Bravo-15-B7ED:~/labor/ChA/wthtEditing/SLC22A12/9B1L$ gmx mdrun -v -deffnm md_0_10_Ref -nb gpu
:-) GROMACS - gmx mdrun, 2025.1 (-:
Executable: /usr/local/gromacs/bin/gmx
Data prefix: /usr/local/gromacs
Working dir: /home/l0vepe0ple/labor/ChA/wthtEditing/SLC22A12/9B1L
Command line:
gmx mdrun -v -deffnm md_0_10_Ref -nb gpu
Back Off! I just backed up md_0_10_Ref.log to ./#md_0_10_Ref.log.19#
Reading file md_0_10_Ref.tpr, VERSION 2025.1 (single precision)
Changing nstlist from 20 to 100, rlist from 1.321 to 1.493
-------------------------------------------------------
Program: gmx mdrun, version 2025.1
Source file: src/gromacs/taskassignment/findallgputasks.cpp (line 88)
Fatal error:
Cannot run short-ranged nonbonded interactions on a GPU because no GPU is
detected.
For more information and tips for troubleshooting, please check the GROMACS
website at https://manual.gromacs.org/current/user-guide/run-time-errors.html
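A diagnostic step I have not tried yet: the AdaptiveCpp runtime can be made more verbose through its debug-level environment variable, which should show which backends and devices it probes. Treat this as a sketch only; the variable name is my assumption from the AdaptiveCpp documentation, and older releases used the HIPSYCL_ prefix instead of ACPP_.
# Sketch: rerun with verbose AdaptiveCpp runtime output to see why no device is found
# (ACPP_DEBUG_LEVEL is assumed; older AdaptiveCpp versions used HIPSYCL_DEBUG_LEVEL)
ACPP_DEBUG_LEVEL=3 gmx mdrun -v -deffnm md_0_10_Ref -nb gpu 2> acpp_debug.log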
Full output of gmx --version:
l0vepe0ple@l0vepe0ple-Bravo-15-B7ED:~/labor/ChA/wthtEditing/SLC22A12/9B1L$ gmx --version
:-) GROMACS - gmx, 2025.1 (-:
Executable: /usr/local/gromacs/bin/gmx
Data prefix: /usr/local/gromacs
Working dir: /home/l0vepe0ple/labor/ChA/wthtEditing/SLC22A12/9B1L
Command line:
gmx --version
GROMACS version: 2025.1
Precision: mixed
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 128)
GPU support: SYCL (AdaptiveCpp)
NBNxM GPU setup: super-cluster 2x2x2 / cluster 8 (cluster-pair splitting off)
SIMD instructions: AVX2_256
CPU FFT library: fftw-3.3.10-sse2-avx-avx2-avx2_128
GPU FFT library: VkFFT internal (1.3.1) with HIP backend
Multi-GPU FFT: none
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /usr/bin/clang Clang 18.1.3
C compiler flags: -mavx2 -mfma -Wno-missing-field-initializers -O3 -DNDEBUG
C++ compiler: /usr/bin/clang++ Clang 18.1.3
C++ compiler flags: -mavx2 -mfma -Wno-reserved-identifier -Wno-missing-field-initializers -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-source-uses-openmp -Wno-c++17-extensions -Wno-documentation-unknown-command -Wno-covered-switch-default -Wno-switch-enum -Wno-switch-default -Wno-extra-semi-stmt -Wno-weak-vtables -Wno-shadow -Wno-padded -Wno-reserved-id-macro -Wno-double-promotion -Wno-exit-time-destructors -Wno-global-constructors -Wno-documentation -Wno-format-nonliteral -Wno-used-but-marked-unused -Wno-float-equal -Wno-cuda-compat -Wno-conditional-uninitialized -Wno-conversion -Wno-disabled-macro-expansion -Wno-unused-macros -Wno-unsafe-buffer-usage -Wno-unused-parameter -Wno-unused-variable -Wno-newline-eof -Wno-old-style-cast -Wno-zero-as-null-pointer-constant -Wno-unused-but-set-variable -Wno-sign-compare -Wno-unused-result -Wno-cast-function-type-strict SHELL:-fopenmp=libomp -O3 -DNDEBUG
BLAS library: External - detected on the system
LAPACK library: External - detected on the system
SYCL version: AdaptiveCpp 24.10.0+git.29fe4c1f.20250319.branch.develop.dirty
SYCL compiler: /usr/local/adaptivecpp24_10/lib/cmake/AdaptiveCpp/syclcc-launcher
SYCL compiler flags: -Wno-unknown-cuda-version -Wno-unknown-attributes --acpp-clang=/usr/bin/clang++
SYCL GPU flags: -ffast-math -DHIPSYCL_ALLOW_INSTANT_SUBMISSION=1 -DACPP_ALLOW_INSTANT_SUBMISSION=1 -fgpu-inline-threshold=99999 -Wno-deprecated-declarations
SYCL targets:
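Note that the "SYCL targets:" field above is empty, so I suspect the build was configured without an explicit AdaptiveCpp device target. If a rebuild is the fix, I assume the configure step would look roughly like the following; the -DGMX_SYCL=ACPP and HIPSYCL_TARGETS options and the 'hip:gfx1035' target string are taken from the GROMACS/AdaptiveCpp install notes and may need adjusting for this machine.
# Sketch only: reconfigure GROMACS with an explicit AdaptiveCpp target for the RDNA2 GPUs
# (paths below are assumptions based on the install prefixes shown above)
cmake .. -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
  -DGMX_GPU=SYCL -DGMX_SYCL=ACPP \
  -DCMAKE_PREFIX_PATH=/usr/local/adaptivecpp24_10 \
  -DHIPSYCL_TARGETS='hip:gfx1035'
make -j$(nproc) && sudo make install
lspci output: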
l0vepe0ple@l0vepe0ple-Bravo-15-B7ED:~$ lspci -k | grep -EA3 'VGA|3D|Display'
03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 24 [Radeon RX 6400/6500 XT/6500M] (rev c8)
Subsystem: Micro-Star International Co., Ltd. [MSI] Navi 24 [Radeon RX 6400/6500 XT/6500M]
Kernel driver in use: amdgpu
Kernel modules: amdgpu
08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt [Radeon 680M] (rev 0b)
Subsystem: Micro-Star International Co., Ltd. [MSI] Rembrandt [Radeon 680M]
Kernel driver in use: amdgpu
Kernel modules: amdgpu
The source file mentioned in the error, src/gromacs/taskassignment/findallgputasks.cpp:
/*
* This file is part of the GROMACS molecular simulation package.
*
* Copyright 2017- The GROMACS Authors
* and the project initiators Erik Lindahl, Berk Hess and David van der Spoel.
* Consult the AUTHORS/COPYING files and https://www.gromacs.org for details.
*
* GROMACS is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public License
* as published by the Free Software Foundation; either version 2.1
* of the License, or (at your option) any later version.
*
* GROMACS is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with GROMACS; if not, see
* https://www.gnu.org/licenses, or write to the Free Software Foundation,
* Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*
* If you want to redistribute modifications to GROMACS, please
* consider that scientific software is very special. Version
* control is crucial - bugs must be traceable. We will be happy to
* consider code for inclusion in the official distribution, but
* derived work must not be called official GROMACS. Details are found
* in the README & COPYING files - if they are missing, get the
* official version at https://www.gromacs.org.
*
* To help us fund GROMACS development, we humbly ask that you cite
* the research papers on the package. Check out https://www.gromacs.org.
*/
/*! \internal \file
* \brief
* Defines routine for collecting all GPU tasks found on ranks of a node.
*
* \author Mark Abraham <mark.j.abraham@gmail.com>
* \ingroup module_taskassignment
*/
#include "gmxpre.h"
#include "findallgputasks.h"
#include "config.h"
#include <filesystem>
#include <iterator>
#include <numeric>
#include <type_traits>
#include <vector>
#include "gromacs/taskassignment/decidegpuusage.h"
#include "gromacs/taskassignment/taskassignment.h"
#include "gromacs/utility/arrayref.h"
#include "gromacs/utility/exceptions.h"
#include "gromacs/utility/fatalerror.h"
#include "gromacs/utility/gmxassert.h"
#include "gromacs/utility/gmxmpi.h"
#include "gromacs/utility/physicalnodecommunicator.h"
namespace gmx
{
std::vector<GpuTask> findGpuTasksOnThisRank(const bool haveGpusOnThisPhysicalNode,
const TaskTarget nonbondedTarget,
const TaskTarget pmeTarget,
const TaskTarget bondedTarget,
const TaskTarget updateTarget,
const bool useGpuForNonbonded,
const bool useGpuForPme,
const bool rankHasPpTask,
const bool rankHasPmeTask)
{
std::vector<GpuTask> gpuTasksOnThisRank;
if (rankHasPpTask)
{
if (useGpuForNonbonded)
{
// Note that any bonded tasks on a GPU always accompany a
// non-bonded task.
if (haveGpusOnThisPhysicalNode)
{
gpuTasksOnThisRank.push_back(GpuTask::Nonbonded);
}
else if (nonbondedTarget == TaskTarget::Gpu)
{
gmx_fatal(FARGS,
"Cannot run short-ranged nonbonded interactions on a GPU because no GPU "
"is detected.");
}
else if (bondedTarget == TaskTarget::Gpu)
{
gmx_fatal(FARGS,
"Cannot run bonded interactions on a GPU because no GPU is detected.");
}
else if (updateTarget == TaskTarget::Gpu)
{
gmx_fatal(FARGS,
"Cannot run coordinate update on a GPU because no GPU is detected.");
}
}
}
if (rankHasPmeTask)
{
if (useGpuForPme)
{
if (haveGpusOnThisPhysicalNode)
{
gpuTasksOnThisRank.push_back(GpuTask::Pme);
}
else if (pmeTarget == TaskTarget::Gpu)
{
gmx_fatal(FARGS, "Cannot run PME on a GPU because no GPU is detected.");
}
}
}
return gpuTasksOnThisRank;
}
namespace
{
//! Constant used to help minimize preprocessing of code.
constexpr bool g_usingMpi = GMX_MPI;
//! Helper function to prepare to all-gather the vector of non-bonded tasks on this node.
std::vector<int> allgather(const int& input, int numRanks, MPI_Comm communicator)
{
std::vector<int> result(numRanks);
if (g_usingMpi && numRanks > 1)
{
// TODO This works as an MPI_Allgather, but thread-MPI does
// not implement that. It's only intra-node communication, and
// happens rarely, so not worth optimizing (yet). Also
// thread-MPI segfaults with 1 rank.
#if GMX_MPI
int root = 0;
// Calling a C API with the const T * from data() doesn't seem
// to compile warning-free with all versions of MPI headers.
//
// TODO Make an allgather template to deal with this nonsense.
MPI_Gather(const_cast<int*>(&input), 1, MPI_INT, const_cast<int*>(result.data()), 1, MPI_INT, root, communicator);
MPI_Bcast(const_cast<int*>(result.data()), result.size(), MPI_INT, root, communicator);
#else
GMX_UNUSED_VALUE(communicator);
#endif
}
else
{
result[0] = input;
}
return result;
}
//! Helper function to compute allgatherv displacements.
std::vector<int> computeDisplacements(ArrayRef<const int> extentOnEachRank, int numRanks)
{
std::vector<int> displacements(numRanks + 1);
displacements[0] = 0;
std::partial_sum(
std::begin(extentOnEachRank), std::end(extentOnEachRank), std::begin(displacements) + 1);
return displacements;
}
//! Helper function to all-gather the vector of all GPU tasks on ranks of this node.
std::vector<GpuTask> allgatherv(ArrayRef<const GpuTask> input,
ArrayRef<const int> extentOnEachRank,
ArrayRef<const int> displacementForEachRank,
MPI_Comm communicator)
{
// Now allocate the vector and do the allgatherv
int totalExtent = displacementForEachRank.back();
std::vector<GpuTask> result;
result.reserve(totalExtent);
if (g_usingMpi && extentOnEachRank.size() > 1 && totalExtent > 0)
{
result.resize(totalExtent);
// TODO This works as an MPI_Allgatherv, but thread-MPI does
// not implement that. It's only intra-node communication, and
// happens rarely, so not worth optimizing (yet). Also
// thread-MPI segfaults with 1 rank and with zero totalExtent.
#if GMX_MPI
int root = 0;
MPI_Gatherv(reinterpret_cast<std::underlying_type_t<GpuTask>*>(const_cast<GpuTask*>(input.data())),
input.size(),
MPI_INT,
reinterpret_cast<std::underlying_type_t<GpuTask>*>(result.data()),
const_cast<int*>(extentOnEachRank.data()),
const_cast<int*>(displacementForEachRank.data()),
MPI_INT,
root,
communicator);
MPI_Bcast(reinterpret_cast<std::underlying_type_t<GpuTask>*>(result.data()),
result.size(),
MPI_INT,
root,
communicator);
#else
GMX_UNUSED_VALUE(communicator);
#endif
}
else
{
for (const auto& gpuTask : input)
{
result.push_back(gpuTask);
}
}
return result;
}
} // namespace
/*! \brief Returns container of all tasks on all ranks of this node
* that are eligible for GPU execution.
*
* Perform all necessary communication for preparing for task
* assignment. Separating this aspect makes it possible to unit test
* the logic of task assignment. */
GpuTasksOnRanks findAllGpuTasksOnThisNode(ArrayRef<const GpuTask> gpuTasksOnThisRank,
const PhysicalNodeCommunicator& physicalNodeComm)
{
int numRanksOnThisNode = physicalNodeComm.size_;
MPI_Comm communicator = physicalNodeComm.comm_;
// Find out how many GPU tasks are on each rank on this node.
auto numGpuTasksOnEachRankOfThisNode =
allgather(gpuTasksOnThisRank.size(), numRanksOnThisNode, communicator);
/* Collect on each rank of this node a vector describing all
* GPU tasks on this node, in ascending order of rank. This
* requires a vector allgather. The displacements indicate where
* the GPU tasks on each rank of this node start and end within
* the vector. */
auto displacementsForEachRank =
computeDisplacements(numGpuTasksOnEachRankOfThisNode, numRanksOnThisNode);
auto gpuTasksOnThisNode = allgatherv(
gpuTasksOnThisRank, numGpuTasksOnEachRankOfThisNode, displacementsForEachRank, communicator);
/* Next, we re-use the displacements to break up the vector
* of GPU tasks into something that can be indexed like
* gpuTasks[rankIndex][taskIndex]. */
GpuTasksOnRanks gpuTasksOnRanksOfThisNode;
// TODO This would be nicer if we had a good abstraction for "pair
// of iterators that point to adjacent container elements" or
// "iterator that points to the first of a pair of valid adjacent
// container elements, or end".
GMX_ASSERT(displacementsForEachRank.size() > 1,
"Even with one rank, there's always both a start and end displacement");
auto currentDisplacementIt = displacementsForEachRank.begin();
auto nextDisplacementIt = currentDisplacementIt + 1;
do
{
gpuTasksOnRanksOfThisNode.emplace_back();
for (auto taskOnThisRankIndex = *currentDisplacementIt; taskOnThisRankIndex != *nextDisplacementIt;
++taskOnThisRankIndex)
{
gpuTasksOnRanksOfThisNode.back().push_back(gpuTasksOnThisNode[taskOnThisRankIndex]);
}
currentDisplacementIt = nextDisplacementIt;
++nextDisplacementIt;
} while (nextDisplacementIt != displacementsForEachRank.end());
return gpuTasksOnRanksOfThisNode;
}
} // namespace gmx
rocminfo output:
l0vepe0ple@l0vepe0ple-Bravo-15-B7ED:~/labor/ChA/wthtEditing/SLC22A12/9B1L$ rocminfo
ROCk module version 6.10.5 is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.14
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 5 7535HS with Radeon Graphics
Uuid: CPU-XX
Marketing Name: AMD Ryzen 5 7535HS with Radeon Graphics
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 4603
BDFID: 0
Internal Node ID: 0
Compute Unit: 12
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 32039884(0x1e8e3cc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 32039884(0x1e8e3cc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32039884(0x1e8e3cc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 4
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32039884(0x1e8e3cc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1035
Uuid: GPU-XX
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 1024(0x400) KB
L3: 16384(0x4000) KB
Chip ID: 29759(0x743f)
ASIC Revision: 0(0x0)
Cacheline Size: 128(0x80)
Max Clock Freq. (MHz): 2770
BDFID: 768
Internal Node ID: 1
Compute Unit: 16
SIMDs per CU: 2
Shader Engines: 1
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 120
SDMA engine uCode:: 34
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 4177920(0x3fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 4177920(0x3fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1035
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*******
Agent 3
*******
Name: gfx1035
Uuid: GPU-XX
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 2
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 2048(0x800) KB
Chip ID: 5761(0x1681)
ASIC Revision: 2(0x2)
Cacheline Size: 128(0x80)
Max Clock Freq. (MHz): 1899
BDFID: 2048
Internal Node ID: 2
Compute Unit: 6
SIMDs per CU: 2
Shader Engines: 1
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties: APU
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 118
SDMA engine uCode:: 47
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16019940(0xf471e4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 16019940(0xf471e4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1035
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
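Finally, in case it is useful, this is the check I plan to run to see whether AdaptiveCpp itself enumerates the two gfx1035 agents that rocminfo reports. I am assuming the acpp-info tool lives under the same /usr/local/adaptivecpp24_10 prefix shown in the gmx --version output.
# Sketch: list the backends and devices visible to the AdaptiveCpp runtime
/usr/local/adaptivecpp24_10/bin/acpp-info
Any hints on what else to check would be much appreciated.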