GROMACS gets stuck on AMD GPU

GROMACS version: 2023.1
GROMACS modification: No
GPU: Radeon RX 7900 XT
Ubuntu 22
CPU: Core i7

Hello,

I compiled GROMACS with OpenSYCL 0.9.4 and ROCm 5.5.1 (from the AMD repositories). Compilation finished without errors, but when I run GROMACS it just gets stuck. There are no errors in the output files, yet top shows activity and so does rocm-smi.

When I run GROMACS with this variable:

HIPSYCL_DEBUG_LEVEL=4

it shows this:

[hipSYCL Info] dag_manager [async]: Submitting node to scheduler!
[hipSYCL Info] inorder_executor: Processing node 0x7a8060 with 1 non-virtual requirement(s) and 1 direct requirement(s).
[hipSYCL Info]  --> (Skipping same-lane synchronization with node: 0x7e2210)
[hipSYCL Info] inorder_executor: Dispatching to lane 0x1222e10: kernel: ZN11DeviceEvent4markERK12DeviceStreamEUlRN7hipsycl4sycl14interop_handleEE_
[hipSYCL Info] dag_manager [async]: DAG flush complete.

hipsycl-info does not detect the GPU, although some forum posts say it should work:

[hipSYCL Warning] backend_loader: Could not load backend plugin: /usr/local/lib/hipSYCL/librt-backend-hip.so
[hipSYCL Warning] /usr/local/lib/hipSYCL/librt-backend-hip.so: undefined symbol: hipEventQuery
=================Backend information===================
Loaded backend 0: OpenMP
  Found device: hipSYCL OpenMP host device

=================Device information===================
***************** Devices for backend OpenMP *****************
Device 0:
 General device information:
  Name: hipSYCL OpenMP host device
  Backend: OpenMP
  Vendor: the hipSYCL project
  Arch: <native-cpu>
  Driver version: 1.2
  Is CPU: 1
  Is GPU: 0
 Default executor information:
  Is in-order queue: 1
  Is out-of-order queue: 0
  Is task graph: 0
 Device support queries:
  images: 0
  error_correction: 0
  host_unified_memory: 1
  little_endian: 1
  global_mem_cache: 1
  global_mem_cache_read_only: 0
  global_mem_cache_read_write: 1
  emulated_local_memory: 1
  sub_group_independent_forward_progress: 0
  usm_device_allocations: 1
  usm_host_allocations: 1
  usm_atomic_host_allocations: 1
  usm_shared_allocations: 1
  usm_atomic_shared_allocations: 1
  usm_system_allocations: 1
  execution_timestamps: 1
 Device properties:
  max_compute_units: 8
  max_global_size0: 18446744073709551615
  max_global_size1: 18446744073709551615
  max_global_size2: 18446744073709551615
  max_group_size: 1024
  max_num_sub_groups: 18446744073709551615
  preferred_vector_width_char: 4
  preferred_vector_width_double: 1
  preferred_vector_width_float: 1
  preferred_vector_width_half: 2
  preferred_vector_width_int: 1
  preferred_vector_width_long: 1
  preferred_vector_width_short: 2
  native_vector_width_char: 4
  native_vector_width_double: 1
  native_vector_width_float: 1
  native_vector_width_half: 2
  native_vector_width_int: 1
  native_vector_width_long: 1
  native_vector_width_short: 2
  max_clock_speed: 0
  max_malloc_size: 18446744073709551615
  address_bits: 64
  max_read_image_args: 0
  max_write_image_args: 0
  image2d_max_width: 0
  image2d_max_height: 0
  image3d_max_width: 0
  image3d_max_height: 0
  image3d_max_depth: 0
  image_max_buffer_size: 0
  image_max_array_size: 0
  max_samplers: 0
  max_parameter_size: 18446744073709551615
  mem_base_addr_align: 8
  global_mem_cache_line_size: 64
  global_mem_cache_size: 1
  global_mem_size: 18446744073709551615
  max_constant_buffer_size: 18446744073709551615
  max_constant_args: 18446744073709551615
  local_mem_size: 18446744073709551615
  printf_buffer_size: 18446744073709551615
  partition_max_sub_devices: 0
  vendor_id: 18446744073709551615
  sub_group_sizes: 1 

I compiled OpenSYCL with these parameters:

cmake .. -DCMAKE_C_COMPILER=/opt/rocm-5.5.1/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm-5.5.1/llvm/bin/clang++ -DLLVM_DIR=/opt/rocm-5.5.1/llvm/lib/cmake/llvm -DROCM_PATH=/opt/rocm-5.5.1

And GROMACS:

AMDGPU_TARGETS=gfx1100 cmake .. -DGMX_GPU=SYCL -DGMX_SYCL_HIPSYCL=ON  -DHIPSYCL_TARGETS='hip:gfx1100' -DCMAKE_C_COMPILER=/opt/rocm-5.5.1/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm-5.5.1/llvm/bin/clang++ -DLLVM_DIR=/opt/rocm-5.5.1/llvm/lib/cmake/llvm -DROCM_PATH=/opt/rocm-5.5.1

I also tried adding some flags, but most of the time that ends in a core dump:

-DWITH_ROCM_BACKEND=ON
-DHIPSYCL_NO_DEVICE_MANGLER=ON
-DWITH_ACCELERATED_CPU=off
-DHIPSYCL_SYCLCC_EXTRA_ARGS='-ffast-math'
-D_GLIBCXX_USE_CXX11_ABI=0

I also tested with ROCm 4.x and GROMACS 2022, with the same result.

Any ideas?

Thank you

Hi!

  1. A word of warning: ROCm 5.5.1 does not officially support any RDNA3 devices. AMD is really not good at supporting compute workloads on their consumer GPUs. It may still work, but you might be the first person trying GROMACS on this hardware :)

  2. Since hipSYCL/OpenSYCL does not detect the GPU, that’s where we should start. If you run the following command, will it work?

LD_LIBRARY_PATH="/opt/rocm-5.5.1/lib:$LD_LIBRARY_PATH" hipsycl-info

If it still does not work, looking at rocm-smi output might be enlightening.

  3. That said, GROMACS apparently can use the GPU (it automatically does something similar to what LD_LIBRARY_PATH does in the suggestion above); otherwise it would outright complain about the lack of devices. Running with HIPSYCL_DEBUG_LEVEL=4 was the right step, but could you please attach the full output (both the terminal output and the md.log file)? Could you also try running with HIPSYCL_RT_MAX_CACHED_NODES=0 HIPSYCL_DEBUG_LEVEL=4?

  4. Note: There is nothing wrong with how you built GROMACS, but the following shorter version should be enough:

cmake .. -DGMX_GPU=SYCL -DGMX_SYCL_HIPSYCL=ON -DHIPSYCL_TARGETS='hip:gfx1100' -DCMAKE_C_COMPILER=/opt/rocm-5.5.1/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm-5.5.1/llvm/bin/clang++
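
To capture everything for the debug run requested above, something like the following might help. It is only a sketch: the run_debug name and the log file names are made up for illustration, and it assumes a POSIX-like shell with tee available.

```shell
# run_debug LOGFILE CMD [ARGS...]
# Runs CMD with the two hipSYCL variables from this thread set, and copies
# everything it prints to the terminal (stdout and stderr) into LOGFILE.
run_debug() {
    logfile="$1"
    shift
    HIPSYCL_RT_MAX_CACHED_NODES=0 HIPSYCL_DEBUG_LEVEL=4 "$@" 2>&1 | tee "$logfile"
}
```

For example, `run_debug mdrun_terminal.txt gmx mdrun` would leave the terminal output in mdrun_terminal.txt, which can then be attached together with md.log. Running `run_debug check.txt env` and looking for the two HIPSYCL lines is a quick way to confirm that the variables really reach the process.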

Thank you.

Running this:

LD_LIBRARY_PATH="/opt/rocm-5.5.1/lib:$LD_LIBRARY_PATH" hipsycl-info

I get the same error:

[hipSYCL Warning] backend_loader: Could not load backend plugin: /usr/local/lib/hipSYCL/librt-backend-hip.so
[hipSYCL Warning] /usr/local/lib/hipSYCL/librt-backend-hip.so: undefined symbol: hipEventQuery
=================Backend information===================
Loaded backend 0: OpenMP
  Found device: hipSYCL OpenMP host device

=================Device information===================
***************** Devices for backend OpenMP *****************
Device 0:
 General device information:
  Name: hipSYCL OpenMP host device
  Backend: OpenMP
  Vendor: the hipSYCL project
  Arch: <native-cpu>
  Driver version: 1.2
  Is CPU: 1
  Is GPU: 0
.....

rocm-smi:

======================= ROCm System Management Interface =======================
================================= Concise Info =================================
GPU  Temp (DieEdge)  AvgPwr  SCLK    MCLK    Fan     Perf  PwrCap  VRAM%  GPU%  
0    64.0c           60.0W   719Mhz  456Mhz  15.69%  auto  265.0W    0%   0%   
================================================================================
============================= End of ROCm SMI Log ==============================

md.log:

                      :-) GROMACS - gmx mdrun, 2023.1 (-:

Copyright 1991-2023 The GROMACS Authors.
GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

                         Current GROMACS contributors:
       Mark Abraham           Andrey Alekseenko           Cathrine Bergh      
      Christian Blau            Eliane Briand             Mahesh Doijade      
    Stefan Fleischmann           Vytas Gapsys              Gaurav Garg        
      Sergey Gorelov         Gilles Gouaillardet            Alan Gray         
     M. Eric Irrgang         Farzaneh Jalalypour            Joe Jordan        
    Christoph Junghans        Prashanth Kanduri          Sebastian Keller     
     Carsten Kutzner           Justin A. Lemkul          Magnus Lundborg      
       Pascal Merz              Vedran Miletic            Dmitry Morozov      
       Szilard Pall             Roland Schulz             Michael Shirts      
     Alexey Shvetsov            Balint Soproni         David van der Spoel    
      Philip Turner             Carsten Uphoff           Alessandra Villa     
 Sebastian Wingbermuehle        Artem Zhmurov       

                         Previous GROMACS contributors:
        Emile Apol             Rossen Apostolov           James Barnett       
  Herman J.C. Berendsen          Par Bjelkmar           Viacheslav Bolnykh    
        Kevin Boyd            Aldert van Buuren          Carlo Camilloni      
     Rudi van Drunen            Anton Feenstra           Oliver Fleetwood     
     Gerrit Groenhof            Bert de Groot             Anca Hamuraru       
    Vincent Hindriksen          Victor Holanda           Aleksei Iupinov      
   Dimitrios Karkoulis           Peter Kasson             Sebastian Kehl      
        Jiri Kraus               Per Larsson              Viveca Lindahl      
      Erik Marklund           Pieter Meulenhoff           Teemu Murtola       
       Sander Pronk             Alfons Sijbers            Peter Tieleman      
       Jon Vincent             Teemu Virolainen         Christian Wennberg    
       Maarten Wolf       

                  Coordinated by the GROMACS project leaders:
                    Paul Bauer, Berk Hess, and Erik Lindahl

GROMACS:      gmx mdrun, version 2023.1
Executable:   /usr/local/apps/gromacs-2023.1/build/bin/gmx
Data prefix:  /usr/local/apps/gromacs-2023.1 (source tree)
Working dir:  /root/2and3dimethylbutane
Process ID:   3997
Command line:
  gmx mdrun

GROMACS version:    2023.1
Precision:          mixed
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 128)
GPU support:        SYCL (hipSYCL)
NB cluster size:    8
SIMD instructions:  AVX_256
CPU FFT library:    fftw-3.3.8-sse2-avx
GPU FFT library:    VkFFT internal (1.2.26-b15cb0ca3e884bdb6c901a12d87aa8aadf7637d8) with HIP backend
Multi-GPU FFT:      none
RDTSCP usage:       enabled
TNG support:        enabled
Hwloc support:      disabled
Tracing support:    disabled
C compiler:         /opt/rocm-5.5.1/llvm/bin/clang Clang 16.0.0
C compiler flags:   -mavx -Wno-missing-field-initializers -g
C++ compiler:       /opt/rocm-5.5.1/llvm/bin/clang++ Clang 16.0.0
C++ compiler flags: -mavx -Wno-reserved-identifier -Wno-missing-field-initializers -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-source-uses-openmp -Wno-c++17-extensions -Wno-documentation-unknown-command -Wno-covered-switch-default -Wno-switch-enum -Wno-extra-semi-stmt -Wno-weak-vtables -Wno-shadow -Wno-padded -Wno-reserved-id-macro -Wno-double-promotion -Wno-exit-time-destructors -Wno-global-constructors -Wno-documentation -Wno-format-nonliteral -Wno-used-but-marked-unused -Wno-float-equal -Wno-cuda-compat -Wno-conditional-uninitialized -Wno-conversion -Wno-disabled-macro-expansion -Wno-unused-macros -Wno-unused-parameter -Wno-unused-variable -Wno-newline-eof -Wno-old-style-cast -Wno-zero-as-null-pointer-constant -Wno-unused-but-set-variable -Wno-sign-compare -Wno-unused-result -Wno-cast-function-type-strict -fopenmp=libomp -g
BLAS library:       External - detected on the system
LAPACK library:     External - detected on the system
hipSYCL launcher:   /usr/local/lib/cmake/hipSYCL/syclcc-launcher
hipSYCL flags:      -Wno-unknown-cuda-version -Wno-unknown-attributes  --hipsycl-targets="hip:gfx1100"
hipSYCL GPU flags:  -ffast-math;-fgpu-inline-threshold=99999
hipSYCL targets:    hip:gfx1100
hipSYCL version:    hipSYCL 0.9.4-git


Running on 1 node with total 4 cores, 8 processing units, 1 compatible GPU
Hardware detected on host hdomamd01:
  CPU info:
    Vendor: Intel
    Brand:  Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
    Family: 6   Model: 58   Stepping: 9
    Features: aes apic avx clfsh cmov cx8 cx16 f16c htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
  Hardware topology: Basic
    Packages, cores, and logical processors:
    [indices refer to OS logical processors]
      Package  0: [   0   4] [   1   5] [   2   6] [   3   7]
    CPU limit set by OS: -1   Recommended max number of threads: 8
  GPU info:
    Number of GPUs detected: 1
    #0: name: Radeon RX 7900 XT, architecture 11.0.0, vendor: AMD, device version: 1.2 hipSYCL 0.9.4-git, driver version 50530202, status: compatible


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------


++++ PLEASE CITE THE DOI FOR THIS VERSION OF GROMACS ++++
https://doi.org/10.5281/zenodo.7852175
-------- -------- --- Thank You --- -------- --------

Input Parameters:
   integrator                     = md
   tinit                          = 0
   dt                             = 0.002
   nsteps                         = 50000
   init-step                      = 0
   simulation-part                = 1
   mts                            = false
   comm-mode                      = Linear
   nstcomm                        = 100
   bd-fric                        = 0
   ld-seed                        = -714788881
   emtol                          = 10
   emstep                         = 0.01
   niter                          = 20
   fcstep                         = 0
   nstcgsteep                     = 1000
   nbfgscorr                      = 10
   rtpi                           = 0.05
   nstxout                        = 1000
   nstvout                        = 1000
   nstfout                        = 0
   nstlog                         = 1000
   nstcalcenergy                  = 100
   nstenergy                      = 1000
   nstxout-compressed             = 0
   compressed-x-precision         = 1000
   cutoff-scheme                  = Verlet
   nstlist                        = 10
   pbc                            = xyz
   periodic-molecules             = false
   verlet-buffer-tolerance        = 0.005
   rlist                          = 1
   coulombtype                    = PME
   coulomb-modifier               = Potential-shift
   rcoulomb-switch                = 0
   rcoulomb                       = 1
   epsilon-r                      = 1
   epsilon-rf                     = inf
   vdw-type                       = Cut-off
   vdw-modifier                   = Potential-shift
   rvdw-switch                    = 0
   rvdw                           = 1
   DispCorr                       = No
   table-extension                = 1
   fourierspacing                 = 0.16
   fourier-nx                     = 32
   fourier-ny                     = 32
   fourier-nz                     = 32
   pme-order                      = 4
   ewald-rtol                     = 1e-05
   ewald-rtol-lj                  = 0.001
   lj-pme-comb-rule               = Geometric
   ewald-geometry                 = 3d
   epsilon-surface                = 0
   ensemble-temperature-setting   = constant
   ensemble-temperature           = 298
   tcoupl                         = V-rescale
   nsttcouple                     = 100
   nh-chain-length                = 0
   print-nose-hoover-chain-variables = false
   pcoupl                         = No
   pcoupltype                     = Isotropic
   nstpcouple                     = -1
   tau-p                          = 1
   compressibility (3x3):
      compressibility[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compressibility[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compressibility[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   ref-p (3x3):
      ref-p[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   refcoord-scaling               = No
   posres-com (3):
      posres-com[0]= 0.00000e+00
      posres-com[1]= 0.00000e+00
      posres-com[2]= 0.00000e+00
   posres-comB (3):
      posres-comB[0]= 0.00000e+00
      posres-comB[1]= 0.00000e+00
      posres-comB[2]= 0.00000e+00
   QMMM                           = false
qm-opts:
   ngQM                           = 0
   constraint-algorithm           = Lincs
   continuation                   = true
   Shake-SOR                      = false
   shake-tol                      = 0.0001
   lincs-order                    = 4
   lincs-iter                     = 1
   lincs-warnangle                = 30
   nwall                          = 0
   wall-type                      = 9-3
   wall-r-linpot                  = -1
   wall-atomtype[0]               = -1
   wall-atomtype[1]               = -1
   wall-density[0]                = 0
   wall-density[1]                = 0
   wall-ewald-zfac                = 3
   pull                           = false
   awh                            = false
   rotation                       = false
   interactiveMD                  = false
   disre                          = No
   disre-weighting                = Conservative
   disre-mixed                    = false
   dr-fc                          = 1000
   dr-tau                         = 0
   nstdisreout                    = 100
   orire-fc                       = 0
   orire-tau                      = 0
   nstorireout                    = 100
   free-energy                    = no
   cos-acceleration               = 0
   deform (3x3):
      deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   simulated-tempering            = false
   swapcoords                     = no
   userint1                       = 0
   userint2                       = 0
   userint3                       = 0
   userint4                       = 0
   userreal1                      = 0
   userreal2                      = 0
   userreal3                      = 0
   userreal4                      = 0
   applied-forces:
     electric-field:
       x:
         E0                       = 0
         omega                    = 0
         t0                       = 0
         sigma                    = 0
       y:
         E0                       = 0
         omega                    = 0
         t0                       = 0
         sigma                    = 0
       z:
         E0                       = 0
         omega                    = 0
         t0                       = 0
         sigma                    = 0
     density-guided-simulation:
       active                     = false
       group                      = protein
       similarity-measure         = inner-product
       atom-spreading-weight      = unity
       force-constant             = 1e+09
       gaussian-transform-spreading-width = 0.2
       gaussian-transform-spreading-range-in-multiples-of-width = 4
       reference-density-filename = reference.mrc
       nst                        = 1
       normalize-densities        = true
       adaptive-force-scaling     = false
       adaptive-force-scaling-time-constant = 4
       shift-vector               = 
       transformation-matrix      = 
     qmmm-cp2k:
       active                     = false
       qmgroup                    = System
       qmmethod                   = PBE
       qmfilenames                = 
       qmcharge                   = 0
       qmmultiplicity             = 1
grpopts:
   nrdf:        6497
   ref-t:         298
   tau-t:           1
annealing:          No
annealing-npoints:           0
   acc:	           0           0           0
   nfreeze:           N           N           N
   energygrp-flags[  0]: 0

Changing nstlist from 10 to 100, rlist from 1 to 1.081

Update groups can not be used for this system because there are three or more consecutively coupled constraints

NOTE: SYCL GPU support in GROMACS, and the compilers, libraries,
and drivers that it depends on are fairly new.
Please, pay extra attention to the correctness of your results,
and update to the latest GROMACS patch version if warranted.

1 GPU selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
  PP:0,PME:0
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
PP task will update and constrain coordinates on the GPU
PME tasks will do all aspects on the GPU
Using 1 MPI thread
Using 4 OpenMP threads 

System total charge: 0.000
Will do PME sum in reciprocal space for electrostatic interactions.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen 
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Using a Gaussian width (1/beta) of 0.320163 nm for Ewald
Potential shift: LJ r^-12: -1.000e+00 r^-6: -1.000e+00, Ewald -1.000e-05
Initialized non-bonded Coulomb Ewald tables, spacing: 9.33e-04 size: 1073

Generated table with 1040 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1040 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1040 data points for 1-4 LJ12.
Tabscale = 500 points/nm


Using GPU 8x8 nonbonded short-range kernels

Using a dual 8x8 pair-list setup updated with dynamic, rolling pruning:
  outer list: updated every 100 steps, buffer 0.081 nm, rlist 1.081 nm
  inner list: updated every  26 steps, buffer 0.001 nm, rlist 1.001 nm
At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list would be:
  outer list: updated every 100 steps, buffer 0.178 nm, rlist 1.178 nm
  inner list: updated every  26 steps, buffer 0.030 nm, rlist 1.030 nm

Using Lorentz-Berthelot Lennard-Jones combination rule

Pinning threads with an auto-selected logical cpu stride of 2

Initializing LINear Constraint Solver

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
LINCS: A Linear Constraint Solver for molecular simulations
J. Comp. Chem. 18 (1997) pp. 1463-1472
-------- -------- --- Thank You --- -------- --------

The number of constraints is 2500

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- --- Thank You --- -------- --------

There are: 3000 Atoms

Updating coordinates and applying constraints on the GPU.
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
  0:  rest

Started mdrun on rank 0 Sun Jun 18 17:38:22 2023

           Step           Time
              0        0.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    5.00875e+03    7.45169e+02    2.44134e+03    7.85915e+03    1.17852e+02
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
   -1.09305e+04   -1.54474e+01    1.39640e+01    5.24030e+03    8.25183e+03
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
    1.34921e+04    1.34921e+04    3.05515e+02   -1.06356e+02    0.00000e+00

           Step           Time
           1000        2.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    4.90301e+03    8.16024e+02    2.39591e+03    7.97514e+03    1.17940e+02
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
   -1.10068e+04   -1.49212e+01    1.47116e+01    5.20100e+03    8.20113e+03
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
    1.34021e+04    1.33106e+04    3.03638e+02    2.13894e+02    0.00000e+00

           Step           Time
           2000        4.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    5.00717e+03    7.50216e+02    2.40450e+03    8.16778e+03    1.18080e+02
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
   -1.10345e+04   -1.50626e+01    1.47815e+01    5.41295e+03    7.93192e+03
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
    1.33449e+04    1.33061e+04    2.93671e+02    3.02563e+02    0.00000e+00

           Step           Time
           3000        6.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    4.99162e+03    8.65363e+02    2.36822e+03    7.97935e+03    1.17988e+02
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
   -1.10006e+04   -1.47685e+01    1.48372e+01    5.32205e+03    7.81447e+03
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
    1.31365e+04    1.32971e+04    2.89323e+02    2.21780e+02    0.00000e+00

           Step           Time
           4000        8.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    5.05691e+03    8.13637e+02    2.40639e+03    7.94752e+03    1.17967e+02
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
   -1.09705e+04   -1.50602e+01    1.45049e+01    5.37135e+03    7.90642e+03
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
    1.32778e+04    1.32896e+04    2.92727e+02    3.93686e+02    0.00000e+00

step 4200: timed with pme grid 32 32 32, coulomb cutoff 1.000: 411.0 M-cycles
step 4400: timed with pme grid 25 25 25, coulomb cutoff 1.192: 414.4 M-cycles
step 4600: timed with pme grid 20 20 20, coulomb cutoff 1.490: 445.5 M-cycles
step 4600: the maximum allowed grid scaling limits the PME load balancing to a coulomb cut-off of 1.490
step 4800: timed with pme grid 20 20 20, coulomb cutoff 1.490: 342.8 M-cycles
step 5000: timed with pme grid 24 24 24, coulomb cutoff 1.242: 340.9 M-cycles
           Step           Time
           5000       10.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    4.94633e+03    7.83678e+02    2.49225e+03    7.92461e+03    1.17936e+02
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
   -1.11438e+04   -6.69381e+00    6.32316e+00    5.12059e+03    7.98037e+03
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
    1.31010e+04    1.32880e+04    2.95465e+02   -1.27191e+02    0.00000e+00

step 5200: timed with pme grid 25 25 25, coulomb cutoff 1.192: 413.0 M-cycles
step 5400: timed with pme grid 28 28 28, coulomb cutoff 1.065: 425.4 M-cycles
step 5600: timed with pme grid 32 32 32, coulomb cutoff 1.000: 437.5 M-cycles
step 5800: timed with pme grid 20 20 20, coulomb cutoff 1.490: 435.6 M-cycles
step 6000: timed with pme grid 24 24 24, coulomb cutoff 1.242: 433.7 M-cycles
              optimal pme grid 24 24 24, coulomb cutoff 1.242
           Step           Time
           6000       12.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    4.87547e+03    6.80912e+02    2.52662e+03    8.01510e+03    1.18014e+02
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
   -1.10553e+04   -5.23801e+00    5.10132e+00    5.16066e+03    8.10695e+03
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
    1.32676e+04    1.32879e+04    3.00151e+02   -7.85406e+00    0.00000e+00

           Step           Time
           7000       14.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    4.91060e+03    7.46102e+02    2.45581e+03    7.99357e+03    1.18013e+02
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
   -1.10524e+04   -5.45146e+00    5.00611e+00    5.17120e+03    8.06282e+03
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
    1.32340e+04    1.32904e+04    2.98518e+02    1.87003e+02    0.00000e+00

           Step           Time
           8000       16.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    5.10349e+03    7.63422e+02    2.36288e+03    8.03221e+03    1.18007e+02
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
   -1.10090e+04   -5.49045e+00    5.04623e+00    5.37059e+03    7.99228e+03
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
    1.33629e+04    1.32919e+04    2.95906e+02    1.87096e+02    0.00000e+00

           Step           Time
           9000       18.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    5.03004e+03    6.79532e+02    2.45726e+03    7.94426e+03    1.17989e+02
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
   -1.09902e+04   -5.20891e+00    5.17254e+00    5.23883e+03    8.11902e+03
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
    1.33579e+04    1.32925e+04    3.00598e+02    1.18884e+02    0.00000e+00

Log from a run with HIPSYCL_RT_MAX_CACHED_NODES=0 HIPSYCL_DEBUG_LEVEL=4:

https://we.tl/t-dsBiZ5CJcd

I ran multiple tests, and sometimes GROMACS works fine, but I cannot find any pattern to it.

I hope you have more ideas for this, thanks.

Thanks!

Whatever goes wrong with hipsycl-info seems to be unrelated to GROMACS. Still, it might be useful to figure out. Could you run ldd /usr/local/lib/hipSYCL/librt-backend-hip.so, please?
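
In case it helps, here is a rough way to check the two things that error message hints at: whether the backend plugin is linked against the HIP runtime at all, and whether the HIP runtime exports hipEventQuery. This is a sketch, not an official hipSYCL diagnostic. The function names are made up, it relies on ldd and on nm from binutils, and /opt/rocm-5.5.1/lib/libamdhip64.so in the usage line is my assumption about where the HIP runtime library lives on your machine.

```shell
# needs_library FILE NAME
# Succeeds if FILE lists a dynamic dependency whose name matches NAME.
needs_library() {
    ldd "$1" 2>/dev/null | grep -q "$2"
}

# exports_symbol FILE SYMBOL
# Succeeds if FILE defines SYMBOL in its dynamic symbol table.
exports_symbol() {
    nm -D --defined-only "$1" 2>/dev/null | grep -qw "$2"
}

# check_hip_backend BACKEND HIPLIB
# Prints one line per check so the result can be pasted into a reply.
check_hip_backend() {
    if needs_library "$1" libamdhip64; then
        echo "backend links against libamdhip64"
    else
        echo "backend does NOT link against libamdhip64"
    fi
    if exports_symbol "$2" hipEventQuery; then
        echo "HIP runtime exports hipEventQuery"
    else
        echo "hipEventQuery not found in HIP runtime"
    fi
}
```

Usage would be `check_hip_backend /usr/local/lib/hipSYCL/librt-backend-hip.so /opt/rocm-5.5.1/lib/libamdhip64.so`. Note that a missing file and a missing symbol both end up in the "not found" branch, so double-check the paths first.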

Back to GROMACS.

To clarify: you run the same simulation multiple times, and sometimes it works fine, sometimes it gets stuck somewhere mid-run? When GROMACS gets stuck, did you notice any other issues (e.g., messages in dmesg, or problems with other applications)?

Also, could you please double-check that the log you attached (https://we.tl/t-dsBiZ5CJcd) is for the run with HIPSYCL_RT_MAX_CACHED_NODES=0?
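
For the dmesg check, a filter along these lines might make GPU-related kernel messages easier to spot. A sketch only: the function name is made up, and the pattern (amdgpu and amdkfd are the names of the AMD GPU kernel drivers) is my guess at what is worth looking for, not an exhaustive list.

```shell
# gpu_kernel_messages
# Reads kernel log lines on stdin and keeps only those that mention the
# AMD GPU kernel drivers (case-insensitive).
gpu_kernel_messages() {
    grep -iE 'amdgpu|amdkfd' || true   # no matching line is not an error here
}
```

Right after a freeze, `sudo dmesg | gpu_kernel_messages` would show the relevant lines, if there are any (reading dmesg may require root on Ubuntu).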

Thanks again,

ldd /usr/local/lib/hipSYCL/librt-backend-hip.so:

linux-vdso.so.1 (0x00007ffc02bd3000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f5b1da00000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f5b1dd2f000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f5b1dd0f000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5b1d600000)
/lib64/ld-linux-x86-64.so.2 (0x00007f5b1de56000)

To clarify: you run the same simulation multiple times, and sometimes it works fine, sometimes it gets stuck somewhere mid-run? When GROMACS gets stuck, did you notice any other issues (e.g., messages in dmesg, or problems with other applications)?

You are correct, the same simulation gets stuck, and I don’t see any errors in dmesg or journalctl.

I have been running GROMACS for more than a day on two computers with the same settings, using both variables together (HIPSYCL_RT_MAX_CACHED_NODES=0 HIPSYCL_DEBUG_LEVEL=4), and everything works fine, so I have not been able to capture a log file from a run that got stuck. The strange thing is that previously it would get stuck even with these variables set together.

Do you think it’s because ROCm doesn’t support my GPU?

Thanks again.

The log file you attached earlier makes it look like the HIPSYCL_RT_MAX_CACHED_NODES was not set. Perhaps you had a typo, or something else went wrong the first time. So, HIPSYCL_RT_MAX_CACHED_NODES=0 might be the workaround for the freezes.
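One way to rule out a typo is to look at the environment of the running gmx process directly. A minimal sketch, assuming Linux with /proc mounted and a binary named exactly gmx; proc_env_value is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of environment variable $2 as seen
# by process $1 when it was started, read from /proc/<pid>/environ.
proc_env_value() {
    local pid="$1" name="$2"
    # Entries in environ are NUL-separated; turn them into lines first.
    tr '\0' '\n' < "/proc/$pid/environ" | sed -n "s/^${name}=//p"
}

# Check every running process named exactly "gmx" (prints nothing if none).
for pid in $(pgrep -x gmx); do
    echo "PID $pid: HIPSYCL_RT_MAX_CACHED_NODES=$(proc_env_value "$pid" HIPSYCL_RT_MAX_CACHED_NODES)"
done
```

An empty value after the equals sign means the variable was not set (or was set to an empty string) when that process started.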

Now, 0 is perhaps not the best value for HIPSYCL_RT_MAX_CACHED_NODES. For better performance you might want to try 2 or 5. Although, if my suspicions are correct, higher values might increase the likelihood of freezing. You have to experiment a bit; or stick with 0 if the resulting performance is okay for your tasks.

Some technical background, if you’re interested. hipSYCL is a layer between GROMACS and ROCm (or another GPU framework). When GROMACS wants to launch a task on the GPU, hipSYCL might cache it and later launch several tasks in a single burst (this is called “flushing” in the logs). This caching is not the reason we use hipSYCL; it is just a peculiarity of how it works internally. We have not encountered any deadlocks in our testing, but we have seen earlier ROCm versions struggle with such bursts of activity on our hardware. HIPSYCL_RT_MAX_CACHED_NODES controls how large such bursts can be (the default value is 100; 0 means no caching). This caching also hurts performance for small systems (like the 3k atoms you have), at least in current hipSYCL versions (this will be fixed in future releases), so a smaller cache size would likely be better for performance even if we set the deadlock issue aside.
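If you want to experiment with the cache size, something along these lines could help. It is only a sketch: run_with_cache is a hypothetical helper, topol.tpr is an assumed input file name, and the 600-second limit and 50000 steps are arbitrary values you would adapt. The timeout command (GNU coreutils) kills a run that freezes, so the loop can move on to the next value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with a given hipSYCL cache size and a
# time limit, so that a frozen run is killed instead of hanging forever.
# Usage: run_with_cache <max_cached_nodes> <timeout_seconds> <command...>
run_with_cache() {
    local nodes="$1" limit="$2"
    shift 2
    HIPSYCL_RT_MAX_CACHED_NODES="$nodes" timeout "$limit" "$@"
}

# Try a few cache sizes on a short run; skipped entirely if gmx or the
# input file (topol.tpr, an assumed name) is not available.
if command -v gmx >/dev/null && [ -f topol.tpr ]; then
    for nodes in 0 2 5; do
        if run_with_cache "$nodes" 600 gmx mdrun -s topol.tpr -nsteps 50000 \
                -noconfout -g "cache_${nodes}.log"; then
            # The log of a finished run ends with a "Performance:" line.
            grep -H 'Performance:' "cache_${nodes}.log" || true
        else
            echo "cache size ${nodes}: run failed or exceeded the time limit"
        fi
    done
fi
```

A run that hits the time limit makes timeout exit with status 124, which is what lands in the else branch here. Since the freezes are intermittent, a single clean run for a given value does not prove much; repeating the loop several times would give a better picture.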

ROCm tends to work okay on unsupported hardware. Older generations of consumer GPUs (e.g., RX6000-series) have corresponding professional versions that are supported by ROCm, and they work fine with minimal limitations despite not being supported themselves. But the RX7000-series (RDNA3) is quite new, and there are no supported devices of the same architecture/generation.

Thank you very much.

I’m going to do tests and we’ll see.

Another thing: do you know why GROMACS detects the GPU only when run as root, and not as a normal user?

That is not normal, and, overall, running GROMACS as root is discouraged for security reasons (as is running most things as root).

I would suspect your user account does not have the required permissions to access the device. Overall, this question is perhaps more suitable for the ROCm support channels, but they tend not to be very responsive (e.g., OpenCL driver requires root privilege on AWS ROCm installation · Issue #1411 · RadeonOpenCompute/ROCm · GitHub).

I would suggest starting by checking permissions on /dev/kfd and /dev/dri/render*. E.g., on my machine:

$ ls -lh /dev/kfd /dev/dri/render*
crw-rw----+ 1 root render 226, 128 May 22 12:25 /dev/dri/renderD128
crw-rw----+ 1 root render 226, 129 May 22 12:25 /dev/dri/renderD129
crw-rw----+ 1 root render 226, 130 May 22 12:25 /dev/dri/renderD130
crw-rw----+ 1 root render 226, 131 May 22 12:25 /dev/dri/renderD131
crw-rw----  1 root render 509,   0 May 22 12:25 /dev/kfd

This means that the user needs to be in the render group. You can check whether that’s the case by looking at the output of the groups command. If you don’t see render there (or whatever might be the case on your machine), you will need to run sudo usermod -a -G render my_user_name (replacing render and my_user_name with appropriate values) and possibly re-login.
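The same check can be scripted. This is a sketch that assumes GNU coreutils (for stat -c); check_gpu_node is a hypothetical helper name, and the device paths are the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the current user can open a device
# node for reading and writing, and which group owns it if not.
check_gpu_node() {
    local dev="$1"
    if [ ! -e "$dev" ]; then
        echo "$dev: missing"
        return 2
    fi
    if [ -r "$dev" ] && [ -w "$dev" ]; then
        echo "$dev: accessible"
        return 0
    fi
    # stat -c '%G' prints the owning group name (GNU coreutils).
    echo "$dev: no access, owning group is $(stat -c '%G' "$dev")"
    return 1
}

for dev in /dev/kfd /dev/dri/render*; do
    check_gpu_node "$dev" || true
done
echo "groups of $(id -un): $(id -nG)"
```

Run it as the normal user, not as root. If a node is reported as "no access" and its owning group is absent from the last line, adding the user to that group as described above is the likely fix.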

Thank you, I really appreciate your help.

Hi team,

I have a problem. I compiled GROMACS with the native HIP compiler:

cmake … -DCMAKE_HIP_ARCHITECTURES=gfx1100 -DGMX_BUILD_OWN_FFTW=ON -DCMAKE_C_COMPILER=/opt/rocm-6.1.3/llvm/bin/amdclang -DCMAKE_CXX_COMPILER=/opt/rocm-6.1.3/lib/llvm/bin/amdclang++ -DREGRESSIONTEST_DOWNLOAD=ON -DGMX_GPU=HIP

gmx --version:

GROMACS: gmx, version 2023-dev
Executable: /usr/local/gromacs/bin/gmx
Data prefix: /usr/local/gromacs
Working dir: /tmp/ADH/adh_cubic
Command line:
gmx --version

GROMACS version: 2023-dev
Precision: mixed
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: HIP
SIMD instructions: AVX2_256
CPU FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128
GPU FFT library: hipFFT
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /opt/rocm-6.1.3/llvm/bin/amdclang Clang 17.0.0
C compiler flags: -mavx2 -mfma -Wall -Wno-unused -Wunused-value -Wunused-parameter -Wno-missing-field-initializers -O3 -DNDEBUG
C++ compiler: /opt/rocm-6.1.3/lib/llvm/bin/amdclang++ Clang 17.0.0
C++ compiler flags: -mavx2 -mfma -Wall -Wextra -Wpointer-arith -Wmissing-prototypes -Wdeprecated -Wno-unused-function -Wno-reserved-identifier -Wno-missing-field-initializers -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-source-uses-openmp -Wno-c++17-extensions -Wno-documentation-unknown-command -Wno-covered-switch-default -Wno-switch-enum -Wno-extra-semi-stmt -Wno-weak-vtables -Wno-shadow -Wno-padded -Wno-reserved-id-macro -Wno-double-promotion -Wno-exit-time-destructors -Wno-global-constructors -Wno-documentation -Wno-format-nonliteral -Wno-used-but-marked-unused -Wno-float-equal -Wno-conditional-uninitialized -Wno-conversion -Wno-disabled-macro-expansion -Wno-unused-macros -fopenmp=libomp -O3 -DNDEBUG
HIP compiler: /opt/rocm/bin/hipcc 6.1.40093-bd86f1708
HIP compiler flags:-std=c++17 -mavx2 -mfma -fopenmp -O3 -DNDEBUG
HIP driver: 60140.93
HIP runtime: 60140.93

But I can’t run successfully:

GROMACS: gmx mdrun, version 2023-dev
Executable: /usr/local/gromacs/bin/gmx
Data prefix: /usr/local/gromacs
Working dir: /tmp/ADH/adh_cubic
Command line:
gmx mdrun -v -nsteps 100000 -resetstep 90000 -noconfout -ntmpi 1 -ntomp 1 -nb gpu -bonded gpu -pme gpu -npme 0 -nstlist 200 -s topol.tpr

WARNING: An error occurred while sanity checking device #0. An unhandled error from a previous HIP operation was detected. HIP error #209 (hipErrorNoBinaryForGpu): no kernel image is available for execution on the device.

Back Off! I just backed up md.log to ./#md.log.13#
Reading file topol.tpr, VERSION 2023-dev (single precision)
Overriding nsteps with value passed on the command line: 100000 steps, 200 ps
Changing nstlist from 10 to 200, rlist from 0.9 to 1.268


Program: gmx mdrun, version 2023-dev
Source file: src/gromacs/taskassignment/findallgputasks.cpp (line 86)

Fatal error:
Cannot run short-ranged nonbonded interactions on a GPU because no GPU is
detected.

Verify AMD GPU:

amd-smi list

GPU: 0
BDF: 0000:03:00.0
UUID: 87ff744c-0000-1000-80e9-2fb3c821563c

amd-smi topology

ACCESS TABLE:
0000:03:00.0
0000:03:00.0 ENABLED

WEIGHT TABLE:
0000:03:00.0
0000:03:00.0 0

HOPS TABLE:
0000:03:00.0
0000:03:00.0 0

LINK TYPE TABLE:
0000:03:00.0
0000:03:00.0 SELF

NUMA BW TABLE:
0000:03:00.0
0000:03:00.0 N/A

Hi!

Official GROMACS releases (so far) do not support building with HIP directly. Looks like you’re using the ROCm fork; in this case, please ask the question on their page: Issues · ROCm/Gromacs · GitHub. However, as far as I know, it does not support RDNA (consumer) devices at all.

EDIT: correction, the ROCm fork supports gfx1030 (i.e., RX6800/6900-series), but no other RDNA-family GPU.
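To see which architecture your card reports, you can query rocminfo. A small sketch, assuming rocminfo from the ROCm installation is on the PATH; arch_in_list is a hypothetical helper, and the list passed to it contains only the one RDNA target named above, so it is not the fork's complete support list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if architecture $1 appears in the
# space-separated list of architectures given as $2.
arch_in_list() {
    local arch="$1" supported="$2" item
    for item in $supported; do
        [ "$item" = "$arch" ] && return 0
    done
    return 1
}

# List the gfx targets reported by rocminfo (skipped if not installed).
if command -v rocminfo >/dev/null; then
    for arch in $(rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u); do
        if arch_in_list "$arch" "gfx1030"; then
            echo "$arch: matches the RDNA target named above"
        else
            echo "$arch: does not match the RDNA target named above"
        fi
    done
fi
```

On an RX 7900 XT this should report gfx1100, the value passed to CMAKE_HIP_ARCHITECTURES in the build command.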

Thank you so much, I really appreciate your help.