GROMACS version: 2023.2/2024.2
GROMACS modification: No
Hello,
I’m setting up GROMACS on an HPC system and have encountered an issue where GROMACS detects only one GPU on a node that has four GPUs available. Below is an excerpt from a log file:
Hardware detected on host (the node of MPI rank 0):
CPU info:
Vendor: AMD
Brand: AMD EPYC 7742 64-Core Processor
Family: 23 Model: 49 Stepping: 0
Features: aes amd apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt lahf misalignsse mmx msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdrnd rdtscp sha sse2 sse3 sse4a sse4.1 sse4.2 ssse3 x2apic
Hardware topology: Full, with devices
Packages, cores, and logical processors:
[indices refer to OS logical processors]
Package 0: [ 0 128] [ 1 129] [ 2 130] [ 3 131] [ 4 132] [ 5 133] [ 6 134] [ 7 135] [ 8 136] [ 9 137] [ 10 138] [ 11 139] [ 12 140] [ 13 141] [ 14 142] [ 15 143] [ 16 144] [ 17 145] [ 18 146] [ 19 147] [ 20 148] [ 21 149] [ 22 150] [ 23 151] [ 24 152] [ 25 153] [ 26 154] [ 27 155] [ 28 156] [ 29 157] [ 30 158] [ 31 159] [ 32 160] [ 33 161] [ 34 162] [ 35 163] [ 36 164] [ 37 165] [ 38 166] [ 39 167] [ 40 168] [ 41 169] [ 42 170] [ 43 171] [ 44 172] [ 45 173] [ 46 174] [ 47 175] [ 48 176] [ 49 177] [ 50 178] [ 51 179] [ 52 180] [ 53 181] [ 54 182] [ 55 183] [ 56 184] [ 57 185] [ 58 186] [ 59 187] [ 60 188] [ 61 189] [ 62 190] [ 63 191]
Package 1: [ 64 192] [ 65 193] [ 66 194] [ 67 195] [ 68 196] [ 69 197] [ 70 198] [ 71 199] [ 72 200] [ 73 201] [ 74 202] [ 75 203] [ 76 204] [ 77 205] [ 78 206] [ 79 207] [ 80 208] [ 81 209] [ 82 210] [ 83 211] [ 84 212] [ 85 213] [ 86 214] [ 87 215] [ 88 216] [ 89 217] [ 90 218] [ 91 219] [ 92 220] [ 93 221] [ 94 222] [ 95 223] [ 96 224] [ 97 225] [ 98 226] [ 99 227] [ 100 228] [ 101 229] [ 102 230] [ 103 231] [ 104 232] [ 105 233] [ 106 234] [ 107 235] [ 108 236] [ 109 237] [ 110 238] [ 111 239] [ 112 240] [ 113 241] [ 114 242] [ 115 243] [ 116 244] [ 117 245] [ 118 246] [ 119 247] [ 120 248] [ 121 249] [ 122 250] [ 123 251] [ 124 252] [ 125 253] [ 126 254] [ 127 255]
CPU limit set by OS: -1 Recommended max number of threads: 256
Numa nodes:
Node 0 (66902454272 bytes mem): 0 128 1 129 2 130 3 131 4 132 5 133 6 134 7 135 8 136 9 137 10 138 11 139 12 140 13 141 14 142 15 143
Node 1 (67636203520 bytes mem): 16 144 17 145 18 146 19 147 20 148 21 149 22 150 23 151 24 152 25 153 26 154 27 155 28 156 29 157 30 158 31 159
Node 2 (67636203520 bytes mem): 32 160 33 161 34 162 35 163 36 164 37 165 38 166 39 167 40 168 41 169 42 170 43 171 44 172 45 173 46 174 47 175
Node 3 (67623620608 bytes mem): 48 176 49 177 50 178 51 179 52 180 53 181 54 182 55 183 56 184 57 185 58 186 59 187 60 188 61 189 62 190 63 191
Node 4 (67636203520 bytes mem): 64 192 65 193 66 194 67 195 68 196 69 197 70 198 71 199 72 200 73 201 74 202 75 203 76 204 77 205 78 206 79 207
Node 5 (67590402048 bytes mem): 80 208 81 209 82 210 83 211 84 212 85 213 86 214 87 215 88 216 89 217 90 218 91 219 92 220 93 221 94 222 95 223
Node 6 (67636203520 bytes mem): 96 224 97 225 98 226 99 227 100 228 101 229 102 230 103 231 104 232 105 233 106 234 107 235 108 236 109 237 110 238 111 239
Node 7 (67623944192 bytes mem): 112 240 113 241 114 242 115 243 116 244 117 245 118 246 119 247 120 248 121 249 122 250 123 251 124 252 125 253 126 254 127 255
Latency:
0 1 2 3 4 5 6 7
0 1.00 1.20 1.20 1.20 3.20 3.20 3.20 3.20
1 1.20 1.00 1.20 1.20 3.20 3.20 3.20 3.20
2 1.20 1.20 1.00 1.20 3.20 3.20 3.20 3.20
3 1.20 1.20 1.20 1.00 3.20 3.20 3.20 3.20
4 3.20 3.20 3.20 3.20 1.00 1.20 1.20 1.20
5 3.20 3.20 3.20 3.20 1.20 1.00 1.20 1.20
6 3.20 3.20 3.20 3.20 1.20 1.20 1.00 1.20
7 3.20 3.20 3.20 3.20 1.20 1.20 1.20 1.00
Caches:
L1: 32768 bytes, linesize 64 bytes, assoc. 8, shared 2 ways
L2: 524288 bytes, linesize 64 bytes, assoc. 8, shared 2 ways
L3: 16777216 bytes, linesize 64 bytes, assoc. 16, shared 8 ways
PCI devices:
0000:62:00.0 Id: 1a03:2000 Class: 0x0300 Numa: 0
0000:43:00.0 Id: 15b3:101b Class: 0x0207 Numa: 1
0000:44:00.0 Id: 10de:20b0 Class: 0x0302 Numa: 1
0000:45:00.0 Id: 1000:00b2 Class: 0x0107 Numa: 1
0000:03:00.0 Id: 10de:20b0 Class: 0x0302 Numa: 3
0000:05:00.0 Id: 1000:00b2 Class: 0x0107 Numa: 3
0000:e1:00.0 Id: 8086:1523 Class: 0x0200 Numa: 4
0000:e1:00.1 Id: 8086:1523 Class: 0x0200 Numa: 4
0000:c4:00.0 Id: 10de:20b0 Class: 0x0302 Numa: 5
0000:c5:00.0 Id: 1000:00b2 Class: 0x0107 Numa: 5
0000:c8:00.0 Id: 1022:7901 Class: 0x0106 Numa: 5
0000:83:00.0 Id: 15b3:101b Class: 0x0207 Numa: 7
0000:84:00.0 Id: 10de:20b0 Class: 0x0302 Numa: 7
0000:85:00.0 Id: 1000:00b2 Class: 0x0107 Numa: 7
GPU info:
Number of GPUs detected: 1
#0: NVIDIA NVIDIA A100-SXM4-40GB, compute cap.: 8.0, ECC: yes, stat: compatible
As you can see, GROMACS detects only one GPU even though four GPUs are available (PCI devices 10de:20b0).
All four GPUs also show up with nvidia-smi -L:
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-d0e4a7a4-d046-9f66-460e-81d527826f93)
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-0ed2dcdd-44de-9d8c-e180-cbc33e4a21df)
GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-cb37e533-f45e-f022-c991-052d1990e13f)
GPU 3: NVIDIA A100-SXM4-40GB (UUID: GPU-35f9efa7-d3ea-9325-4c54-34cb5586ea1b)
and echo $CUDA_VISIBLE_DEVICES shows:
0,1,2,3
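The mismatch between the PCI device list and the "Number of GPUs detected" line in the log can also be checked mechanically. Below is a minimal sketch of how I count the GPUs from the log text. It is my own helper and not part of GROMACS; the function names pci_gpus and detected_gpus are made up for illustration, and it assumes that NVIDIA GPUs appear with vendor ID 10de and PCI class 0x0302 (3D controller), as they do in the excerpt above.

```python
import re

# Excerpt of the log above (PCI device list and GPU summary only).
LOG_EXCERPT = """\
PCI devices:
0000:62:00.0 Id: 1a03:2000 Class: 0x0300 Numa: 0
0000:43:00.0 Id: 15b3:101b Class: 0x0207 Numa: 1
0000:44:00.0 Id: 10de:20b0 Class: 0x0302 Numa: 1
0000:45:00.0 Id: 1000:00b2 Class: 0x0107 Numa: 1
0000:03:00.0 Id: 10de:20b0 Class: 0x0302 Numa: 3
0000:05:00.0 Id: 1000:00b2 Class: 0x0107 Numa: 3
0000:e1:00.0 Id: 8086:1523 Class: 0x0200 Numa: 4
0000:e1:00.1 Id: 8086:1523 Class: 0x0200 Numa: 4
0000:c4:00.0 Id: 10de:20b0 Class: 0x0302 Numa: 5
0000:c5:00.0 Id: 1000:00b2 Class: 0x0107 Numa: 5
0000:c8:00.0 Id: 1022:7901 Class: 0x0106 Numa: 5
0000:83:00.0 Id: 15b3:101b Class: 0x0207 Numa: 7
0000:84:00.0 Id: 10de:20b0 Class: 0x0302 Numa: 7
0000:85:00.0 Id: 1000:00b2 Class: 0x0107 Numa: 7
GPU info:
Number of GPUs detected: 1
"""

# One PCI line: <address> Id: <vendor>:<device> Class: <class> Numa: <node>
PCI_LINE = re.compile(
    r"^(\S+)\s+Id:\s+([0-9a-f]{4}):([0-9a-f]{4})\s+"
    r"Class:\s+(0x[0-9a-f]{4})\s+Numa:\s+(-?\d+)",
    re.MULTILINE,
)


def pci_gpus(log, vendor="10de", pci_class="0x0302"):
    """Return (PCI address, NUMA node) for each device matching vendor and class.

    Assumption: GPUs are listed with vendor 10de (NVIDIA) and class 0x0302.
    """
    return [
        (m.group(1), int(m.group(5)))
        for m in PCI_LINE.finditer(log)
        if m.group(2) == vendor and m.group(4) == pci_class
    ]


def detected_gpus(log):
    """Return the GPU count reported in the log, or None if the line is absent."""
    m = re.search(r"Number of GPUs detected:\s*(\d+)", log)
    return int(m.group(1)) if m else None


if __name__ == "__main__":
    gpus = pci_gpus(LOG_EXCERPT)
    print(f"PCI GPUs: {len(gpus)}, detected by GROMACS: {detected_gpus(LOG_EXCERPT)}")
    for address, numa in gpus:
        print(f"  {address} on NUMA node {numa}")
```

Run against the excerpt, this finds four PCI entries matching the filter, one each on NUMA nodes 1, 3, 5 and 7, against a reported count of one.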
The HPC system uses Slurm, and I have tried various combinations of resource requests, to no avail. The output above was obtained by requesting the node in --exclusive mode.
Similarly, using different combinations of compilers and MPI libraries made no difference. I tested:
GCC/12.3.0 + ParaStationMPI/5.9.2-1 + CUDA/12
GCC/12.3.0 + OpenMPI/4.1.5 + CUDA/12
GCC/12.3.0 + OpenMPI/4.1.5 + CUDA/12 + hwloc/2.9.1
Intel/2023.2.1 + IntelMPI/2021.10.0 + CUDA/12
If anyone has encountered a comparable issue, any help would be highly appreciated.
Best Regards,
Florian