Good morning all,
I cannot push GPU utilization above 7% :(
I have tried several configurations; the performance reported by mdrun (ns/day, then hour/ns) follows each command:
gmx mdrun -v #Performance: 34.852 0.689
gmx mdrun -v -nt 1 #Performance: 3.065 7.830
gmx mdrun -v -nt 4 #Performance: 10.182 2.357
gmx mdrun -v -ntmpi 1 -ntomp 1 #Performance: 3.068 7.822
gmx mdrun -v -ntmpi 1 -ntomp 4 #Performance: 10.272 2.336
gmx mdrun -v -ntmpi 4 -ntomp 1 #Performance: 6.352 3.778
gmx mdrun -v -ntmpi 1 -nb gpu #Performance: 33.951 0.707
gmx mdrun -v -ntmpi 1 -nb gpu -pme cpu #Performance: 18.239 1.316
gmx mdrun -v -ntmpi 1 -nb gpu -pme cpu -bonded gpu #Performance: 18.820 1.275
gmx mdrun -v -ntmpi 1 -nb gpu -pme gpu -bonded gpu -update gpu #Performance: 81.736 0.294
mpirun -np 1 gmx mdrun -v #Performance: 34.287 0.700
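To make the comparison easier to read, here is a minimal Python sketch that ranks the runs above by ns/day and shows each one relative to the plain `gmx mdrun -v` run. The numbers are copied from the list above; the helper name `rank_runs` is only illustrative, and the sketch assumes the first value on each Performance line is ns/day and the second is hour/ns.

```python
# (command, ns/day, hour/ns) copied from the runs listed above
runs = [
    ("gmx mdrun -v", 34.852, 0.689),
    ("gmx mdrun -v -nt 1", 3.065, 7.830),
    ("gmx mdrun -v -nt 4", 10.182, 2.357),
    ("gmx mdrun -v -ntmpi 1 -ntomp 1", 3.068, 7.822),
    ("gmx mdrun -v -ntmpi 1 -ntomp 4", 10.272, 2.336),
    ("gmx mdrun -v -ntmpi 4 -ntomp 1", 6.352, 3.778),
    ("gmx mdrun -v -ntmpi 1 -nb gpu", 33.951, 0.707),
    ("gmx mdrun -v -ntmpi 1 -nb gpu -pme cpu", 18.239, 1.316),
    ("gmx mdrun -v -ntmpi 1 -nb gpu -pme cpu -bonded gpu", 18.820, 1.275),
    ("gmx mdrun -v -ntmpi 1 -nb gpu -pme gpu -bonded gpu -update gpu", 81.736, 0.294),
    ("mpirun -np 1 gmx mdrun -v", 34.287, 0.700),
]


def rank_runs(runs, baseline="gmx mdrun -v"):
    """Sort runs by ns/day (fastest first) and add the speedup over the baseline."""
    base = next(ns for cmd, ns, _ in runs if cmd == baseline)
    ranked = sorted(runs, key=lambda r: r[1], reverse=True)
    return [(cmd, ns, ns / base) for cmd, ns, _ in ranked]


for cmd, ns, speedup in rank_runs(runs):
    print(f"{ns:8.3f} ns/day  {speedup:5.2f}x  {cmd}")
```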
For the fastest configuration, nvidia-smi shows the GPU as almost unused:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 108… On | 00000000:04:00.0 Off | N/A |
| 24% 45C P2 70W / 250W | 181MiB / 11178MiB | 7% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108… On | 00000000:05:00.0 Off | N/A |
| 23% 26C P8 9W / 250W | 0MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 108… On | 00000000:08:00.0 Off | N/A |
| 23% 24C P8 8W / 250W | 0MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 108… On | 00000000:09:00.0 Off | N/A |
| 23% 28C P8 9W / 250W | 0MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 GeForce GTX 108… On | 00000000:83:00.0 Off | N/A |
| 23% 26C P8 9W / 250W | 0MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 GeForce GTX 108… On | 00000000:84:00.0 Off | N/A |
| 23% 27C P8 9W / 250W | 0MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 GeForce GTX 108… On | 00000000:87:00.0 Off | N/A |
| 23% 24C P8 8W / 250W | 0MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 GeForce GTX 108… On | 00000000:88:00.0 Off | N/A |
| 23% 27C P8 9W / 250W | 0MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2467144 C gmx 179MiB |
+-----------------------------------------------------------------------------+
FYI: I am studying an ionic system with 1000 molecules. The .mdp file is:
--- BOF ---
integrator = md
nsteps = 1000
dt = 0.002
nstxout = 100
nstvout = 100
nstenergy = 100
nstlog = 100
continuation = no
constraint_algorithm = lincs
constraints = h-bonds
lincs_iter = 1
lincs_order = 4
cutoff-scheme = Verlet
nstlist = 10
rcoulomb = 1.0
rvdw = 1.0
DispCorr = EnerPres
coulombtype = PME
pme_order = 4
fourierspacing = 0.16
tcoupl = V-rescale
tc-grps = LI TFS DIG
tau_t = 1.0000 1.0000 1.0000
ref_t = 200 200 200
pcoupl = no
pbc = xyz
gen_vel = yes
gen_temp = 300
gen_seed = 1234
--- EOF ---
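For context on the timings, the settings above (nsteps = 1000, dt = 0.002 ps) cover only 2 ps of simulated time. The short Python sketch below estimates the wall-clock time that implies at a given ns/day. The helper name `wall_seconds` is just for illustration, and the estimate assumes the reported ns/day holds for the whole run, ignoring start-up and load-balancing overhead.

```python
def wall_seconds(nsteps, dt_ps, ns_per_day):
    """Rough wall-clock time (s) for nsteps steps of dt_ps picoseconds at ns_per_day."""
    simulated_ns = nsteps * dt_ps / 1000.0   # ps -> ns
    return simulated_ns / ns_per_day * 86400.0  # days -> seconds


# nsteps and dt from the .mdp file; ns/day values from the runs listed earlier
for ns_per_day in (81.736, 34.852, 3.065):
    print(f"{ns_per_day:7.3f} ns/day -> {wall_seconds(1000, 0.002, ns_per_day):6.1f} s")
```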
Below is the build and detected-hardware information from the log file.
Precision: mixed
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: AVX2_256
FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /usr/bin/gcc GNU 8.2.0
C compiler flags: -mavx2 -mfma -Wno-missing-field-initializers -fno-inline -pthread -g
C++ compiler: /usr/bin/g++ GNU 8.2.0
C++ compiler flags: -mavx2 -mfma -Wno-missing-field-initializers -fno-inline -pthread -fopenmp -g
CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.1/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2020 NVIDIA Corporation;Built on Mon_Oct_12_20:09:46_PDT_2020;Cuda compilation tools, release 11.1, V11.1.105;Build cuda_11.1.TC455_06.29190527_0
CUDA compiler flags:-std=c++17;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-Wno-deprecated-gpu-targets;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_35,code=compute_35;-gencode;arch=compute_53,code=compute_53;-gencode;arch=compute_80,code=compute_80;-use_fast_math;;-mavx2 -mfma -Wno-missing-field-initializers -fno-inline -pthread -fopenmp -g
CUDA driver: 11.0
CUDA runtime: 11.10
Running on 1 node with total 16 cores, 16 logical cores, 1 compatible GPU
Hardware detected:
CPU info:
Vendor: Intel
Brand: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
Family: 6 Model: 63 Stepping: 2
Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
Hardware topology: Basic
Sockets, cores, and logical processors:
Socket 0: [ 0] [ 1] [ 2] [ 3] [ 4] [ 5] [ 6] [ 7]
Socket 1: [ 8] [ 9] [ 10] [ 11] [ 12] [ 13] [ 14] [ 15]
GPU info:
Number of GPUs detected: 1
#0: NVIDIA GeForce GTX 1080 Ti, compute cap.: 6.1, ECC: no, stat: compatible
Does anyone have a hint?
Thanks,
Marco