How to get high GPU utilization (GPU-Util) when running two simulation tasks on one computer?

GROMACS version: 2019.2
GROMACS modification: No

Dear all,

I am trying to run two MD simulation tasks on one computer equipped with 2 CPUs and 4 GPUs. The CPU information is as follows:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 56
On-line CPU(s) list: 0-55
Thread(s) per core: 2
Core(s) per socket: 14
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel® Xeon® CPU E5-2690 v4 @ 2.60GHz
Stepping: 1
CPU MHz: 3199.929
BogoMIPS: 5206.06
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 35840K
NUMA node0 CPU(s): 0-13,28-41
NUMA node1 CPU(s): 14-27,42-55

I first submit one MD task, using two GPUs, with the following command:
gmx mdrun -v -deffnm md -nt 24 -gpu_id 0,1
The GPU utilization (GPU-Util) of the two GPUs in use reaches 65% and 60%, respectively, as shown below:

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Graphics Device     Off  | 0000:02:00.0     Off |                  N/A |
| 53%   82C    P2   151W / 250W |    197MiB / 12189MiB |     65%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Graphics Device     Off  | 0000:03:00.0     Off |                  N/A |
| 51%   80C    P2   148W / 250W |    193MiB / 12189MiB |     60%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Graphics Device     Off  | 0000:82:00.0     Off |                  N/A |
| 23%   25C    P8     8W / 250W |      2MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Graphics Device     Off  | 0000:83:00.0     Off |                  N/A |
| 23%   26C    P8     9W / 250W |      2MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

However, when I submit another MD task to the other two GPUs, with the following command:

gmx mdrun -v -deffnm md -nt 24 -gpu_id 0,1

The GPU-Util of the two GPUs running the first MD task drops to 16% and 20%, respectively, and the two GPUs running the second MD task reach only 10% and 14%, respectively, as shown below:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:02:00.0     Off |                  N/A |
| 38%   56C    P2    48W / 180W |     157MiB / 8113MiB |     16%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:03:00.0     Off |                  N/A |
| 41%   61C    P2    50W / 180W |     155MiB / 8113MiB |     20%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 1080    Off  | 0000:82:00.0     Off |                  N/A |
| 40%   60C    P2    52W / 180W |     149MiB / 8113MiB |     10%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 1080    Off  | 0000:83:00.0     Off |                  N/A |
| 38%   57C    P2    49W / 180W |     149MiB / 8113MiB |     14%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     35470    C   gmx                                            155MiB |
|    1     35470    C   gmx                                            153MiB |
|    2     35729    C   gmx                                            147MiB |
|    3     35729    C   gmx                                            147MiB |
+-----------------------------------------------------------------------------+

I wonder whether it is possible to keep the GPU-Util of all GPUs high when running two MD tasks. How should I adjust the parameters of the mdrun commands to achieve this?

Best regards

Hi. Both of the commands you're using have -gpu_id 0,1. One of them should use 2,3, or you're targeting both simulations to the same GPUs.

Saying that you have 2 CPUs is misleading: you appear to have 2 sockets, each with 14 physical cores and 28 logical CPUs.
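
Those counts follow directly from the lscpu fields in the question (Socket(s), Core(s) per socket, Thread(s) per core). A minimal sketch of the arithmetic, where the function name cpu_counts is made up for illustration and the three values are typed in by hand rather than parsed from lscpu:

```shell
#!/bin/sh
# cpu_counts SOCKETS CORES_PER_SOCKET THREADS_PER_CORE
# Hypothetical helper: prints physical and logical CPU counts.
cpu_counts() {
    sockets=$1; cores=$2; threads=$3
    echo "physical cores per socket: $cores"
    echo "logical CPUs per socket: $((cores * threads))"
    echo "physical cores total: $((sockets * cores))"
    echo "logical CPUs total: $((sockets * cores * threads))"
}

# Values taken from the lscpu output in the question.
cpu_counts 2 14 2
```

The last line printed should match the CPU(s) field that lscpu reports.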

You will get maximum throughput by limiting each simulation to 1 GPU. Also, you can take a big performance hit if you don't pin threads. Something like this might get you decent performance:

gmx mdrun -nt 14 -pin on -pinoffset 0 -gpu_id 0 &
gmx mdrun -nt 14 -pin on -pinoffset 14 -gpu_id 1 &
gmx mdrun -nt 14 -pin on -pinoffset 28 -gpu_id 2 &
gmx mdrun -nt 14 -pin on -pinoffset 42 -gpu_id 3

Note that the first sim is now pinned to logical cores 0-13, the second to logical cores 14-27, and so on. If you can't use the whole computer, feel free to reduce those thread counts, but try not to have a simulation that spans logical cores 27 and 28, since that's probably the socket boundary unless your computer counts CPUs weirdly.
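
The pin offsets above are just the simulation index times the thread count, so the command lines can be generated rather than typed. A rough sketch, assuming one GPU per simulation and that each simulation has its own input named sim0, sim1, and so on (print_mdrun_cmds and the sim names are hypothetical, and -deffnm is added here only so the runs do not overwrite each other's files). It only prints the commands; it does not run them:

```shell
#!/bin/sh
# print_mdrun_cmds NSIMS THREADS_PER_SIM
# Hypothetical helper: prints one pinned mdrun command per simulation.
# Simulation i gets pin offset i * THREADS_PER_SIM and GPU i.
print_mdrun_cmds() {
    nsims=$1; nt=$2
    i=0
    while [ "$i" -lt "$nsims" ]; do
        offset=$((i * nt))
        # sim$i is an assumed per-simulation file prefix, not from the thread.
        echo "gmx mdrun -deffnm sim$i -nt $nt -pin on -pinoffset $offset -gpu_id $i &"
        i=$((i + 1))
    done
    # Keep the launching shell alive until all background runs finish.
    echo "wait"
}

# Four simulations with 14 threads each, matching the commands above.
print_mdrun_cmds 4 14
```

After checking the printed lines, they could be saved to a script and run from there. For only two simulations, print_mdrun_cmds 2 14 keeps both on the first 28 logical cores.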

There is also a whole lot you can do to try to optimize per-simulation performance. Have you checked out the performance guide? It has lots of good examples.

Another thing - to really maximize GPU utilization, if you have enough CPUs (which you appear to), you can run 2 simulations per GPU.
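
As a sketch of what 2 simulations per GPU could look like on this machine: 4 GPUs times 2 simulations gives 8 runs, and 56 logical CPUs divided by 8 gives 7 threads each. The helper below only prints candidate command lines. Its name, the sim0 to sim7 file prefixes, and the even split of threads are all assumptions for illustration, not something tested on this hardware:

```shell
#!/bin/sh
# print_shared_gpu_cmds NGPUS SIMS_PER_GPU NCPUS
# Hypothetical helper: prints pinned mdrun commands where several
# simulations share each GPU. Threads are split evenly (integer division).
print_shared_gpu_cmds() {
    ngpus=$1; per_gpu=$2; ncpus=$3
    nsims=$((ngpus * per_gpu))
    nt=$((ncpus / nsims))
    i=0
    while [ "$i" -lt "$nsims" ]; do
        # Consecutive simulations share a GPU: sims 0,1 -> GPU 0, and so on.
        echo "gmx mdrun -deffnm sim$i -nt $nt -pin on -pinoffset $((i * nt)) -gpu_id $((i / per_gpu)) &"
        i=$((i + 1))
    done
    echo "wait"
}

# 4 GPUs, 2 simulations per GPU, 56 logical CPUs.
print_shared_gpu_cmds 4 2 56
```

With 7 threads per run, the fourth simulation ends at logical core 27 and the fifth starts at 28, so no run straddles the likely socket boundary mentioned above.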