MD Performance dependency on PCIe bandwidth

hij09373 · February 16, 2023, 3:42pm

Hi Szilard,

Thanks for your rapid reply. The problem with the GPU-resident mode is that I cannot use it with virtual sites or, for my specific problem, with the pulling code and FEP.

-update

    Used to set where to execute update and constraints, when present. Can be set to “auto”, “cpu”, “gpu.” Defaults to “auto,” which currently always uses the CPU. Setting “gpu” requires that a compatible CUDA GPU is available, the simulation uses a single rank. 
Update and constraints on a GPU is currently not supported with mass and constraints free-energy perturbation, domain decomposition, virtual sites, Ewald surface correction, replica exchange, constraint pulling, orientation restraints and computational electrophysiology.

I suppose that CPU-GPU data movement can be reduced by using the GPU-resident mode. This is what I understood from the manual:

GROMACS supports two major offload modes: force-offload and GPU-resident. 
The former involves offloading some of or all interaction calculations with integration on the CPU (hence requiring per-step data movement). 
In the GPU-resident mode by offloading integration and constraints (when used) less data movement is necessary.
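As I understand it, on the command line the two modes differ only in where update and constraints run. Below is a minimal sketch of that, not an official recipe: the helper name `mdrun_command` is hypothetical, the `-deffnm npt` value is only assumed from the name of my log file, and `-nb`, `-pme` and `-update` are the `gmx mdrun` options that select where each task runs.

```python
def mdrun_command(mode, deffnm="npt"):
    """Build a gmx mdrun argument list for one of the two offload modes.

    Hypothetical helper for illustration; deffnm="npt" is an assumption.
    """
    if mode not in ("force-offload", "gpu-resident"):
        raise ValueError(f"unknown offload mode: {mode}")
    # Nonbonded and PME interactions are offloaded to the GPU in both modes.
    args = ["gmx", "mdrun", "-deffnm", deffnm, "-nb", "gpu", "-pme", "gpu"]
    # The modes differ in where update (integration) and constraints run.
    args += ["-update", "gpu" if mode == "gpu-resident" else "cpu"]
    return args


for mode in ("force-offload", "gpu-resident"):
    print(" ".join(mdrun_command(mode)))
```

With `-update cpu`, coordinates and forces have to cross the CPU-GPU connection every step; with `-update gpu` they can stay on the device, subject to the feature limitations quoted above.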

So in my case, where I used the force-offload mode,
did the CPU wait for the nonbonded and PME results from the GPU because:
a) it is "stuck" or "limited" by the bandwidth of the connection, or
b) the calculation of the PME and nonbonded parts on the GPU itself takes time?
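To put rough numbers on option a), here is a back-of-envelope sketch of the per-step copy time over PCIe. Everything in it is an assumption for illustration: 12 bytes per atom per direction (three single-precision floats), a hypothetical system of 100,000 atoms, a fixed 10 µs launch latency per copy, and approximate nominal x16 bandwidths. It ignores any overlap of transfers with computation and is not a model of what GROMACS actually does.

```python
def transfer_time_us(n_atoms, bandwidth_gb_s, bytes_per_atom=12, latency_us=10.0):
    """Estimate a one-way CPU<->GPU copy time in microseconds.

    bytes_per_atom=12 assumes three 4-byte floats per atom (assumption).
    latency_us is an assumed fixed per-copy overhead, not a measured value.
    """
    payload_bytes = n_atoms * bytes_per_atom
    seconds = payload_bytes / (bandwidth_gb_s * 1e9)
    return latency_us + seconds * 1e6


# Approximate nominal x16 bandwidths in GB/s (assumed, not measured).
links = {"PCIe 3.0 x16": 16.0, "PCIe 4.0 x16": 32.0}
n_atoms = 100_000  # hypothetical system size

for name, bandwidth in links.items():
    # Coordinates go to the GPU and forces come back: two copies per step.
    per_step = 2 * transfer_time_us(n_atoms, bandwidth)
    print(f"{name}: about {per_step:.0f} us of transfers per step")
```

Comparing such an estimate with the wall time per step and the "Wait GPU" rows in the log should indicate whether the transfer or the GPU computation itself dominates the waiting.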

I have now attached a new log file from a longer simulation.

npt.log (671.7 KB)

When I shifted the PME calculation to the GPU, the efficiency increased, as one can see in the rest time. The overall performance, however, is slightly lower. I assume that this is due to the number of cores and the CPU type.

npt_2.log (672.6 KB)

Of course, in the multi-GPU scenario I would consider NVLink between the two GPUs. Still, I assume that in force-offload mode the data transfer CPU<->(GPU1 <-NVLink-> GPU2) can only benefit from a large bandwidth, unless the GPU-resident mode is used. Is my assumption correct?

Many thanks,

John
