Reducing "Wait GPU state copy" for single GPU runs

pszilard · January 27, 2022, 8:24pm

Nothing abnormal here: since you offload all force computation to the GPU, the CPU has no useful work to do. It enqueues GPU work for a sequence of steps, and once it needs results (e.g. for pair search or I/O) it waits for the GPU to complete that work. That wait is the wall-time measured in the above counter.

That is unfortunately a limitation of the current GPU-resident parallelization. The only thing you could do is to consider switching to a supported thermostat.

Based on your log, there is not a lot of performance left on the table, but you could try a few tweaks:

increase nstlist to reduce the search time
move the bonded interactions back to the CPU (-bonded cpu option); the 16 CPU cores may be fast enough to give a slight benefit
if you care about throughput, run two simulations on the same GPU
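
To make the three tweaks concrete, here is a rough sketch that only assembles and prints the corresponding `gmx mdrun` command lines; it does not run anything. The run name `md`, the value `-nstlist 200`, and the 8-threads-per-run split with pin offsets 0 and 8 are illustrative assumptions (the offsets assume 16 hardware threads without SMT), so treat them as starting points to benchmark rather than recommendations.

```python
import shlex


def mdrun_command(deffnm, nstlist=None, bonded=None, ntomp=None,
                  pinoffset=None, gpu_id=None):
    """Return a `gmx mdrun` argument list; nothing is executed here."""
    cmd = ["gmx", "mdrun", "-deffnm", deffnm]
    if nstlist is not None:
        # Larger pair-list interval means fewer pair-search steps.
        cmd += ["-nstlist", str(nstlist)]
    if bonded is not None:
        # "cpu" moves bonded interactions back to the CPU cores.
        cmd += ["-bonded", bonded]
    if ntomp is not None:
        # One rank per run, with a fixed number of OpenMP threads.
        cmd += ["-ntmpi", "1", "-ntomp", str(ntomp)]
    if pinoffset is not None:
        # Pin threads so concurrent runs do not share cores.
        cmd += ["-pin", "on", "-pinoffset", str(pinoffset)]
    if gpu_id is not None:
        cmd += ["-gpu_id", str(gpu_id)]
    return cmd


# Tweaks 1 and 2: longer nstlist (200 is an assumed value) and bonded on CPU.
single = mdrun_command("md", nstlist=200, bonded="cpu")

# Tweak 3: two runs sharing GPU 0, each pinned to its own 8 cores (assumed split).
pair = [mdrun_command(f"md_{i}", ntomp=8, pinoffset=8 * i, gpu_id=0)
        for i in range(2)]

for cmd in [single, *pair]:
    print(shlex.join(cmd))
```

The two commands for the shared-GPU case would need to be launched concurrently (e.g. from separate shells or as background jobs), and the aggregate ns/day of both runs is what to compare against the single-run figure.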
