Question about the PME WAIT FOR PP and WAIT GPU STATE COPY entries in the performance log

GROMACS version: 2020.4
GROMACS modification: No


I’m running an 82K-atom benchmark to tune my hardware configuration for a molecular dynamics simulation. With the current configuration I reach ~80 ns/day, but a large share of the run time is lost to PME WAIT FOR PP and WAIT GPU STATE COPY (screenshot from the log file attached). I have four K80 GPUs in total: three are assigned to PP and one to PME. I am using four thread-MPI (tMPI) ranks with three OpenMP threads per rank. I have tried adding more threads, but neither PME WAIT FOR PP nor WAIT GPU STATE COPY changes significantly.
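
For reference, the setup described above corresponds roughly to the launch line assembled below. This is only a sketch, not the exact command I ran: the `-deffnm` name `benchmark` is a placeholder, and the offload flags (`-nb gpu -pme gpu -npme 1`) are assumed from the three-PP-plus-one-PME layout rather than copied from the log.

```python
# Rank/thread counts taken from the description above.
ntmpi = 4   # thread-MPI ranks
ntomp = 3   # OpenMP threads per rank
npme = 1    # ranks dedicated to PME (the remaining 3 do PP)

# "benchmark" is a hypothetical file prefix; the GPU offload flags are
# an assumption based on the 3 PP GPUs + 1 PME GPU split.
cmd = [
    "gmx", "mdrun",
    "-deffnm", "benchmark",
    "-ntmpi", str(ntmpi),
    "-ntomp", str(ntomp),
    "-npme", str(npme),
    "-nb", "gpu",
    "-pme", "gpu",
]

pp_ranks = ntmpi - npme
total_threads = ntmpi * ntomp

print(" ".join(cmd))
print(f"PP ranks: {pp_ranks}, PME ranks: {npme}, CPU threads: {total_threads}")
```

So the run uses 12 CPU threads in total, with one rank and one GPU handling PME for the three PP ranks.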

Does anyone have suggestions on how to improve performance from here? Any help would be much appreciated!