Question about PME WAIT FOR PP and WAIT GPU STATE COPY variables in performance log

GROMACS version: 2020.4
GROMACS modification: No

Hello,

I’m running an 82K-atom benchmark to configure my hardware for a molecular dynamics simulation. With the current configuration I reach ~80 ns/day, but a large chunk of the runtime is lost to PME WAIT FOR PP and WAIT GPU STATE COPY (screenshot from the log file attached). I have four K80 GPUs in total: three are assigned to PP and one to PME. I am also using four tMPI ranks with three OpenMP threads per rank. I have tried adding more threads, but there is no significant change in either PME WAIT FOR PP or WAIT GPU STATE COPY.
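
For context, the run is launched roughly like this (a sketch; the file name and GPU IDs are placeholders for my actual setup):

```
gmx mdrun -deffnm benchmark \
    -ntmpi 4 -ntomp 3 -npme 1 \
    -nb gpu -pme gpu \
    -gputasks 0123    # map the four GPU tasks (3x PP nonbonded + 1x PME) onto the four devices
```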

Does anyone have suggestions on how to improve performance from this point? Any help would be very much appreciated!

Best,
George

I would try increasing the number of tMPI ranks to 8.
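
Something along these lines, as a sketch (adjust -ntomp so ranks × threads fits your core count, and the -gputasks string to your device IDs):

```
gmx mdrun -deffnm benchmark \
    -ntmpi 8 -ntomp 3 -npme 1 \
    -nb gpu -pme gpu \
    -gputasks 01230123    # 8 GPU tasks (7 PP + 1 PME) shared over the 4 devices
```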

The original post is more than one year old, so I don’t know if answering is still useful.

The issue is that the constraints take a lot of time: constraining happens on the CPU while the GPUs are waiting. I assume that -update gpu does not work because there are likely coupled constraints. With constraints on h-bonds only, integration and constraining can be done on the GPU, and the performance would be much better.
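
As a rough sketch of that change (file names are placeholders; note that in the 2020 series GPU update is, as far as I know, only supported without domain decomposition, so combining it with several ranks and a separate PME rank may require a newer release):

```
# In the .mdp, constrain only bonds to hydrogen so there are no coupled constraints:
#   constraints = h-bonds
# Rebuild the run input and offload integration + constraints to the GPU:
gmx grompp -f benchmark.mdp -c conf.gro -p topol.top -o benchmark.tpr
gmx mdrun -deffnm benchmark -nb gpu -pme gpu -update gpu
```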