Timer documentation

GROMACS version: 2024.2
GROMACS modification: No

I have been looking for an explanation of the individual timers in the TIME ACCOUNTING section of the .log file. I could not find anything in the GROMACS manual or the source code. Could you point me to a suitable resource, if available?

To be precise, it’s about timers like these:

On 4 MPI ranks

 Activity:              Num   Num      Call    Wall time         Giga-Cycles
                        Ranks Threads  Count      (s)         total sum    %
--------------------------------------------------------------------------------
 Domain decomp.            4    1        401      37.772        332.398   3.1
 DD comm. load             4    1        200       0.155          1.368   0.0
 DD comm. bounds           4    1        200       0.075          0.663   0.0
 Neighbor search           4    1        201      75.363        663.196   6.2
 Launch PP GPU ops.        4    1     199602       3.465         30.494   0.3
 Comm. coord.              4    1      49800      30.287        266.526   2.5
 Force                     4    1      50001     454.870       4002.860  37.3
 Wait + Comm. F            4    1      50001       9.711         85.457   0.8
 Wait GPU NB nonloc.       4    1      50001       0.030          0.265   0.0
 Wait GPU NB local         4    1      50001       0.019          0.163   0.0
 Wait GPU state copy       4    1      94800     100.559        884.922   8.3
 NB X/F buffer ops.        4    1      10002       5.769         50.765   0.5
 Write traj.               4    1          2       1.206         10.614   0.1
 Update                    4    1      50001      57.560        506.533   4.7
 Constraints               4    1      50001     375.329       3302.900  30.8
 Comm. energies            4    1       5001       7.165         63.050   0.6
 Rest                                             58.715        516.694   4.8
--------------------------------------------------------------------------------
 Total                                          1218.051      10718.867 100.0
--------------------------------------------------------------------------------

There is no documentation for these. The only thing you can do is look in the code to see which regions each timer wraps.
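In the meantime, the numbers can at least be extracted mechanically from the log. Below is a sketch (not part of GROMACS) of a small parser for a TIME ACCOUNTING table like the one above; it keys only on the trailing numeric fields, since column widths may differ between GROMACS versions:

```python
import re

# A sketch: parse "Activity ... Wall time ... %" rows from a GROMACS
# TIME ACCOUNTING table. The Ranks/Threads/Count columns are optional
# ("Rest" and "Total" rows omit them), so that group is non-mandatory.
ROW = re.compile(
    r"^\s(?P<name>.+?)\s{2,}"                                    # activity name
    r"(?:(?P<ranks>\d+)\s+(?P<threads>\d+)\s+(?P<calls>\d+)\s+)?"  # optional counts
    r"(?P<wall>[\d.]+)\s+(?P<cycles>[\d.]+)\s+(?P<pct>[\d.]+)\s*$"  # wall s, Gcycles, %
)

def parse_time_accounting(text):
    """Return {activity name: wall time in seconds} for matching rows."""
    timers = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            timers[m.group("name").strip()] = float(m.group("wall"))
    return timers
```

Header and separator lines do not match the pattern and are skipped automatically.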

What would you like to know?

Hi, thank you for the quick reply.

I’m investigating how the individual timers evolve with the number of OMP threads when GPU offloading is activated. There does not seem to be a timer that explicitly measures GPU execution time, only timers for the time the CPU spends waiting for accelerator tasks to finish. For example, I have this measurement:

The value on the y-axis is the given timer for N OMP threads, normalized to the timer value for one OMP thread. While all other timers decrease as more OMP threads are employed, the timer “Wait GPU state copy” increases beyond four threads, making up about 50% of the entire runtime with 32 threads.
Inspecting the code, I found that this timer contains the time the CPU spends waiting for the GPU to finish. My explanation for this behavior is that, since the time the GPU tasks take is constant, you see the effect of the CPU tasks finishing their actual work more quickly.
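The normalization described above can be sketched as follows, using made-up wall times (the numbers here are hypothetical, chosen only to illustrate the shape of the effect: CPU-bound timers shrink with more threads, while a GPU-wait timer grows once the CPU finishes its share sooner than the GPU):

```python
# Hypothetical wall times in seconds, indexed by OMP thread count.
wall_times = {
    "Force":               {1: 900.0, 4: 240.0, 32: 45.0},
    "Wait GPU state copy": {1:  80.0, 4:  95.0, 32: 600.0},
}

def normalized(timer):
    """Normalize a timer's wall time at N threads to its 1-thread value."""
    base = wall_times[timer][1]
    return {n: t / base for n, t in sorted(wall_times[timer].items())}
```

With these numbers, “Force” drops well below 1 at 32 threads while “Wait GPU state copy” rises far above 1, matching the qualitative behavior in the plot.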


I still don’t see a question, and you don’t say what you want to achieve.

But your interpretations are correct. Note that the 2025(-beta) release has improved OpenMP parallelization of the domain decomposition.

GPU timings are complex. With CUDA it is nearly impossible to obtain realistic numbers, and you will have to use a profiler.