Accelerating PME on the CPU

GROMACS version: 2019.3
GROMACS modification: No

When running PME on the CPU with the mdrun options -resethway -noconfout -nsteps 10000 -nb gpu -pme cpu, I get the following error:

Program:     gmx mdrun, version 2019.3
Source file: src/gromacs/mdlib/resethandler.cpp (line 163)
MPI rank:    88 (out of 128)

Fatal error:
PME tuning was still active when attempting to reset mdrun counters at step
5000. Try resetting counters later in the run, e.g. with gmx mdrun -resetstep.

Since the problem seems to be PME, is there a way to accelerate this phase so that -resethway is still possible? In this case PME must be run on the CPU.



As far as I know, the PME tuning cannot be sped up (speeding it up might also make it less reliable, which is not what you want when running a benchmark).

A better solution is to increase -nsteps or to set the reset step explicitly with -resetstep.
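As a rough sketch of the second option, the command below keeps the flags from the original post and swaps -resethway for an explicit -resetstep. The value 8000 is only an illustrative guess at a step by which tuning has finished, and the plain `gmx` binary name is an assumption (an MPI build is usually launched as `gmx_mpi` through the MPI launcher). The snippet only prints the command so it can be checked before running it.

```shell
# Hypothetical values: choose RESET_STEP late enough for PME tuning to finish.
NSTEPS=10000
RESET_STEP=8000

# Same options as in the original post, with -resethway replaced by -resetstep.
MDRUN_CMD="gmx mdrun -nsteps ${NSTEPS} -resetstep ${RESET_STEP} -noconfout -nb gpu -pme cpu"

# Print the command for inspection instead of executing it.
echo "${MDRUN_CMD}"
```

If the same fatal error still appears, tuning was still active at that step, so the reset step (and possibly -nsteps) needs to be larger.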


What about -notunepme?

Also, the funny thing is that it sometimes runs correctly, but most of the time it fails with that error…

-notunepme turns off the PME optimization altogether. Why are you running this simulation?

The optimization runs until it believes it has found an optimal configuration. This sometimes takes more steps, since the process may be interrupted by the OS running some other task or slowed by any number of other factors.

It’s just a benchmark, so we are trying to find the optimal configuration.

If it’s a regular benchmark, you should leave the PME tuning turned on. Increasing the number of simulation steps to 20000 should be sufficient. With -resethway, that leaves 10000 steps for the tuning to finish before the counters are reset.
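The arithmetic behind this suggestion can be sketched in a few lines of shell. The helper name `reset_step_for` is made up for illustration; it simply assumes that -resethway resets the counters at half of -nsteps, which matches the step 5000 in the error message for -nsteps 10000.

```shell
# Hypothetical helper: step at which -resethway resets the counters,
# assuming the reset happens at nsteps / 2.
reset_step_for() {
    echo $(( $1 / 2 ))
}

for nsteps in 10000 20000; do
    echo "nsteps=${nsteps}: counters reset at step $(reset_step_for "${nsteps}")"
done
```

With 20000 steps, the tuning gets twice as many steps to converge before the reset as in the original run.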


Okay, I see that. If I do not specify -pme cpu, would the PME run on the GPU if there is one available?

If GROMACS detects a compatible GPU, it will automatically offload the PME long-range part to the GPU when it runs with a single (thread-)MPI rank. If you start mdrun with more than one rank, PME will run on the CPU by default. The PME long-range part cannot be spread across multiple GPUs at the moment, so on single-GPU nodes a single rank should give you good performance in most cases.
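For a single-GPU node, a run along those lines might look like the sketch below. It assumes a thread-MPI build (where -ntmpi sets the number of ranks) and the plain `gmx` binary name; the step count is carried over from the suggestion earlier in the thread. Passing -pme gpu explicitly should make mdrun stop with an error rather than silently fall back to the CPU if PME cannot be offloaded. As before, the snippet only prints the command.

```shell
# Assumption: thread-MPI build, one rank, one compatible GPU on the node.
NTMPI=1

# -pme gpu requests PME offload explicitly instead of relying on auto-detection.
SINGLE_RANK_CMD="gmx mdrun -ntmpi ${NTMPI} -nb gpu -pme gpu -resethway -noconfout -nsteps 20000"

# Print the command for inspection instead of executing it.
echo "${SINGLE_RANK_CMD}"
```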