GROMACS version: 2021
GROMACS modification: No
When I run my mdrun, I see these messages:
step 6200: timed with pme grid 60 80 60, coulomb cutoff 1.037: 407.2 M-cycles
step 6400: timed with pme grid 64 80 60, coulomb cutoff 1.000: 397.3 M-cycles
optimal pme grid 64 80 60, coulomb cutoff 1.000
step 19900, remaining wall clock time: 0 s
Writing final coordinates.
As you can see, for 20k steps the optimal PME grid is reported at step 6400. If I change the number of steps, that step changes too. For example, if the total number of steps is 5000, I see:
step 4700: timed with pme grid 64 80 60, coulomb cutoff 1.000: 394.6 M-cycles
step 4900: timed with pme grid 60 72 56, coulomb cutoff 1.075: 418.5 M-cycles
Writing final coordinates.
Here, no optimal PME grid is reported.
May I know what this is?
Your second run is too short and never finished the PME tuning (load balancing), the output of which you are observing. Make sure to run long enough for the balancer to find the optimal setting and to also run a significant time with that setting (also use -resetstep/-resethway if you want to collect reliable performance data from shorter runs).
Is it possible to disable the load balancer?
Make sure to run long enough for the balancer to find the optimal setting
What do you mean by optimal setting? Do you mean number of threads or so? Such things are determined at the beginning of the execution. My command is
gmx mdrun -nb gpu -v -deffnm nvt
It’s trying to optimize the PME computational parameters by changing the ones noted (PME grid and Coulomb cutoff). The number of threads is not changed.
In your original post, it found and reported the optimal settings after 6400 steps for the first simulation. The second simulation only ran for 5000 steps and so it didn’t have enough time to find the optimal settings, which is why they did not get reported.
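To check whether the tuning finished in a given run, you can look for the "optimal pme grid" line in the saved output. A minimal sketch, assuming the mdrun output was saved to a text file; the temporary file below is only a stand-in filled with the lines quoted from the first run in this thread:

```shell
# Stand-in for a saved mdrun output; the lines are copied from the 20k-step run above
log=$(mktemp)
cat > "$log" <<'EOF'
step 6200: timed with pme grid 60 80 60, coulomb cutoff 1.037: 407.2 M-cycles
step 6400: timed with pme grid 64 80 60, coulomb cutoff 1.000: 397.3 M-cycles
optimal pme grid 64 80 60, coulomb cutoff 1.000
EOF

# The "optimal pme grid" line only appears once the tuning has finished
if grep -q "optimal pme grid" "$log"; then
    tuning_status="finished"
else
    tuning_status="unfinished"
fi
echo "PME tuning: $tuning_status"

# Among the "timed with" lines, pick the one with the fewest M-cycles
# (the cycle count is the second-to-last field on each line)
fastest=$(awk '/timed with pme grid/ {
    if (min == "" || $(NF-1) + 0 < min) { min = $(NF-1) + 0; line = $0 }
} END { print line }' "$log")
echo "Fastest timed setup: $fastest"

rm -f "$log"
```

For these sample lines, the fastest timed setup has the same grid and cutoff as the reported optimum. Running the same check on the output of the 5000-step run would report the tuning as unfinished, because the "optimal pme grid" line is never printed there.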
You may be able to disable the tuning with -notunepme, but you probably want it turned on unless you are absolutely sure of what you are doing.
Ok. Is that tuning related to the performance of the simulation in my run? Or does it affect the accuracy of the computation? For my work, I am analyzing the performance, so the actual output and the number of steps needed for molecular precision are not my interest.
It is tuning to maximize the performance, yes. The accuracy is not changed (or kept within a safe tolerance).
I tested with and without -notunepme and it seems that using that option gives a lower run time. Please see
I tested multiple times and the run times are around 22 and 26 seconds, respectively. So, it seems that not using that option is better. I guess that is not expected. Any thoughts?
The tuning takes some time to finish, during which the performance is lower. To properly compare the performance you should only benchmark the optimized performance.
As @pszilard pointed out, you can use -resethway/-resetstep to only use the latter, optimized part for the performance reporting.
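As a rough sketch of such a benchmark run, based on the command posted earlier in the thread: the -nsteps value of 20000 is only an illustrative assumption, chosen so that the halfway point (step 10000) falls after the step at which tuning finished in the first run (6400). It also assumes nvt.tpr is in the current directory.

```shell
# Benchmark command: same options as the original post, plus
#   -nsteps 20000  (assumed value; long enough for tuning to finish in this case)
#   -resethway     (reset the performance counters halfway through the run)
bench_cmd="gmx mdrun -nb gpu -v -deffnm nvt -nsteps 20000 -resethway"

# Only launch the run when gmx and the input file are available;
# otherwise just print the command that would be used
if command -v gmx >/dev/null 2>&1 && [ -f nvt.tpr ]; then
    $bench_cmd
else
    echo "Would run: $bench_cmd"
fi
```

With the counters reset at the halfway point, the performance reported at the end of the log should cover only the second half of the run, after the tuning phase. If you prefer an explicit step, -resetstep with a step number past the end of the tuning can be used instead of -resethway.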