Increase Performance of the simulation

GROMACS version: 2020
GROMACS modification: No

Dear all,

I am not a computer scientist and, frankly, have only a minimal understanding of how load balancing works. After my simulation finished, I noticed that the log reports a lot of load imbalance, and I have no idea how to improve it. Could someone explain what is going on in the report and how I can speed up my simulation? Thanks. Here is the report:

Dynamic load balancing report:
DLB was off during the run due to low measured imbalance.
Average load imbalance: 22.6%.
The balanceable part of the MD step is 72%, load imbalance is computed from this.
Part of the total run time spent waiting due to load imbalance: 16.3%.
Average PME mesh/force load: 1.061
Part of the total run time spent waiting due to PP/PME imbalance: 3.4 %

NOTE: 16.3 % of the available CPU time was lost due to load imbalance
in the domain decomposition.
You might want to use dynamic load balancing (option -dlb.)
You can also consider manually changing the decomposition (option -dd);
e.g. by using fewer domains along the box dimension in which there is
considerable inhomogeneity in the simulated system.

               Core t (s)   Wall t (s)        (%)
       Time:  5311020.742   132775.525     4000.0
                         1d12h52:55
                 (ns/day)    (hour/ns)
Performance:       26.029        0.922

This is a simulation for ~100,000 atoms.

Best,

Ben
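For anyone reading along, the figures in the report above can be cross-checked with a little arithmetic. This is only a rough sketch: the variable names are made up for illustration, and the last number is an idealised ceiling that assumes all of the reported waiting time could be recovered, which is unlikely in practice.

```python
# Numbers copied from the report above.
core_time_s = 5311020.742
wall_time_s = 132775.525
ns_per_day = 26.029
imbalance_loss = 0.163  # "16.3 % of the available CPU time was lost"

# Core time divided by wall time gives the number of cores used
# (this is what the 4000.0 % column expresses).
cores = core_time_s / wall_time_s

# ns/day and hour/ns are two views of the same quantity.
hours_per_ns = 24.0 / ns_per_day

# Simulated time covered by this run.
simulated_ns = ns_per_day * wall_time_s / 86400.0

# Idealised ceiling, ASSUMING every second spent waiting on load
# imbalance could be turned into useful work.
ceiling_ns_per_day = ns_per_day / (1.0 - imbalance_loss)

print(f"cores used:         {cores:.1f}")
print(f"hour/ns:            {hours_per_ns:.3f}")
print(f"simulated time:     {simulated_ns:.1f} ns")
print(f"ceiling (idealised): {ceiling_ns_per_day:.1f} ns/day")
```

So the run used about 40 cores, and even a perfect fix for the domain decomposition imbalance would be worth a few ns/day at most.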

Hi,

Improving performance can be tricky: there are a lot of parameters to adjust, and sometimes you simply cannot improve it by much.

Did you try using the -dlb option to mdrun that the report suggests? If not, I would suggest setting up some short benchmark simulations for your system to test which options give you the best performance. Simulating ~20,000 steps should be enough to give a good measure. If you are not very familiar with how to do this, here is the general procedure:

  1. Use the same hardware and settings that your final simulation will run on
  2. Copy your .tpr file to a new directory for each test
  3. Run mdrun with the flags -nsteps 20000 -resethway -noconfout and whatever other options you want to benchmark (e.g. -dlb yes and -dlb no in different directories to see the impact of DLB)
  4. Check the reported performance in md.log. (-resethway resets the performance counters halfway through the simulation, which is important since the simulation has to warm up before it reaches its final speed)
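The steps above could be scripted roughly like this. It is a minimal sketch under a few assumptions that are not from your setup: the run input is called topol.tpr, the binary is invoked as `gmx mdrun` (on a cluster it may be `gmx_mpi` behind a launcher), and the `bench_dlb_*` directory names are invented. The DLB setting is spelled out as `-dlb auto`, `-dlb yes` or `-dlb no`. The script only prints the commands so you can inspect them before running anything.

```python
import shlex
import shutil
from pathlib import Path

# Assumptions for illustration only: adapt to your own setup.
TPR = Path("topol.tpr")          # hypothetical run input name
GMX = ["gmx", "mdrun"]           # may be gmx_mpi plus a launcher
DLB_SETTINGS = ["auto", "yes", "no"]


def benchmark_command(dlb, nsteps=20000):
    """Build the mdrun command line for one short benchmark run."""
    return GMX + [
        "-s", TPR.name,
        "-nsteps", str(nsteps),   # short run, enough for a timing
        "-resethway",             # reset counters halfway (warm-up)
        "-noconfout",             # skip writing the final structure
        "-dlb", dlb,
    ]


def prepare(dlb, root=Path(".")):
    """Create one directory per test and copy the .tpr file into it."""
    workdir = Path(root) / f"bench_dlb_{dlb}"
    workdir.mkdir(parents=True, exist_ok=True)
    if TPR.exists():
        shutil.copy(TPR, workdir / TPR.name)
    return workdir


# Print one command per DLB setting; nothing is executed here.
for dlb in DLB_SETTINGS:
    print(f"cd bench_dlb_{dlb} && {shlex.join(benchmark_command(dlb))}")
```

Call `prepare(dlb)` for each setting first to create the directories, then run the printed commands on the same hardware and with the same launch settings as the production run.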

Kind regards,
Petter


Hi Petter,

Thanks for the suggestion and I will try it out.

Best,

Ben

Hi,

Let me add some explanation to the somewhat incomplete report that may be confusing.
While the report states “low measured imbalance”, the actual reason DLB was not turned on automatically is that, when PME is the limiting component of the run (that is, PME/PP load > 1), mdrun assumes that dynamic load balancing cannot improve performance: DLB can only balance the PP load, and it adds some overhead to PME (increased x redistribution cost). This assumption does not always hold, however. The gain from better load balance across the whole step (including constraints) may outweigh the drawbacks of DLB, so you may see improved performance with -dlb yes even if it worsens the PME/PP balance further.

The tests @pjohansson suggested should help you check.
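To compare the benchmark runs, something along these lines could pull the ns/day figure out of each md.log. It is a sketch only: the function names are made up, and it assumes the log contains a `Performance:` line with two numbers, as in the report quoted earlier in this thread.

```python
import re
from pathlib import Path

# Matches e.g. "Performance:       26.029        0.922"
PERF_LINE = re.compile(
    r"^\s*Performance:\s+([0-9.]+)\s+([0-9.]+)", re.MULTILINE
)


def read_performance(log_text):
    """Return (ns/day, hour/ns) from md.log text, or None if absent."""
    match = PERF_LINE.search(log_text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def rank_runs(log_paths):
    """Return (ns/day, path) pairs, fastest run first."""
    results = []
    for path in log_paths:
        text = Path(path).read_text(errors="replace")
        perf = read_performance(text)
        if perf is not None:
            results.append((perf[0], str(path)))
    return sorted(results, reverse=True)
```

For example, `rank_runs(Path(".").glob("*/md.log"))` would list every benchmark directory below the current one, with the fastest setting at the top. Differences of a percent or two are probably within the noise of a 20,000-step run.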

Cheers,
Szilárd
