Domain decomposition error with virtual sites

GROMACS version: 2020.4
GROMACS modification: No

Hello,

With virtual sites for the hydrogen atoms I can use a 5 fs time step when running on a GPU (single MPI rank). However, I get a domain decomposition error when I try to run the simulation on an HPC node with 128 cores:

“Fatal error:
There is no domain decomposition for 96 ranks that is compatible with the
given box and a minimum cell size of 1.49875 nm
Change the number of ranks or mdrun option -rcon or -dds or your LINCS
settings
Look in the log file for details on the domain decomposition”

The simulation works on the CPU node with a 4 fs time step and lincs-order = 4. With lincs-order = 6 I get the same domain decomposition error.

Without virtual sites and with a 2 fs time step, the simulation runs smoothly on multiple CPU nodes without any issues.

Would it be possible to use the 5 fs time step when running on the CPU nodes?

Thank you!

Hi,

If you compare the logs, you’ll see in the domain decomposition statistics that the run with virtual sites requires a larger minimum domain size, which limits the number of domains. You can use more threads per rank and in that way reduce the total number of ranks.
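
To make the rank/thread trade-off concrete, here is a rough Python sketch (not GROMACS code) that lists the ways to split 128 cores into ranks × OpenMP threads and checks whether a given rank count can form a grid of cells no smaller than the 1.49875 nm reported in the error. It assumes a rectangular box and ignores separate PME ranks and dynamic load balancing, so treat it only as a first filter; the function names are made up for this illustration, and the box dimensions have to come from your own system.

```python
def rank_thread_splits(total_cores):
    """Return all (ranks, threads_per_rank) pairs that use exactly total_cores."""
    return [(r, total_cores // r)
            for r in range(1, total_cores + 1)
            if total_cores % r == 0]


def max_cells(box_nm, min_cell_nm):
    """Upper bound on the number of cells along each edge of a rectangular box."""
    return tuple(int(edge // min_cell_nm) for edge in box_nm)


def fits(ranks, box_nm, min_cell_nm):
    """True if ranks factors into nx * ny * nz within the per-axis cell limits."""
    mx, my, mz = max_cells(box_nm, min_cell_nm)
    for nx in range(1, mx + 1):
        if ranks % nx:
            continue
        for ny in range(1, my + 1):
            if (ranks // nx) % ny:
                continue
            nz = ranks // (nx * ny)
            if nz <= mz:
                return True
    return False


def feasible_splits(total_cores, box_nm, min_cell_nm):
    """Rank/thread splits whose rank count passes the simplified grid check."""
    return [(r, t) for r, t in rank_thread_splits(total_cores)
            if fits(r, box_nm, min_cell_nm)]
```

Calling `feasible_splits(128, box, 1.49875)` with your own box vectors gives candidate combinations to try. With a thread-MPI build these map to `gmx mdrun -ntmpi <ranks> -ntomp <threads>`; with an MPI build the rank count is set by the MPI launcher and `-ntomp` sets the threads per rank.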

That said, I assume you want to improve absolute performance, not just run on more resources, and that might be possible by other means:

  • check that you can actually scale to 128 cores (e.g. check 64, 96, etc.);
  • use more OpenMP threads per rank;
  • make sure that DD load imbalance and PP/PME balance are not limiting your performance (if they are there are ways to reduce those);
  • consider using multiple time stepping (MTS).
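
For the scaling check in the first point, a small helper like the following can turn the ns/day values from the "Performance:" line at the end of each md.log into a parallel efficiency relative to the smallest run. This is only a sketch; the function name is hypothetical and the input numbers must be your own measurements.

```python
def parallel_efficiency(perf_by_cores):
    """Map core count -> efficiency relative to the smallest core count measured.

    perf_by_cores: dict of {core_count: performance in ns/day}.
    An efficiency of 1.0 means perfect linear scaling from the baseline run.
    """
    base_cores = min(perf_by_cores)
    base_perf = perf_by_cores[base_cores]
    return {
        cores: (perf / base_perf) / (cores / base_cores)
        for cores, perf in sorted(perf_by_cores.items())
    }
```

If the efficiency drops sharply between, say, 64 and 128 cores, the extra cores are not buying much, and it is worth looking at the load imbalance and PP/PME balance reported in the log before adding more resources.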

Cheers
Szilárd