Domain decomposition error + Setting MPI ranks compatible with custom domains

GROMACS version: GROMACS/2020-foss-2019b and GROMACS/2021-foss-2020b
GROMACS modification: Yes (HPC installations, info in linked log file)

Dear all,

I am simulating a protein-ligand complex in a rhombic dodecahedral box of ~409 nm³, with (box-X, box-Y, box-Z) ≈ (8.33, 8.33, 5.89) nm. The system consists of 41672 atoms.
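As a quick consistency check on these numbers, the sketch below recomputes the volume and box-Z from the edge length. It assumes the xy-square rhombic dodecahedron convention, in which the third box vector is (d/2, d/2, d·√2/2) and the volume is (√2/2)·d³. The variable names are illustrative only.

```python
import math

# Edge length d taken from box-X / box-Y above (nm).
d = 8.33
n_atoms = 41672

# Assumed convention: xy-square rhombic dodecahedron,
# third box vector = (d/2, d/2, d*sqrt(2)/2), volume = (sqrt(2)/2) * d**3.
box_z = math.sqrt(2) / 2 * d
volume = math.sqrt(2) / 2 * d**3

print(f"box-Z   = {box_z:.2f} nm")
print(f"volume  = {volume:.1f} nm^3")
print(f"density = {n_atoms / volume:.0f} atoms/nm^3")
```

Both values agree with the box dimensions quoted above to within rounding.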

From the documentation, I understood that I could use up to 16 processors on a single node without domain decomposition (DD). This gives a performance of ~30 ns/day.

When I use more than 16 cores, I get the fatal DD error (here, for 18 processors):
“1196 of the 49025 bonded interactions could not be calculated because some atoms involved moved further apart than the multi-body cut-off distance (0.835804 nm) or the two-body cut-off distance (1.59575 nm), see option -rdd, for pairs and tabulated bonds also see option -ddcheck”

However, the same log file shows that the maximum distances for bonded interactions are considerably smaller than the cut-off values:
“Initial maximum distances in bonded interactions:
two-body bonded interactions: 0.429 nm, LJ-14, atoms 1837 1845
multi-body bonded interactions: 0.488 nm, CMAP Dih., atoms 440 449
Minimum cell size due to bonded interactions: 0.537 nm”

My first question: do these cut-off values also apply to Coulomb interactions, which act over a larger distance? If not, could someone tell me what is causing this error, or how I can ‘view’ the distances that are causing trouble?
The error occurs for both GROMACS 2020 and 2021 (I normally work with 2020, but that version apparently does not print the complete error message, while 2021 does). I have attached the log file for 2021, run with 18 processors:
log file: step5_6.log - Google Drive
mdp file: mdout.mdp - Google Drive

I have tried setting -rdd to 1.4 and 1.6, but both result in a different fatal error (initial cell size smaller than the cell size limit). I also tried setting a custom domain grid with -dd (see below), to no avail. I have read the documentation pages on getting good performance and on common errors. I find it hard to believe that 16 processors is the upper limit for my system of 40k+ atoms. (For example, I found a benchmark system of 20k atoms, where they used 8 nodes with 124 MPI ranks per node: DOT html [change the DOT])

Finally, since I use a custom force field for the ligand, I thought the problem might originate there. However, using CHARMM-GUI to generate GROMACS run files for the protein only (using the PDB code) results in the same errors, albeit with fewer problematic bonded interactions.

I thought I could still get a performance increase by using fewer domains, for which the system would (hopefully) be compatible with the cut-off distances. I tried 4 nodes with 16 or 18 processors each, with a custom DD of -dd 2 2 1. The idea was to create 4 MPI ranks, one per domain, each using 16 or 18 OpenMP threads (such that ntomp x nmpi = number of processors). I tested multiple settings of MPI -np and mdrun -ntomp, but GROMACS would always use more than 4 ranks, leading to the fatal DD error.
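To make the intended layout concrete, here is a minimal sketch that assembles the kind of launch line I have in mind and checks that the rank count matches the -dd grid (the product of the grid must equal the number of particle-particle ranks). The helper name `build_mdrun_command` is hypothetical, and the launcher `mpirun`, the binary name `gmx_mpi`, and the `-deffnm step5_6` value are assumptions about the installation and file naming, not something taken from the log.

```python
import shlex

def build_mdrun_command(dd_grid, threads_per_rank, npme=0,
                        launcher="mpirun", binary="gmx_mpi",
                        deffnm="step5_6"):
    """Hypothetical helper: build an mdrun launch line for a fixed DD grid.

    launcher, binary and deffnm are assumed placeholders; adapt them
    to the local MPI installation and run files.
    """
    nx, ny, nz = dd_grid
    pp_ranks = nx * ny * nz          # one particle-particle rank per domain
    total_ranks = pp_ranks + npme    # plus any separate PME-only ranks
    argv = [launcher, "-np", str(total_ranks),
            binary, "mdrun",
            "-ntomp", str(threads_per_rank),
            "-dd", str(nx), str(ny), str(nz),
            "-npme", str(npme),      # 0 = no separate PME ranks
            "-deffnm", deffnm]
    return argv, total_ranks * threads_per_rank

argv, cores = build_mdrun_command((2, 2, 1), threads_per_rank=16)
print(shlex.join(argv))
print("cores used:", cores)
```

Placing exactly one rank on each node is controlled by the MPI launcher or the batch scheduler rather than by mdrun, so that part is not shown here.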

My second question: can I increase the performance by using multiple nodes, without using too many ranks? I am no computer scientist, and I don’t really understand all the thread-related options. Could my simulation be sped up by using 4 domain cells and spreading the computational load across 4 nodes?

Of course, I hope that the problem in my first question can be resolved, which may make the second question moot.

If you need any more information/input/…, I’d be happy to provide it.