Maybe my previous question was too long, so I’ll keep it short.
When attempting domain decomposition for my simulation, the log file shows:
“Initial maximum distances in bonded interactions:
 two-body bonded interactions: 0.429 nm, LJ-14, atoms 1837 1845
 multi-body bonded interactions: 0.488 nm, CMAP Dih., atoms 440 449
Minimum cell size due to bonded interactions: 0.537 nm”
But the simulation never starts; I get the fatal error: “1196 of the 49025 bonded interactions could not be calculated because some atoms involved moved further apart than the multi-body cut-off distance (0.835804 nm) or the two-body cut-off distance (1.59575 nm), see option -rdd, for pairs and tabulated bonds also see option -ddcheck”
I don’t understand how this can happen if the initial maximum distances are well below the cut-off distances. Could someone explain how I should interpret this error?
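For what it’s worth, the error points at the -rdd option; as I read the mdrun help, a command along these lines would increase the distance GROMACS reserves for bonded interactions across domain boundaries (just a sketch, the 2.0 nm value is an arbitrary placeholder and I have not confirmed this is the right fix):

gmx mdrun -rdd 2.0 -deffnm step5_6

But that would not explain why the check fails when the initial distances reported above are so small.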
Log file: step5_6.log - Google Drive
Basic system info: protein-ligand complex in a rhombic dodecahedral box of ~409 nm³, with (box-X, box-Y, box-Z) ~= (8.33, 8.33, 5.89) nm. The system consists of 41672 atoms.
I have looked at the post, but it has not really helped. I can only use 16 processors for my system, on one node. With more than 16 processors, domain decomposition is attempted, and that is where I get the fatal errors.
I’m starting to think something must be wrong with my topology. However, I have tried the same simulation without the ligand, and with the protein topology generated by CHARMM-GUI (it also produces GROMACS-compatible files), and I still get the fatal error about bonded interactions.
Do you know how it is possible that domain decomposition cannot even be initiated, when GROMACS shows that the initial maximum distances are well below the cut-off distances?
For example, running with -ntmpi 1 -ntomp 16 will not give you domain decomposition. 16 OpenMP threads on one domain is likely less efficient than a run with domain decomposition, but it will let you figure out whether your system explodes due to a topology problem (which I think is the most likely cause here).
Also, you don’t need to use all the cores: if you want to run on just one core, -nt 1 -ntmpi 1 -ntomp 1 should work, again just for debugging. Note that if you don’t use all cores you should add -pin on -pinoffset 0 (or whatever CPU ID you want to pin to), or you’ll get a warning and likely a performance hit as your operating system bounces the running GROMACS process between the various cores of your machine.
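For example, a single-domain debugging run could look like this (using the -deffnm from your log; adjust the pin offset to whatever CPU you want to use):

gmx mdrun -ntmpi 1 -ntomp 16 -pin on -pinoffset 0 -deffnm step5_6

or, for the one-core case:

gmx mdrun -nt 1 -pin on -pinoffset 0 -deffnm step5_6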
I have run the system for 100 ns without DD (using 9 or 16 processors), and the system does not blow up. I checked this visually in VMD and by looking at parameters like temperature, pressure, residue RMSF plots, etc.
I have also toyed with the -ntmpi and -ntomp flags, and with mpirun’s -np (on 4 nodes), to force GROMACS to use only 4 cells in the DD (as that might satisfy the cut-offs), but GROMACS kept creating more than 4 ranks, and thus more than 4 domains.
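For reference, this is the kind of run I was aiming for; as far as I understand the mdrun options, -dd lets you request the grid explicitly (a sketch only, I have not verified that it avoids the error):

gmx mdrun -ntmpi 4 -dd 2 2 1 -deffnm step5_6

i.e. 4 thread-MPI ranks decomposed into a 2x2x1 grid.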
A log file for gmx mdrun -nt 18 -deffnm step5_6 was given in my original post (step5_6.log - Google Drive)
GROMACS reports that 9x9x8 is the maximum DD grid, then tries a 6x3x1 DD, but fails.
A log file for mpirun -np 4 gmx_mpi mdrun -ntomp 16 -deffnm step5_6: np4ppn16.log - Google Drive
This creates 64 ranks (48 for DD), and not 4 as specified with -np 4.
Same as before, GROMACS says that the maximum distance of bonded interactions is well below the cut-off, and reports that the maximum allowed number of cells is X 9 Y 9 Z 8. GROMACS tries a 4x4x3 DD, but a few lines later it complains that 1306 bonded interactions could not be calculated because some atoms involved moved further apart than the cut-off distance…
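I wonder whether the extra ranks come from the MPI launcher rather than from GROMACS. Assuming Open MPI (I am not sure which MPI library our cluster uses), something like:

mpirun -np 4 --map-by ppr:1:node gmx_mpi mdrun -ntomp 16 -deffnm step5_6

should start exactly one rank per node with 16 OpenMP threads each, but I have not tested this, and it is only a guess at why 64 ranks appear.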
My biggest problem remains the domain decomposition. I do not understand this error; it seems unlikely that my bonded atoms move that far apart in a single 2 fs timestep. Any help would still be greatly appreciated.
Hi Wouter,
Did you make any progress on this topic? I’m currently having similar issues, also with a CHARMM-GUI-generated topology. I first ran my system with fewer resources and it worked fine, but now I’m trying to continue the simulation and can’t get it to start if I request more resources (16 CPUs and 4 GPUs in my case).
Best regards,
Martin