GROMACS version: 2020.6
GROMACS modification: Yes
Hi everybody. Hope everyone is fine.
I’ve been facing an issue with my GROMACS builds for some time, but it was nothing to worry about too much until now.
Every single one of my GROMACS builds (including the 2020.6 I’m using for this report), on any multi-threaded CPU I have or have had access to, only runs in “pure OpenMP” mode: I have to manually add “-ntmpi 1” to the command line, effectively disabling thread-MPI, for it to work.
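For reference, a minimal sketch of the two invocations (the “-deffnm md” name and the thread count are placeholders, not my exact command lines):
#########
# Default: mdrun picks its own thread-MPI rank count -> fails for me
gmx mdrun -deffnm md

# Workaround: force a single thread-MPI rank, i.e. pure OpenMP -> runs fine
gmx mdrun -deffnm md -ntmpi 1 -ntomp 32
#########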
If I don’t set “-ntmpi 1”, then regardless of the system used I get errors like the following:
#########
Changing nstlist from 10 to 50, rlist from 1 to 1.107
Using 32 MPI threads
Using 1 OpenMP thread per tMPI thread
Not all bonded interactions have been properly assigned to the domain decomposition cells
A list of missing interactions:
Bond of 6583 missing 134
Angle of 22820 missing 651
Proper Dih. of 35458 missing 1539
Improper Dih. of 2682 missing 107
LJ-14 of 32910 missing 938
Molecule type 'Protein_chain_A'
the first 10 missing interactions, except for exclusions:
Proper Dih. atoms 60 62 64 66 global 60 62 64 66
Proper Dih. atoms 60 62 64 66 global 60 62 64 66
Proper Dih. atoms 60 62 64 66 global 60 62 64 66
LJ-14 atoms 60 66 global 60 66
Angle atoms 62 64 66 global 62 64 66
Proper Dih. atoms 62 64 66 67 global 62 64 66 67
Proper Dih. atoms 62 64 66 68 global 62 64 66 68
Proper Dih. atoms 62 64 66 69 global 62 64 66 69
LJ-14 atoms 62 67 global 62 67
LJ-14 atoms 62 68 global 62 68
Molecule type 'Protein_chain_E'
the first 10 missing interactions, except for exclusions:
Proper Dih. atoms 2288 2294 2296 2298 global 11794 11800 11802 11804
LJ-14 atoms 2288 2298 global 11794 11804
Angle atoms 2294 2296 2298 global 11800 11802 11804
Proper Dih. atoms 2294 2296 2298 2300 global 11800 11802 11804 11806
Proper Dih. atoms 2294 2296 2298 2300 global 11800 11802 11804 11806
Proper Dih. atoms 2294 2296 2298 2300 global 11800 11802 11804 11806
Proper Dih. atoms 2294 2296 2298 2308 global 11800 11802 11804 11814
Proper Dih. atoms 2294 2296 2298 2308 global 11800 11802 11804 11814
Proper Dih. atoms 2294 2296 2298 2308 global 11800 11802 11804 11814
Improper Dih. atoms 2294 2298 2296 2297 global 11800 11804 11802 11803
-------------------------------------------------------
Program: gmx mdrun, version 2020.6
Source file: src/gromacs/domdec/domdec_topology.cpp (line 421)
MPI rank: 0 (out of 32)
Fatal error:
3369 of the 181683 bonded interactions could not be calculated because some
atoms involved moved further apart than the multi-body cut-off distance
(1.31543 nm) or the two-body cut-off distance (1.31543 nm), see option -rdd,
for pairs and tabulated bonds also see option -ddcheck
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
#########
However, if I do set “-ntmpi 1”, it runs as a “pure OpenMP” job and completes fine, with no flaws whatsoever. :(
Does anybody have any idea about the possible causes of this error?
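For completeness, my understanding is that the options the error message points to would be passed like this (a sketch only; the 1.4 nm value and the “-deffnm md” name are placeholders I haven’t verified):
#########
# Enlarge the domain-decomposition cut-off for bonded interactions (nm)
gmx mdrun -deffnm md -rdd 1.4

# Or turn off the check for bonded interaction assignment (-ddcheck)
gmx mdrun -deffnm md -noddcheck
#########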
Additionally: I’ve never worried too much about this because the performance was good enough. However, we are now using a machine with an RTX 3060 GPU: on it the process begins to run but, after 1-2 minutes, the whole machine shuts down. I’m almost certain this is not related to the “-ntmpi 1” issue, but it doesn’t hurt to ask whether anyone has seen this sort of “computer turning off during GPU calculations” behavior.
Thanks a lot for any comments!