MPI error when using ensemble-averaged distance restraints and set GMX_DISRE_ENSEMBLE_SIZE

GROMACS version: 2021.4
GROMACS modification: No

Dear Gromacs community,

I would like to run an ensemble-averaged restrained MD simulation using NMR distance restraints and 10 replicas for ensemble averaging. All 10 systems sit in the subdirectories md01 - md10 and have the same number of atoms and the same topology. The distance restraints file (same format as in the manual, starting at index 0 and with distances in nm) is referenced in the topology file, and I enabled the restraints in the mdp file with

define = -DDISRES_NOE
disre = ensemble
disre_weighting = equal
disre_mixed = no
disre_fc = 200
disre_tau = 0
nstdisreout = 5000

I generate a tpr file in each md* subdirectory with gmx grompp, set the ensemble size via the environment variable (export GMX_DISRE_ENSEMBLE_SIZE="10"), and then start the simulation with

mpirun -np 10 gmx_mpi mdrun -ntomp $OMP_NUM_THREADS -pin on -pinstride 1 -deffnm simulationXY -multidir md0* md1*
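For completeness, my workflow looks roughly like this (a simplified sketch; the mdp, structure, and topology file names are placeholders for my actual files):

```shell
# Build a tpr in every replica directory (file names are placeholders)
for dir in md01 md02 md03 md04 md05 md06 md07 md08 md09 md10; do
    gmx grompp -f md.mdp -c "$dir/conf.gro" -p "$dir/topol.top" \
               -o "$dir/simulationXY.tpr"
done

# Tell mdrun how many systems form one restraint ensemble
export GMX_DISRE_ENSEMBLE_SIZE="10"

# One MPI rank per replica; -multidir maps the ranks onto the directories
mpirun -np 10 gmx_mpi mdrun -ntomp $OMP_NUM_THREADS -pin on -pinstride 1 \
       -deffnm simulationXY -multidir md0* md1*
```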

My problem is that if I set the environment variable GMX_DISRE_ENSEMBLE_SIZE, I always get the following MPI error directly at the beginning of the simulation:

[cgpu01-003:03913] *** An error occurred in MPI_Bcast
[cgpu01-003:03913] *** reported by process [2687500289,0]
[cgpu01-003:03913] *** on communicator MPI_COMM_WORLD
[cgpu01-003:03913] *** MPI_ERR_COMM: invalid communicator
[cgpu01-003:03913] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[cgpu01-003:03913] ***    and potentially your MPI job)
[cgpu01-003:03895] 9 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[cgpu01-003:03895] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

In general, running GROMACS in parallel with MPI works fine for me, including multi-replica runs set up with the -multidir option. Without the environment variable GMX_DISRE_ENSEMBLE_SIZE everything also runs without errors (the distance restraints are recognised), but then, as far as I understand, no ensemble averaging is applied. I have already tried different nodes (with and without GPUs) and other, newer GROMACS versions, but I always get the same error and cannot work out what I am doing wrong.

Thank you for your help!

This works for my test system in the 2025 version. Could you try version 2025?

Hi,
I just tried it with GROMACS 2025.4 as well, but unfortunately I still get the same error.

What are the last lines you see in the log file?

The last lines are the tail of the input-parameter listing:

...
     nnpot:
       active                     = false
       modelfile                  = model.pt
       input-group                = System
       model-input1               = 
       model-input2               = 
       model-input3               = 
       model-input4               = 
grpopts:
   nrdf:     14492.8      198225
   ref-t:         300         300
   tau-t:         0.1         0.1
annealing:          No          No
annealing-npoints:           0           0
   acc:	           0           0           0
   nfreeze:           N           N           N
   energygrp-flags[  0]: 0

so the log stops directly before the point where the distance restraints should be initialised (the following messages no longer appear in the log file):

Initializing the distance restraints
Found GMX_DISRE_ENSEMBLE_SIZE set to 10 systems per ensemble
Multi-checking the number of systems per ensemble ... OK
Our ensemble consists of systems: 0 1 2 3 4 5 6 7 8 9
There are 436 distance restraints involving 1308 atom pairs
Multi-checking the number of distance restraints ... OK

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
...

If I don’t set the environment variable GMX_DISRE_ENSEMBLE_SIZE, only

Initializing the distance restraints
There are 436 distance restraints involving 1308 atom pairs

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
...

is written to the log file; the ensemble-related lines are missing.

Can you easily share your tpr file, so I can try to reproduce this?

Yes, thanks for your help. Strangely, tpr files are not an allowed file type for the forum's upload button, so I changed the extension to txt. I hope I didn't break any forum rules by doing so.

npt_1_posre_fc_1000_disre_fc_200_orire_fc_0.1_change_extension_to_tpr.txt (4.5 MB)

I don’t get an MPI error, but a GROMACS error:

Multi-checking the number of fit atoms for orientation restraining ...
the number of fit atoms for orientation restraining is not equal for all subsystems
subsystem 0: 711
subsystem 1: 0
subsystem 2: 0
subsystem 3: 0

-------------------------------------------------------
Program: gmx mdrun, version 2027.0-dev-20260120-96b77326ef
Source file: src/gromacs/mdrunutility/multisim.cpp (line 310)
MPI rank: 0 (out of 4)

Fatal error:
The 4 subsystems are not compatible

I don’t understand how you get an MPI error. But to fix this I had to add a few lines, listed below. Could you try this?

--- a/src/gromacs/listed_forces/orires.cpp 
+++ b/src/gromacs/listed_forces/orires.cpp 
@@ -286,6 +286,10 @@ t_oriresdata::t_oriresdata(FILE*                     fplog, 
            refCoord -= com; 
        } 
    } 
+    else 
+    { 
+        referenceCoordinates_.resize(fitMasses_.size(), { 0.0_real, 0.0_real, 0.0_real }); 
+    } 
 
    const size_t numFitAtoms = referenceCoordinates_.size(); 
    xTmp_.resize(numFitAtoms);
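If you want to test this locally, applying the patch and rebuilding should be enough. Roughly (a sketch assuming a GROMACS source checkout, an MPI toolchain, and the patch saved as orires-fix.patch):

```shell
# Apply the patch to the source tree (patch file name is a placeholder)
cd gromacs
git apply orires-fix.patch

# Configure and build an MPI-enabled mdrun out of tree
mkdir build && cd build
cmake .. -DGMX_MPI=ON
make -j 8
make install
```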

Hi,

did you also set the environment variable GMX_DISRE_ENSEMBLE_SIZE? I only get the MPI error when I set it, regardless of whether I use only distance restraints or (in addition to the distance restraints) also apply position and orientation restraints.

If I do not set the environment variable, I don’t get an MPI error, and both GROMACS 2021.4 and the modified GROMACS 2025.4 (with the fix you suggested) run normally, without any error related to orientation restraints or anything else. The unmodified GROMACS 2025.4, however, shows the same error you see.

Here is a tpr file for comparison where only distance restraints are applied:
p38a_md_disres_fc_1000_change_extension_to_tpr.txt (4.0 MB)

It turned out I was having trouble passing the environment variable through. Once I fixed that, mdrun runs fine for me. I had only tried the main branch, which is identical to release-2026 for this functionality. I have now also tried release-2025, and there I do get your error. So moving to 2026 will solve your issue (and you will still need the orires fix for orientation restraints).

Thank you very much! Now it’s working :)