Major difference between thread-MPI and OpenMPI builds of the GROMACS modification GROMACS-2020-RAMD

GROMACS version: 2020.3
GROMACS modification: Yes

The problem:

GROMACS-2020-RAMD produces strikingly different results depending on whether it is compiled with the built-in thread-MPI or with OpenMPI.
The RAMD-modified code is here:

The details:

We are investigating a protein-drug complex, trying to determine the average time for its dissociation. For this, we used GROMACS-2020-RAMD, a modification of the GROMACS 2020 code that applies a random force to the ligand of a receptor-ligand complex and aborts the run once the ligand detaches from the receptor. We simulated with two force magnitudes (250 and 400); at each force, several runs (15 or 50, respectively) were performed to collect good statistics. Each run was started from an identical initial configuration with identical parameters, except for “ld-seed” and “ramd-seed”, which were different in every run. The quantity of interest was the run duration, i.e. the time needed for dissociation of the complex.
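For concreteness, the only thing that changed between runs within a series was the pair of seeds in the .mdp file. A minimal sketch (the seed values are placeholders, and the RAMD keyword naming follows the GROMACS-RAMD fork's documentation, so the exact spelling may differ):

```
; per-run .mdp excerpt -- everything else is identical across runs
ld-seed   = 173529   ; seed for the stochastic parts of the integrator; changed per run
ramd-seed = 982144   ; seed for the RAMD random force direction; changed per run
; the RAMD force magnitude (250 or 400) is kept fixed within a series
```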

The calculations were started on cluster PC1, and we then also employed cluster PC2 to speed up the work. However, a comparison of the results between the two machines showed a striking discrepancy in the run durations at both forces. Interestingly, for force 250 PC1 showed longer times, while for force 400 the PC2 results were longer. To investigate the problem in more detail, we employed a third machine, PC3; its results coincided with those of PC1.

After comparing the GROMACS build and run options between the three machines, we deduced that the difference was caused by the MPI flavour used: on PC1 and PC3 it was the built-in thread-MPI, while on PC2 it was OpenMPI 3.1.6. To check this hypothesis, we installed a thread-MPI build on PC2 and an OpenMPI build on PC3. The new thread-MPI build on PC2 reproduced the thread-MPI results from PC1/PC3, and the new OpenMPI build on PC3 reproduced the OpenMPI results from PC2. This led us to conclude that the MPI flavour is the source of the problem, and that it is likely a software bug. We consider the thread-MPI results to be correct and the OpenMPI ones to be erroneous; however, we have no solid arguments for this. Most surprisingly, whichever parallelization path contains the bug, the behaviour is well reproducible.
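For reference, the two setups differ only in how GROMACS was configured and launched; a sketch of the typical commands (rank counts and file names are illustrative):

```
# built-in thread-MPI (the default build): all ranks are threads of one process
cmake .. -DGMX_MPI=OFF
gmx mdrun -ntmpi 8 -deffnm run

# OpenMPI 3.1.6 build: ranks are separate processes started by mpirun
cmake .. -DGMX_MPI=ON
mpirun -np 8 gmx_mpi mdrun -deffnm run
```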

The choice of ramd-seed values had only a small effect on the results. Namely, at force 250 we performed a series of runs with an identical ramd-seed (with OpenMPI on PC2 and PC3); the time values followed the trend of those obtained with individual seeds on PC2. Finally, we also checked the effect of the compiler (GCC 8.3.0 vs. GCC 5.4.0 vs. Intel) and of the GPU (PC1 has a GPU, while PC2 and PC3 do not), but no influence was found.

We have the results as graphs; overall, the differences are 2-3 orders of magnitude. We can send more simulation files upon request.

Thank you!

Hi,

Since this is a third-party modified/derived version, contact the author of that version!

If you experience discrepancies between MPI and threads in the default/vanilla version of GROMACS, please do file a bug report and we’ll look into it right away.

Cheers,

Erik

Thank you, Erik. Yes, we have contacted the authors, but this looks like a bug in the original GROMACS. How do I file a bug report?

Dmitry.

Bugs can be opened at https://gitlab.com/gromacs/gromacs/-/issues

Please provide us with a small example system that causes the issue, together with build and run logs.

Cheers

Paul

PS: if you believe it’s a bug in vanilla GROMACS, start by reproducing it there, since that’s going to be the first request you get when filing a bug report ;-)
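For example, a deterministic comparison with unmodified GROMACS could look something like this (file names are placeholders; -reprod asks mdrun to avoid optimizations that can break reproducibility):

```
# same .tpr and fixed seeds in the .mdp, run with both builds
gmx grompp -f md.mdp -c conf.gro -p topol.top -o repro.tpr

# thread-MPI build
gmx mdrun -s repro.tpr -ntmpi 4 -reprod -deffnm repro_tmpi

# OpenMPI build
mpirun -np 4 gmx_mpi mdrun -s repro.tpr -reprod -deffnm repro_ompi
```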

This has already been investigated here (https://gitlab.com/gromacs/gromacs/-/issues/3735), and we found that it is not an issue with vanilla GROMACS.

Cheers

Paul