GROMACS version: 2020.3
GROMACS modification: Yes
GROMACS-2020-RAMD produces strikingly different results if it is compiled with the built-in thread-MPI or with OpenMPI.
The RAMD-modified code is available here:
We are investigating a protein-drug complex, trying to determine the average time for its dissociation. For this we used GROMACS-2020-RAMD, a modification of the GROMACS 2020 code that applies a random force to the ligand of a receptor-ligand complex and aborts the run when the ligand detaches from the receptor. We simulated with two force magnitudes (250 and 400); at each force, several runs (15 or 50, respectively) were performed to collect good statistics. Each run was started from an identical initial configuration with identical parameters, except for "ld-seed" and "ramd-seed", which differed between runs. The quantity of interest was the run duration, i.e., the time needed for the complex to dissociate.
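For concreteness, the per-run inputs differed only in the two seed lines of the .mdp file, roughly as sketched below. Note that `ld-seed` is a standard GROMACS option; `ramd-seed` is the name used in our runs, and the `ramd-force` option name and the seed values shown here are illustrative placeholders, not taken from our actual input files:

```
; Only the seeds vary between replicas; all other parameters are identical.
ld-seed    = 76829   ; Langevin dynamics seed (standard GROMACS mdp option)
ramd-seed  = 9876    ; random-force seed (RAMD modification)
ramd-force = 250     ; illustrative name for the random-force magnitude
```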
The calculations were started on cluster PC1; to speed up the work, we then also employed a second cluster, PC2. However, comparing the results between the machines revealed a striking discrepancy in the run durations at both forces. Interestingly, at force 250 PC1 showed longer times, while at force 400 the PC2 results were longer. To investigate the problem in more detail, we employed a third machine, PC3; its results coincided with those of PC1.
After comparing the GROMACS build and run options across the three machines, we deduced that the difference was caused by the MPI library used: PC1 and PC3 used the built-in thread-MPI, while PC2 used OpenMPI 3.1.6. To check this hypothesis, we installed a thread-MPI build on PC2 and an OpenMPI build on PC3. The thread-MPI build on PC2 then reproduced the PC1/PC3 results, and the OpenMPI build on PC3 reproduced the PC2 results. This led us to conclude that the MPI library is the source of the problem, and that it is likely a software bug. We consider the thread-MPI results correct and the OpenMPI ones erroneous, although we have no solid arguments for this. Most surprisingly, whichever parallelization path carries the bug, its results are well reproducible.
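For reference, the two build/launch modes differ roughly as follows (rank and thread counts here are illustrative, not our actual settings; all other mdrun options were kept identical between builds):

```
# Built-in thread-MPI build (PC1/PC3): single gmx binary, ranks are threads
gmx mdrun -ntmpi 8 -ntomp 2 -deffnm run

# OpenMPI 3.1.6 build (PC2): MPI-enabled binary started via an external launcher
mpirun -np 8 gmx_mpi mdrun -ntomp 2 -deffnm run
```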
The choice of ramd-seed values had only a small effect on the results. Namely, at force 250 we performed a series of runs with an identical ramd-seed (with OpenMPI on PC2 and PC3); the resulting times followed the trend of those obtained with individual seeds on PC2. Finally, we also checked the effect of the compiler (GCC 8.3.0 vs. GCC 5.4.0 vs. Intel) and of the GPU (PC1 has a GPU, while PC2 and PC3 have none), but found no influence from either.
We have the results as graphs; overall, the differences span 2-3 orders of magnitude. We can send more simulation files upon request.
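To make the "orders of magnitude" claim precise, we summarize the gap between two sets of run durations as the log-ratio of their means. A minimal sketch (the numbers below are illustrative placeholders, not our measured data):

```python
import math
import statistics

def spread_orders(times_a, times_b):
    """How many orders of magnitude separate the mean run
    durations of two sets of replicas."""
    return abs(math.log10(statistics.mean(times_a) / statistics.mean(times_b)))

# Illustrative durations only (arbitrary units); not our actual results.
thread_mpi = [120.0, 150.0, 90.0]
open_mpi = [12000.0, 15000.0, 9000.0]
print(round(spread_orders(thread_mpi, open_mpi), 2))  # → 2.0
```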