T-REMD simulations result in 5x performance

GROMACS version: 2023.4
GROMACS modification: No

Hello Gromacs forum members!

I’m working on some T-REMD simulations of an enzyme, where I am using enhanced sampling to sample different settled configurations of the active site. I haven’t run into any fatal errors with my simulations, but I did run into an oddity that I thought the GROMACS forum could shed some light on.

I find that when I run equilibration simulations for each respective temperature replica I get extremely low performance (~20 ns/day for a system with 59000 atoms), but when I run the production simulation with replica exchanges I see a 5x performance improvement (~100 ns/day for the same system). Intuitively, I would expect the simulations undergoing replica exchange to have lower performance due to the overhead of exchanging replicas every so often, but with this system that doesn’t seem to be the case.

These T-REMD simulations are being run with GROMACS 2023.4, built with MPI and CUDA support, on a cluster node with 8 NVIDIA Tesla V100 GPUs. The equilibration and production simulations all share essentially the same MDP options; the only major difference is the inclusion of the -nex and -replex flags for the replica-exchange simulations. I’ve included one of my benchmarking plots, which shows the odd performance increase that comes with the T-REMD simulations. I’m wondering if this is known in the community, and if there are any workarounds to get the non-exchanging equilibration simulations to reach performance similar to the T-REMD simulations.

I did a similar benchmarking experiment with the same simulations, but with the following flags added: -nex 0 -replex 100. My thought was that the equilibration simulations would attempt exchanges every 100 steps but wouldn’t actually exchange configurations, just to see if something in the replica-exchange code was causing this performance increase. This gave a performance plot similar to the one shared above, where the exchanging simulations get a massive performance increase. I also repeated this benchmarking with the environment variable GMX_ENABLE_DIRECT_GPU_COMM set to true, which gave similar results. In all cases the T-REMD simulations vastly outperform the individual equilibration simulations.
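For reference, the two launch modes I’m comparing look roughly like this (replica directory names, file names, and rank counts below are illustrative, not my exact job script):

```shell
# Optional: enable direct GPU-GPU communication (GROMACS 2022+)
export GMX_ENABLE_DIRECT_GPU_COMM=true

# Non-exchanging multi-simulation (one replica per rank/GPU) -- the slow case:
mpirun -np 8 gmx_mpi mdrun \
    -multidir rep0 rep1 rep2 rep3 rep4 rep5 rep6 rep7 \
    -deffnm equil

# Same launch with replica-exchange attempts enabled -- the fast case:
mpirun -np 8 gmx_mpi mdrun \
    -multidir rep0 rep1 rep2 rep3 rep4 rep5 rep6 rep7 \
    -deffnm prod -replex 100 -nex 0
```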

Any ideas on why these T-REMD simulations see such a large performance increase?

Looking through the log files from these simulations, it looks like the production REMD simulations offload the PME calculation to the GPUs, but the non-exchanging multi-simulation does not. Here’s an example of the GPU task mapping from an REMD simulation log:

Mapping of GPU IDs to the 128 GPU tasks in the 64 ranks on this node:
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
PP task will update and constrain coordinates on the GPU
PME tasks will do all aspects on the GPU

For the non-REMD simulations, GPU tasks are only set up for PP calculations.

Try specifying the mdrun options -pme gpu -update gpu and see if that helps. At the least, you should get a message explaining why those options are not possible with your simulation (if they aren’t).
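For example, something along these lines for the non-exchanging runs (directory and file names are placeholders):

```shell
# Explicitly request GPU offload instead of relying on mdrun's heuristics;
# mdrun will print a clear error if an option is incompatible with the system.
mpirun -np 8 gmx_mpi mdrun \
    -multidir rep0 rep1 rep2 rep3 rep4 rep5 rep6 rep7 \
    -deffnm equil -pme gpu -update gpu
```

That way the equilibration runs request the same task assignment the REMD runs are getting automatically, rather than leaving the choice to mdrun’s defaults.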