Is using the GPU for mdrun `update` generally incompatible with the PLUMED plugin?

GROMACS version: 2022.3-plumed_2.8.1
GROMACS modification: Yes, with plumed

I am testing an implementation of steered MD with GROMACS and PLUMED. I used alanine dipeptide for the test and tried to add time-dependent harmonic restraints to the Phi and Psi dihedrals simultaneously, to induce a conformational change from (-1.5, 1.3) to (1.4, -1.0) over 100 ps.
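For reference, a MOVINGRESTRAINT input for this kind of push could look roughly like the sketch below; the reference structure, the kappa values, and the 2 fs time step implied by STEP1=50000 (100 ps) are assumptions, and the targets are taken to be in radians, as PLUMED's TORSION uses.

```
# Hypothetical plumed.dat sketch; the file name, KAPPA values and the 2 fs time step
# (100 ps -> 50000 steps) are assumptions
MOLINFO STRUCTURE=reference.pdb
phi: TORSION ATOMS=@phi-2   # assumes the alanine is residue 2 in reference.pdb
psi: TORSION ATOMS=@psi-2

# Move the harmonic restraint centres from (-1.5, 1.3) to (1.4, -1.0) rad over 100 ps
restraint: MOVINGRESTRAINT ...
  ARG=phi,psi
  STEP0=0      AT0=-1.5,1.3   KAPPA0=500.0,500.0
  STEP1=50000  AT1=1.4,-1.0   KAPPA1=500.0,500.0
... MOVINGRESTRAINT

PRINT ARG=phi,psi FILE=COLVAR STRIDE=100
```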

I used two ways to add such harmonic restraints, namely the GROMACS pull code and PLUMED's MOVINGRESTRAINT, and plotted the evolution of the CVs along the trajectory for each. What I found is that with all major computations performed on the GPU (nb, pme, bonded, update), PLUMED generated noisy trajectories and almost failed to make the push. If I instead used the CPU for the update and let the GPU do the rest, or simply used the CPU for everything, PLUMED worked just fine (except when using the CPU with a large restraining constant kappa, where PLUMED warned about a 'relative constraint deviation after LINCS' and ran into a segmentation fault). In all cases, the native GROMACS pull code works perfectly (a sketch of the pull-code input is below).
I have uploaded trajectory plots of the two methods under different kappa values as three figures, and they should make it clear that PLUMED behaves abnormally when the GPU does all the work.
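For completeness, the pull-code side of the comparison can be set up roughly as below for the Phi dihedral alone (Psi is handled analogously). The index-group names and the force constant are placeholders, and note that the pull code expects degrees, so -1.5 rad ≈ -85.9° and 1.4 rad ≈ 80.2°, i.e. a rate of about 1.66 °/ps over 100 ps.

```
; Hypothetical pull-code .mdp fragment for Phi only; the group names must exist
; in the index file and the force constant is a placeholder
pull                 = yes
pull-ngroups         = 4
pull-ncoords         = 1
pull-group1-name     = phi_C_ACE       ; C of ACE
pull-group2-name     = phi_N_ALA       ; N of ALA
pull-group3-name     = phi_CA_ALA      ; CA of ALA
pull-group4-name     = phi_C_ALA       ; C of ALA
pull-coord1-type     = umbrella
pull-coord1-geometry = dihedral        ; six entries: vectors 1->2, 3->4, 5->6
pull-coord1-groups   = 1 2 2 3 3 4
pull-coord1-k        = 500             ; kJ mol^-1 rad^-2 (kappa, placeholder)
pull-coord1-init     = -85.9           ; -1.5 rad in degrees
pull-coord1-rate     = 1.66            ; deg/ps, reaches ~1.4 rad after 100 ps
```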

So does this mean that when I use PLUMED for on-the-fly computations and manipulations, I should always run the update on the CPU? Are there any other mdrun parameters I should pay attention to so that PLUMED works correctly and efficiently (other offloaded computations, ntomp, ntmpi, etc.)?

Hi,

I strongly advise that you raise this issue with the PLUMED developers.

I am not familiar with the PLUMED modifications, but if they were not developed in a way that is compatible with the heterogeneous parallelization GROMACS uses, there can unfortunately be (common or uncommon) cases where things fail or do not work correctly.

Specifically, when the GPU-resident mode is used (that is, -update gpu), the simulation state is kept on the GPU for a number of MD steps and is only up to date on the CPU side when necessary. Therefore, if an implementation does not account for this, it could be applying restraints to stale data on the CPU and/or to data that is not used for the GPU-resident calculations.
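In practice that means the configuration you found to work, i.e. keeping the nonbonded, PME and bonded work on the GPU while running the integration/constraints on the CPU, is the safe one; roughly something like the sketch below (the file names are placeholders, and -plumed is the option added by the PLUMED patch):

```
# Minimal sketch: nb/pme/bonded stay on the GPU, but the update runs on the CPU
# so the CPU-side coordinates PLUMED operates on are current every step
gmx mdrun -deffnm smd -plumed plumed.dat -nb gpu -pme gpu -bonded gpu -update cpu
```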

Cheers,
Szilárd

Thanks for your reply! I think that makes a lot of sense.

Do you think that if we implement some modifications to address this issue (maybe forcing the CPU-side data to be updated every time step), the speed would still be somewhat faster than simply doing all the updates on the CPU (-update cpu)? If so, I will raise this issue with the PLUMED community later; otherwise, I suppose simply printing an error message and forcing users to run the update on the CPU is OK.

Also, I would strongly suggest that GROMACS release an official version of the source code with the PLUMED API included with each update, or at least add some tips and notes on the patching process to the manual. Currently the communication seems to be one-sided, despite the fact that many (if not most) PLUMED users use GROMACS as the MD engine. That would be much friendlier for first-timers like us. :)
(As a comparison, Amber22 has an official implementation of PLUMED integration, and OpenMM also has an official PLUMED wrapper.)

Do you think that if we implement some modifications to address this issue (maybe forcing the CPU-side data to be updated every time step), the speed would still be somewhat faster than simply doing all the updates on the CPU (-update cpu)?

I would not expect any major benefit. The main advantage of the GPU-resident mode is avoiding synchronization and data transfers, not offloading the computational cost of the integrator/constraints.
