Change of protein-protein-interaction free energy upon ligand binding (procedure and controls)

GROMACS version: 2021.1
GROMACS modification: No

Dear Gromacs and simulation community,

in the course of my master’s thesis, I want to determine the protein-protein-binding free energy between two GTPases using PMF calculations via umbrella sampling. Especially I am interested in the contribution of the bound nucleotides (one per protein) which are also placed within the protein-protein interface. To do so I want to perform the simulations with the nucleotide loaded structures as well as after artificial deletion of the ligands from the input pdb files. My plan was to follow the overall procedure described in the corresponding GROMACS tutorial and [Lemkul JA, Bevan DR. Assessing the stability of Alzheimer’s amyloid protofibrils using molecular dynamics. J Phys Chem B. 2010 Feb 4;114(4):1652-60. doi: 10.1021/jp9110794. PMID: 20055378] as my proteins further oligomerize into filamentous structures which yields the symmetry allowing to define a proper one-dimensional pull and COM-distance determination axis for the steered MD simulations. In contrast to the paper, I am not restraining any of the proteins in the actual pull simulation.
So far, I setup everything to do an initial standard MD simulation (NPT + PBC) to relax the dimeric structures (after steepest descent minimization and NVT + NPT equilibration) until a stable RMSD is reached prior to the pull simulation. The resulting structures are introduced into the pulling simulation pipeline which consists again of NVT + NPT equilibration followed by the pulling simulations (NPT + PBC; so far I am pulling the COMs defined by the CA atoms of each protein by distance increase along the filament axis) to generate the input files which are then again NPT equilibrated for the actual umbrella sampling (NPT + PBC) and PMF calculation using WHAM.

As I am quite new to the field, I want to ask you which additional controls are necessary for such an approach, which non obvious pitfalls exist and if I already introduced some conceptual errors?

In the initial relaxation runs performed so far, my measure of convergence was a narrow backbone RMSD distribution around 0.15 nm in the last third of the simulation. Of course, I am always doing visual inspection and checking the minimal distance to the periodic image as well as evaluating stable temperature, pressure, energy etc. I also performed RMSD-based clustering looking for a reduction of accessed clusters over the time and I did PCA to check for well-defined minima in the free energy landscapes derived from the first three eigenvectors calculated over the full trajectories. For the pulling simulations I analyzed the behavior of the pinch of force under different selections for v and k choosing finally the minimal v which was not producing an observed artifact at lower v’s and a range of three different k’s. In the actual umbrella sampling simulations and WHAM, I was looking for proper overlap of the neighboring histograms and checked for convergence by producing the PMF curves over consecutive time windows until seeing a large overlap of the standard deviations derived from bootstrapping.

Thank you very much for your feedback!