Protein-ligand umbrella sampling binding affinity not matching experimental values

GROMACS version: 2022.1
GROMACS modification: No
OS: Red Hat Enterprise Linux

Hi all, I am currently running some protein-ligand umbrella sampling simulations to measure the binding affinity of methyl-fucose to bacterial lectin. I am using this paper as my reference experimental binding affinity (table IV) and have used PDB ID 2BT9 to obtain the protein/ligand structural info.

To calculate the binding energy (deltaG) I am following this protocol:

  • I have removed all water molecules from the PDB file (experimental results show crystallographic water is not involved in ligand binding) and parameterised the protein using GROMACS and the charmm36-jul2021 forcefield.
  • I have parametrised one of the methyl-fucose ligands using CGenFF, and then added the new parameters and atoms to the protein .gro file (similar to the process described here: Protein-Ligand Complex).
  • I have solvated the system using TIP3P water and ensured the system charge was overall neutral.
  • I have minimised and NVT/NPT equilibrated the complex.
  • I have conducted steered MD to pull the ligand out of its binding pocket. The params I have used are as below:
pull                    = yes
pull_ncoords            = 1         ; only one reaction coordinate 
pull_ngroups            = 2         ; two groups defining one reaction coordinate 
pull_group1_name        = binding_site ; residues close to the ligand binding site 
pull_group2_name        = MF2 ; ligand name
pull_coord1_type        = umbrella  ; harmonic potential
pull_coord1_geometry    = distance  ; simple distance increase 
pull_coord1_dim         = N N Y
pull_coord1_groups      = 1 2
pull_coord1_start       = yes       ; define initial COM distance > 0
pull_coord1_rate        = 0.01      ; 0.01 nm per ps = 10 nm per ns
pull_coord1_k           = 1000      ; kJ mol^-1 nm^-2
  • After this, I generate umbrella sampling starting configurations along the reaction coordinate (up to when the ligand is ~2.5nm away from the binding site).
  • I then run NPT equilibration and 10ns production md using the following pull params:
pull                    = yes
pull_ncoords            = 1         ; only one reaction coordinate
pull_ngroups            = 2         ; two groups defining one reaction coordinate
pull_group1_name        = MF2
pull_group2_name        = binding_site
pull_coord1_type        = umbrella  ; harmonic potential
pull_coord1_geometry    = distance  ; simple distance increase
pull_coord1_dim         = N N Y
pull_coord1_groups      = 1 2
pull_coord1_start       = yes       ; define initial COM distance > 0
pull_coord1_rate        = 0.0       ; restrain in place
pull_coord1_k           = 1000      ; kJ mol^-1 nm^-2
  • Finally, I combined the snapshot results using WHAM, generating a window histogram and energy profile of the results.

However, I am getting conflicting results from this. The binding affinity should be around 8kcal/mol (according to the experimental value) for each and every ligand (very identical binding profile). Pulling one particular ligand results in a very low binding affinity of ~2.2kcal/mol, while pulling another results in an extremely high value of around 16kcal/mol. Both histograms/profiles are attached here, along with one example pulling simulation. I also seem to not be able to properly sample the region where the ligand has just about left the binding pocket for the second, stronger-affinity ligand (clearly seen in the histogram/profile).

Does anyone know what I am missing here? Is there something that could be influencing the binding affinity in the way I have defined my simulations? I can provide further .mdp files/details if required.

Many thanks in advance for your help!

I have attached all files here as I can’t seem to attach media to the post directly.

Reducing protein-ligand binding to a one-dimensional reaction coordinate (and as you have, only one spatial dimension of a one-dimensional reaction coordinate) is almost certainly inappropriate. And I’ve seen some recent evidence that any one-dimensional umbrella sampling may not even be reproducible, for a variety of reasons. Ligand binding to proteins is complex (see, e.g. https://doi.org/10.1038/s41467-022-33104-3) and simply trying to compute a binding free energy from a single, chosen path may never be right.

I’m hoping you haven’t simply transported the .mdp files from my tutorial to your system, particularly the pull_coord1_dim = N N Y part, because that’s really not appropriate for basically anything outside the tutorial (or other trivial systems for which truly only one dimension should matter).

1 Like

Hi Justin,

Many thanks for your reply! Indeed, I am aware that protein-ligand binding can be very complex to model, but given that the binding pockets for this system are very shallow and the simple ligand structure, I thought that doing some US could be enough to obtain some rough approximations of the binding affinity. Do you recommend some other strategy? I was looking at other methods (such as BFEE2 - Binding affinity estimation from restrained umbrella sampling simulations | Nature Computational Science) but I feel like these are difficult to modify and seem to be geared towards more complex protein/ligand systems.

Yes I was using your tutorial in the first instance to help set up my system. I was unaware that the N N Y dimension pulling would be inappropriate here (I basically rotated my protein-ligand complex to have the ligand face towards the z-direction, then pulled out the ligand), although I now see why this doesn’t make sense to apply here. I have returned this to its default (Y Y Y) and indeed one of the ligands (the weaker one) appears to show an improved binding accuracy (7-8 kcal/mol). However, the other ligand (in between the monomer chains) is still being reported to have a high binding affinity (but with a smoother energy profile).

What other parameters would you recommend I adjust/experiment with in the pulling code? I have taken a look at the rate/spring constant to see if they are too high, but I do not detect any irrational behaviour during pulling - the spring force rises until the ligand is pulled free from the picket, after which it remains low (which I have observed in other protein/ligand systems too). Adjusting the geometry/type parameters doesn’t seem likely to be beneficial in this scenario to me. Could the issue be the choice of residues I have selected for the binding site atom group?

Thanks again for your help here - much appreciated!

I can’t even hazard a guess about the aberrant results, but I know that umbrella sampling has issues with 1D reaction coordinates in general, and even if you run the “successful” simulation again, you may get an entirely different result. I can’t say too much since what I know is unpublished, but should be out in the world soon.

Metadynamics or some other enhanced sampling method may be more appropriate. There are often factors at work that are very difficult to anticipate, even for a “simple” or “straightforward” system.

1 Like