Insertion or deletion of residue in Topology?

GROMACS version:2021.5
GROMACS modification: No
I have run alchemical transformation to study the stability of proteins as effect of mutation by calculating the delta delta G (dd G) using REST2/FEP technique.
To do so, I prepared dual .top file first which contains the info. of the mutant residue in addition to the wild type residues.
In this case, first replica the wild type form while the last replica is the same structure with one mutant residue.
Now I am looking for applying the same method to study the effect (based on the dd G value ) of stability of the proteins due to insertion or delation of one or two resides.
I looked for any article about using this based on alchemical transformation I couldn’t find!!
Anyway, if you have any idea how to prepare the top file in GROMACS to be able to run REST with. different structures to get first replica the wild type form while the last replica should be the same structure after deletion or inserted the residue then I can use BAR(FEP) to get the dd G between these two forms

Unless you want to insert or delete the residues at or very close to one of the termini, this can become a very tricky problem. Though I never tried to run such simulations myself, I see two approaches you may try, which both come with significant disadvantages, unfortunately:

  1. Very likely to work, but probably a performance nightmare.
    From the insertion/deletion point towards the closer terminus, you include all atoms in the FEP region such that you can alchemically transform all these amino acids into the amino acids they are in the insertion/deletion mutant. This can in principle be done with the very same protocol as you used before since you encode the insertion/deletion as a series of point mutations. However, the FEP region becomes very large and the simulations thus very slow.

  2. Same performance as a single point mutation, but is likely to have convergence issues/spurious free energy contributions from the large-scale protein movements required.
    You just turn the residue you want to insert/delete into a residue of dummy atoms in the respective state. (It’s technically just a single point mutation like in your previous setup.) Moreover, you identify the bonds, angles and dihedrals that are formed once the residue to be deleted is gone. These bonded interactions get a force constant of zero as long as the residue to be deleted is still around and are turned on once the residue to be deleted has been annihilated. This solution is computationally much cheeper, but you will have to play around with the lambda vector settings in the MDP file. In particular, consider to fully convert Coulomb and van der Waals interactions before changing the bonded terms to avoid forcing neighbouring residues to crash into the residue to be deleted. And if you manage to get this going, this approach requires both parts of the protein to move to fill the gap left by the residue to be deleted. Therefore, I’d assume that this protocol can only work for an isolated protein in aqueous solution. And it might be needed to let the dummy atoms of the backbone of the residue to be deleted still have repulsive vdW interactions with the water molecules to prevent them from moving between the two protein parts as you annihilate the residue. When trying this approach, I urgently recommend consistency checks like: Do I get the same free energy if start the simulation from the wildtype and from the insertion/deletion mutant? It really has to be validated that you don’t get spurious free energy contributions from forcing the two parts of the protein to move to fill the gap left by the residue to be deleted.

Thank you too much for your valuable suggestions, I am so sorry for long time not replying to such great ideas.
Actually, things are running in the point 2 you suggested. but there is something to declare regarding lambda you said “you will have to play around with the lambda vector settings in the MDP file” actually we have optimizer that get things running on the right way now.
For now, the predicted ddG is +2 or -2 kcal/mol related to wt in the addition cases and sometimes +3 in deletion cases. I know the main issue here is the starting structure ( for wt I am using crystal structure but for add/delete I used alphafold to predict the structure) do have any suggestion to improve the performance? I am playing now with different ideas related to initial structure

No worries, I know very well that it takes a long time to test and get such simulation protocols to work.

Even better, finding the optimal spacing can be tedious unless you have some automated procedure or use sampling techniques that can compensate for non-optimal lambda spacings. However, I should maybe add here that my comment was not only about the actual spacing, but I also don’t know whether the residue to be deleted has to be completely annihilated before switching on the new bonds, angles and dihedrals to be formed or whether you can already start switching on the new bonded interactions while the residue to be deleted is still, e.g., 10% there.

I unfortunately can’t make too much sense of the numbers you’re providing. As you use FEP, the key check I had in mind is the following, using a deletion mutation as an example:

  1. Start from the wildtype structure and use the fast growth setup (delta-lambda in the mdp file) to generate starting structures for all lambda points, then run FEP/REST2.
  2. Start from the deletion mutant and use the fast growth setup to generate starting structures for all lambda points, then run FEP/REST2.
  3. Check that you get the same free energy difference within statistical error, i.e. you see no hysteresis. If you get significantly different free energies, extend the simulations until you converge to the same free energy, regardless of how you created the starting structure. If you can never converge to the same free energy, either exclude this particular mutation or improve the simulation protocol.

This brings me to your last question, how to improve the performance. Your choices for the starting structures seem very reasonable to me. If you haven’t done so, feel free to play around with the heated REST2 region and the interactions you scale. For instance, if the insertion/deletion causes the local secondary structure to change, you may try to scale the Lennard-Jones parameters describing the interaction of the residues involved with the surrounding water molecules. As proteins fold due to the hydrophobic effect in many cases, making the interactions with water more favourable at the unphysical lambda values can cause your protein to locally unfold and re-fold in the new secondary structure as you move along lambda.
If you can figure out a few good collective variables describing the structural changes needed to go from the wildtype to the mutant structure, you can change the value of this collective variable as you go along lambda using the pull code. As far as I know, the internal implementation of replica exchange in GROMACS is based on lambda, and lambda can’t yet control the targeted value of a pull code coordinate; you may need to use an external patch like PLUMED to run this kind of replica exchange then.

Unfortunately, some of the more advanced enhanced sampling/free energy methods still need to be implemented in GROMACS. Feel free to have a look at Umbrella issue for multidimensional lambda design (#4014) · Issues · GROMACS / GROMACS · GitLab if you’re interested in what the development plans are.

Thanks too much, I though also that REST2 region could have some effect but still don’t try to change ( current 4 A front the residue add/delete).
I will keep trying with such suggestions and see what analyses can provide us with to improve the convergence of ddG.

Hi @marzouk, I’m super interested in insertion/deletion FEPs and I’m curious about any updates on your efforts! Are you aiming for a publication at some point this year?