# Free energy calculation of two molecules in water

GROMACS version: 2018.7
GROMACS modification: No
Dear all,

I am calculating the free energy of adsorption of a cellulose-hemicellulose complex. The system is consisted of a cellulose crystal on which 4 hemicellulose polymers are positioned. The system is solvated in water. I am using @jalemkul 's tutorial steps. My apologies if the questions are intuitive.

1- Can I use finer lambda spacing where changes are steeper, independently? E.g. only simulate 0.5-0.8 with finer lambda and replace the results of the same band with larger spacing when using BAR? The intention is only not to repeat all the other steps i.e. 0-0.5 and 0.8-1. Does it work this way?

2- As I am only interested in the free energy between cellulose and hemicellulose (\deltaG), shall I perform two sets on independent simulations: one being cellulose-hemicellulose-water (\delta G1) and repeat the same calculation for hemicellulose-water (\deltaG2)? then \deltaG = \deltaG1 - \deltaG2 ?

3- I have four hemicellulose polymers in the system. In case the answer to the second question is YES, is it correct to only have one hemicellulose polymer in water, and \deltaG2 would be four times the free energy calculated for this system?

Kind regards,
Ali

1 - yes you should use closer lambda values when it is changing quickly. Otherwise the Gibbs energy value you obtain can be “incorrect”.

Maybe to better ask the question:
can I simply only simulate the regions where energy is changing quickly, and combine the results with the other previously modelled lambdas?
I mean if 0.5-0.8 would be the 30th-36th steps, then with a calc_lambda_neighbors=1, I only re-simulate steps 29th to 37th, and use the results in combination with the other coarse-lambda sampled for 0-0.5 and 0.8 onward?

I would not attempt to compute ΔG for this system using an alchemical approach. It’s unlikely that you’ll ever get it to converge. Try umbrella sampling, instead.

Dear Dr. Lemkul,

Thank you for your response and suggestion. The reason I was doing it this way was that I found this work using GROMACS for Furosemide-cellulose system and they apparently were able to validate their simulations with the experiments as well. I thought maybe the same methodology could be applied for my system as well. what can be the cause of the divergence?
I will start trying umbrella sampling then.
Kind regards,
Ali

Interactions between a macromolecule and a small molecule like furosemide are reasonable for alchemical perturbations. Changes to the system have to be “small” for such techniques to work.

The interaction between two polymers is much more substantial, and you will suffer from inadequate sampling of these large species. This is why you won’t ever see a protein-protein binding free energy determined via BAR or similar approaches. It’s always PMF.

Thank you again Dr. Lemkul. I see the point. The hemicellulose polymer in my system is rather long (a 4-5 nm chain), which can fluctuate hugely when the interactions are scaled down. I would start the umbrella sampling right away then.
Kind regards,
Ali

Dear Dr. Lemkul,
As suggested by you, I tried to use Umbrella Sampling to get the free energy for my cellulose-hemicellulose system. Following the procedure employed in your tutorial, the histogram and the PMF look like the following:

Even though I have pulled the hemicellulose for a distance of ca. 5.5 nm, I did not get a plateau at the end of the PMF. Does it mean that more snapshots are required after \xi = 7.5 nm? There must be very low interactions at that point.

Kind regards,
Ali

Looks like you need more sampling time, and hopefully your restraint is applied in all three dimensions, not just z (which is somewhat unique to the tutorial). The distance you need to pull is based on the properties of the molecules being separated; you need to ensure that at the longest windows, the two species do not have any atom pairs within the longest nonbonded cutoff (i.e. non-interacting). Then, you need sufficient sampling time in each window to adequately sample accessible conformations. For sugars and related polymers, achieving convergence could take quite a long time.

Dear Dr. Lemkul,
Do you mean the restraints on the first pulling reference (chain_B in your tutorial)? I am restraining the first reference (being cellulose) in all three dimensions, whereas hemicellulose is being pulled along z direction through its COM. no restraints is applied on the hemicellulose during umbrella sampling, but cellulose is still restrained. Should I apply any other restraints?
At the last window, the distance between the two molecules seems to be more than 4 nm (occasionally 3.5 nm), which is indeed higher than the longest nonbonded cutoff of 1.4 nm.
My models usually tend to equilibrate within a couple of ns. Simulations have been equilibrated for 10 ns at each snapshot. Using results of the last 7 ns, PMF was not different with that of the last 5 ns as well, which can be a sign of the equilibrium.
I’m including the first and last snapshots, to have a clue of the system and its size through Dropbox.
Looking at the trajectory I noticed huge perturbations in the box size after around 3 ns. This is for sure perturbations in the pressure, but is it because of rotations of the hemicellulose?
I can add some more windows, maybe another couple of nm. I don’t know if this could be what is needed last.
Kind regards,
Ali

Don’t use position restraints and your pull vector should be in all three dimensions. The system in the tutorial is somewhat unique - I have posted on this many times and the article associated with the tutorial has more detail, as does https://www.livecomsjournal.org/article/5068-from-proteins-to-perturbed-hamiltonians-a-suite-of-tutorials-for-the-gromacs-2018-molecular-simulation-package-article-v1-0

Dear Dr. Lemkul,
I have read the relevant part thoroughly, and this was where I could not understand.
Do you mean no position restraints during the umbrella sampling? I need the restraint to avoid cellulose floating in the box, as I needed to keep the box small. Otherwise I would need a very large box in both x and y directions. And by pull vector, do you mean the one of umbrella sampling (10 ns equilibration I mean)?

You should not be using position restraints, especially for such a flexible molecule. Your free energy surface will be inaccurate.

For the pull vector, I mean pull-coord1-dim = N N Y like in the tutorial is inappropriate, it should be pull-coord1-dim = Y Y Y because you have no basis to assume the association is unidirectional. Your system has spherical symmetry, not linear.

Thank you again Dr. Lemkul for your kind response.
The restraint was only applied on the cellulose (the stationary molecule), as we have seen that it is quite rigid, and I thought it would be a nice idea to make the simulation box small. Yet, the flexible molecule (being the pulled hemicellulose) was not restrained. So should I remove all the restraints in the system? Does it also matter for the non-flexible cellulose crystal?
As for the pull vector, my doubt is whether it is also important what reaction coordinate I use to generate the snapshots? I mean, can I only use pull-coord1-dim = N N Y when pulling the hemicellulose to generate the initial configurations, and then during umbrella sampling use pull-coord1-dim = Y Y Y?

I apologize for the numerous questions, and appreciate your help.
Kind regards,
Ali

Yes, my advice is, and has been, to not use any form of position restraints.

The pull vector should be the same in every process. The COM separation is what is appropriate for your system, not the z-component only of that COM separation.

1 Like

Dear Dr. Lemkul,
Thank you again for your kindness.
I will set up the simulations this way then.
Kind regards,
Ali

Dear Dr. Lemkul,

Following your suggestions, I corrected the way the umbrella sampling is done, i.e. no restrains, and SMD in xyz, as well as the sampling in xyz. The PMF curve for most of the configurations seem to be reasonable, but in one of them seems to fluctuate in some regions.
The force constant was not too high to eliminate any movements around the COM-COM distance, and not to low to have flatter distributions. Also checking the histogram, I see that the histograms cover each other mostly, at least marginally. So I did not find any bare patches.
What do you think can be the main reason of these fluctuations? I cannot get a smooth PMF as you have in your tutorial.
How much do you think these can affect the computed free energy?

Kind regards,
Ali

Have you done error analysis or a convergence assessment? Most PMFs are not going to be completely smooth, and your result looks fairly typical.

Dear Dr. Lemkul,

Thank you for your suggestion. I haven’t yet, but I will do it right away.

Kind regards,
Ali

Dear Dr. Lemkul,

Checking the convergence for the models, I noticed that the more I drop the first couple of ns, the more agitated the figures become. I mean, values do not converge by moving to the end of the simulation.
I have equilibrated each window for 10 ns. Here is the PMF. Legend shows the amount of dropped ps.

Checking the first three PMF and comparing it with the last one it is obvious that some samplings apparently vanish if too much of the equilibration sampling is dropped out.

Of course, this is not true for all the models. Some really do converge after a couple of ns.

Does it mean that for this particular model for instance, I need to only drop the first ns? How can I make sure that the models are converged in this case?

Kind regards,
Ali