Free energy calculation of two molecules in water

Ali · July 26, 2020, 9:33pm

GROMACS version: 2018.7
GROMACS modification: No
Dear all,

I am calculating the free energy of adsorption of a cellulose-hemicellulose complex. The system is consisted of a cellulose crystal on which 4 hemicellulose polymers are positioned. The system is solvated in water. I am using @jalemkul 's tutorial steps. My apologies if the questions are intuitive.

1- Can I use finer lambda spacing where changes are steeper, independently? E.g. only simulate 0.5-0.8 with finer lambda and replace the results of the same band with larger spacing when using BAR? The intention is only not to repeat all the other steps i.e. 0-0.5 and 0.8-1. Does it work this way?

2- As I am only interested in the free energy between cellulose and hemicellulose (\deltaG), shall I perform two sets on independent simulations: one being cellulose-hemicellulose-water (\delta G1) and repeat the same calculation for hemicellulose-water (\deltaG2)? then \deltaG = \deltaG1 - \deltaG2 ?

3- I have four hemicellulose polymers in the system. In case the answer to the second question is YES, is it correct to only have one hemicellulose polymer in water, and \deltaG2 would be four times the free energy calculated for this system?

Thank you in advance for your responses,
Kind regards,
Ali

Dr_DBW · July 26, 2020, 11:08pm

1 - yes you should use closer lambda values when it is changing quickly. Otherwise the Gibbs energy value you obtain can be “incorrect”.

Ali · July 26, 2020, 11:36pm

Thank you Dr DBW for your answer to my first question.
Maybe to better ask the question:
can I simply only simulate the regions where energy is changing quickly, and combine the results with the other previously modelled lambdas?
I mean if 0.5-0.8 would be the 30th-36th steps, then with a calc_lambda_neighbors=1, I only re-simulate steps 29th to 37th, and use the results in combination with the other coarse-lambda sampled for 0-0.5 and 0.8 onward?

jalemkul · July 27, 2020, 4:12pm

I would not attempt to compute ΔG for this system using an alchemical approach. It’s unlikely that you’ll ever get it to converge. Try umbrella sampling, instead.

Ali · July 27, 2020, 4:40pm

Dear Dr. Lemkul,

Thank you for your response and suggestion. The reason I was doing it this way was that I found this work using GROMACS for Furosemide-cellulose system and they apparently were able to validate their simulations with the experiments as well. I thought maybe the same methodology could be applied for my system as well. what can be the cause of the divergence?
I will start trying umbrella sampling then.
Kind regards,
Ali

jalemkul · July 27, 2020, 4:53pm

Interactions between a macromolecule and a small molecule like furosemide are reasonable for alchemical perturbations. Changes to the system have to be “small” for such techniques to work.

The interaction between two polymers is much more substantial, and you will suffer from inadequate sampling of these large species. This is why you won’t ever see a protein-protein binding free energy determined via BAR or similar approaches. It’s always PMF.

Ali · July 27, 2020, 4:57pm

Thank you again Dr. Lemkul. I see the point. The hemicellulose polymer in my system is rather long (a 4-5 nm chain), which can fluctuate hugely when the interactions are scaled down. I would start the umbrella sampling right away then.
Kind regards,
Ali

Ali · August 2, 2020, 8:34pm

Dear Dr. Lemkul,
As suggested by you, I tried to use Umbrella Sampling to get the free energy for my cellulose-hemicellulose system. Following the procedure employed in your tutorial, the histogram and the PMF look like the following:

Even though I have pulled the hemicellulose for a distance of ca. 5.5 nm, I did not get a plateau at the end of the PMF. Does it mean that more snapshots are required after \xi = 7.5 nm? There must be very low interactions at that point.
May I have your opinion about this? Shall I modify anything?

Kind regards,
Ali

jalemkul · August 3, 2020, 12:35am

Looks like you need more sampling time, and hopefully your restraint is applied in all three dimensions, not just z (which is somewhat unique to the tutorial). The distance you need to pull is based on the properties of the molecules being separated; you need to ensure that at the longest windows, the two species do not have any atom pairs within the longest nonbonded cutoff (i.e. non-interacting). Then, you need sufficient sampling time in each window to adequately sample accessible conformations. For sugars and related polymers, achieving convergence could take quite a long time.

Ali · August 3, 2020, 1:24am

Dear Dr. Lemkul,
Thank you for your response.
Do you mean the restraints on the first pulling reference (chain_B in your tutorial)? I am restraining the first reference (being cellulose) in all three dimensions, whereas hemicellulose is being pulled along z direction through its COM. no restraints is applied on the hemicellulose during umbrella sampling, but cellulose is still restrained. Should I apply any other restraints?
At the last window, the distance between the two molecules seems to be more than 4 nm (occasionally 3.5 nm), which is indeed higher than the longest nonbonded cutoff of 1.4 nm.
My models usually tend to equilibrate within a couple of ns. Simulations have been equilibrated for 10 ns at each snapshot. Using results of the last 7 ns, PMF was not different with that of the last 5 ns as well, which can be a sign of the equilibrium.
I’m including the first and last snapshots, to have a clue of the system and its size through Dropbox.
Looking at the trajectory I noticed huge perturbations in the box size after around 3 ns. This is for sure perturbations in the pressure, but is it because of rotations of the hemicellulose?
I can add some more windows, maybe another couple of nm. I don’t know if this could be what is needed last.
Kind regards,
Ali

jalemkul · August 3, 2020, 1:36am

Don’t use position restraints and your pull vector should be in all three dimensions. The system in the tutorial is somewhat unique - I have posted on this many times and the article associated with the tutorial has more detail, as does https://www.livecomsjournal.org/article/5068-from-proteins-to-perturbed-hamiltonians-a-suite-of-tutorials-for-the-gromacs-2018-molecular-simulation-package-article-v1-0

Ali · August 3, 2020, 1:48am

Dear Dr. Lemkul,
I have read the relevant part thoroughly, and this was where I could not understand.
Do you mean no position restraints during the umbrella sampling? I need the restraint to avoid cellulose floating in the box, as I needed to keep the box small. Otherwise I would need a very large box in both x and y directions. And by pull vector, do you mean the one of umbrella sampling (10 ns equilibration I mean)?

jalemkul · August 3, 2020, 12:22pm

You should not be using position restraints, especially for such a flexible molecule. Your free energy surface will be inaccurate.

For the pull vector, I mean pull-coord1-dim = N N Y like in the tutorial is inappropriate, it should be pull-coord1-dim = Y Y Y because you have no basis to assume the association is unidirectional. Your system has spherical symmetry, not linear.

Ali · August 3, 2020, 2:14pm

Thank you again Dr. Lemkul for your kind response.
The restraint was only applied on the cellulose (the stationary molecule), as we have seen that it is quite rigid, and I thought it would be a nice idea to make the simulation box small. Yet, the flexible molecule (being the pulled hemicellulose) was not restrained. So should I remove all the restraints in the system? Does it also matter for the non-flexible cellulose crystal?
As for the pull vector, my doubt is whether it is also important what reaction coordinate I use to generate the snapshots? I mean, can I only use pull-coord1-dim = N N Y when pulling the hemicellulose to generate the initial configurations, and then during umbrella sampling use pull-coord1-dim = Y Y Y?

I apologize for the numerous questions, and appreciate your help.
Kind regards,
Ali

jalemkul · August 3, 2020, 4:43pm

Yes, my advice is, and has been, to not use any form of position restraints.

The pull vector should be the same in every process. The COM separation is what is appropriate for your system, not the z-component only of that COM separation.

Ali · August 3, 2020, 6:09pm

Dear Dr. Lemkul,
Thank you again for your kindness.
I will set up the simulations this way then.
Kind regards,
Ali

Ali · August 27, 2020, 3:56pm

Dear Dr. Lemkul,

Following your suggestions, I corrected the way the umbrella sampling is done, i.e. no restrains, and SMD in xyz, as well as the sampling in xyz. The PMF curve for most of the configurations seem to be reasonable, but in one of them seems to fluctuate in some regions.
The force constant was not too high to eliminate any movements around the COM-COM distance, and not to low to have flatter distributions. Also checking the histogram, I see that the histograms cover each other mostly, at least marginally. So I did not find any bare patches.
What do you think can be the main reason of these fluctuations? I cannot get a smooth PMF as you have in your tutorial.
How much do you think these can affect the computed free energy?

Thank you in advance for your response.

Kind regards,
Ali

jalemkul · August 27, 2020, 4:52pm

Have you done error analysis or a convergence assessment? Most PMFs are not going to be completely smooth, and your result looks fairly typical.

Ali · August 27, 2020, 6:07pm

Dear Dr. Lemkul,

Thank you for your suggestion. I haven’t yet, but I will do it right away.

Kind regards,
Ali

Ali · August 27, 2020, 9:44pm

Dear Dr. Lemkul,

Checking the convergence for the models, I noticed that the more I drop the first couple of ns, the more agitated the figures become. I mean, values do not converge by moving to the end of the simulation.
I have equilibrated each window for 10 ns. Here is the PMF. Legend shows the amount of dropped ps.

Checking the first three PMF and comparing it with the last one it is obvious that some samplings apparently vanish if too much of the equilibration sampling is dropped out.

Of course, this is not true for all the models. Some really do converge after a couple of ns.

Does it mean that for this particular model for instance, I need to only drop the first ns? How can I make sure that the models are converged in this case?

Kind regards,
Ali

Topic		Replies	Views
Free energy calc gives "WARNING: Some of these results violate the Second Law of Thermodynamics" User discussions mdp-parameters , analysis-tools , free-energy , simulation-setup	5	1537	September 6, 2022
Free Energy Calculation general questions and LINCS Warning User discussions	10	1478	January 13, 2025
Umberlla sampling for protein ligand complex User discussions	31	201	August 5, 2025
Umbrella sampling does not generate PMF results User discussions	3	484	August 23, 2024
Assessing the stability of a mutant protein withGROMACS User discussions mdrun , simulation-setup	6	371	September 10, 2024

Free energy calculation of two molecules in water

Related topics