Umbrella sampling starting structure

GROMACS version: 2024
GROMACS modification: No
I am a beginner with umbrella sampling. I took a protein-protein complex of about 500 residues and ran a conventional MD simulation following the GROMACS lysozyme-in-water tutorial. Now I want to do umbrella sampling on the same complex. My question is: should I use the original, non-equilibrated PDB as the starting structure for umbrella sampling, or the MD-equilibrated structure? Also, do I have to run NVT and NPT equilibration followed by a production run, in a box with appropriate dimensions, before pulling? Shouldn’t I check the RMSD before I do the pulling? I need advice on this.

It is definitely a good idea to start from an equilibrated system. If you really know what you are doing you could decide to skip, e.g., the NPT stage, but then you would almost certainly have to extend all umbrella simulations by approximately the same simulation time, since each simulation should be in equilibrium.

However, you don’t need to do any “production run” (in which you collect statistics) before pulling to generate the starting configurations for the umbrella sampling simulations.

The RMSD might be a good measurement to assess whether your system is stable (relative to the input configuration) or not. I wouldn’t say it’s necessary to analyse the RMSD before pulling, but in some cases it might be a good idea. It depends a bit on the quality of the input structure, the force field etc.


Thank you. That clarifies a lot. Could you also help me decide on the simulation time for each window? I have a protein complex of 650 residues; how do I decide how long to run umbrella sampling in each window?

The number of windows is determined by the required force constant, which in turn depends on the shape of the PMF. You may have to do a trial-run to see that your histograms are overlapping well enough.
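As a rough starting point for such a trial run, the width of a window's histogram can be estimated from the force constant alone. The sketch below assumes a locally flat PMF, where a harmonic umbrella with force constant k samples a Gaussian with standard deviation sqrt(kB*T/k). The temperature, the example force constant, the function names and the "spacing of about two standard deviations" heuristic are all illustrative assumptions, not values from this thread.

```python
import math

KB = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1 (GROMACS energy units)


def window_sigma(force_constant, temperature=300.0):
    """Standard deviation (nm) of the sampled coordinate in one umbrella.

    force_constant is in kJ mol^-1 nm^-2. Assumes the underlying PMF is
    flat across the window; real curvature widens or narrows the histogram.
    """
    return math.sqrt(KB * temperature / force_constant)


def suggested_windows(path_length, force_constant, temperature=300.0,
                      spacing_in_sigma=2.0):
    """Number of windows needed to cover path_length (nm).

    spacing_in_sigma is a heuristic (assumed here): neighbouring window
    centres about two standard deviations apart usually still overlap.
    """
    spacing = spacing_in_sigma * window_sigma(force_constant, temperature)
    return math.ceil(path_length / spacing) + 1


if __name__ == "__main__":
    k = 1000.0  # hypothetical force constant, kJ mol^-1 nm^-2
    print("sigma (nm):", window_sigma(k))
    print("windows for a 5 nm path:", suggested_windows(5.0, k))
```

With a force constant of 1000 kJ mol^-1 nm^-2 at 300 K this gives a histogram width of about 0.05 nm, which is only a first guess; the actual histograms from the trial run remain the deciding check.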

It is difficult to give any specific recommendations on the sampling time for each window. It depends on the equilibration time (in each umbrella window) and the pull force autocorrelation times in each window. Different windows might have different “optimal” sampling times. Regarding the equilibration time (the time to discard at the beginning of each umbrella window using gmx wham -b), you might find more information in the thread “Bumpy energy profile from PMF” (#4 by MagnusL) and the following posts. Apart from that, you can often see whether there is a large bootstrap error in specific regions of your PMF. Those regions might require more sampling time.
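To make the autocorrelation argument concrete, here is a minimal sketch that estimates the integrated autocorrelation time of a time series, such as the pull force values of one window (typically written to pullf.xvg), and from it the number of effectively independent samples. The truncation rule (stop summing at the first non-positive autocorrelation), the function names and the synthetic test series are assumptions of this sketch, not part of GROMACS.

```python
import random


def integrated_autocorr_time(series, max_lag=None):
    """Integrated autocorrelation time in units of the sampling interval.

    tau = 1 + 2 * sum_k rho(k), where the sum is truncated at the first
    lag whose autocorrelation rho(k) is zero or negative.
    """
    n = len(series)
    if max_lag is None:
        max_lag = n // 10
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev) / n
    if var == 0.0:
        return 1.0  # constant series: nothing to correlate
    tau = 1.0
    for lag in range(1, max_lag + 1):
        cov = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n - lag)
        rho = cov / var
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return tau


def effective_samples(series):
    """Approximate number of statistically independent samples."""
    return len(series) / integrated_autocorr_time(series)


def synthetic_ar1(n, phi, seed=0):
    """Correlated test data (AR(1) process) standing in for a pull force series."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


if __name__ == "__main__":
    forces = synthetic_ar1(5000, 0.9)
    print("tau (frames):", integrated_autocorr_time(forces))
    print("effective samples:", effective_samples(forces))
```

A window whose autocorrelation time is a large fraction of its total length contains only a handful of independent samples, which is one sign that it needs to be extended.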

Thank you again for the clarification. I’m still learning about this, hence the many questions. Could you please tell me what an umbrella potential should look like? I’ve attached my graphs here. How do you interpret these samplings? I’m confused and unable to understand them. I sampled each of the 40 windows for 8 ns and this is the outcome. Is this correct, or do I have to extend my sampling time? This was the trial run. The PMF profile looks like this.

If you could advise or guide me on this, I’d be grateful. You’ve clarified my previous doubts. Thank you for your time.

Regards,
Satarupa

(attachments)

I would recommend that you have a look at the histograms you can get from gmx wham (if you plot them with xmgrace, you should open the file with -nxy).

The sampling profile in each window should be nearly bell-shaped and overlap well with those of its neighbouring windows. If there are gaps, you might need more umbrellas/windows and/or a modified pull force constant. From your PMF I expect very poor sampling/overlap in the 0.5-0.65 nm region.
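If you prefer numbers over eyeballing the plot, the overlap between neighbouring windows can also be quantified. The sketch below assumes the histogram file from gmx wham (histo.xvg by default) has the usual XVG layout: comment lines starting with # or @, then one row per bin with the coordinate in the first column and one column per window. The function names and the 0.05 threshold are hypothetical choices for illustration.

```python
def read_xvg_columns(lines):
    """Parse XVG text into rows of floats, skipping '#' and '@' lines."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        rows.append([float(value) for value in line.split()])
    return rows


def neighbour_overlaps(rows):
    """Overlap coefficient between each pair of adjacent window histograms.

    Column 0 is the reaction coordinate; every later column is one window.
    Each histogram is normalised to sum to 1, so the overlap ranges from
    0 (no shared bins) to 1 (identical histograms).
    """
    n_windows = len(rows[0]) - 1
    normed = []
    for w in range(n_windows):
        column = [row[w + 1] for row in rows]
        total = sum(column)
        normed.append([c / total if total > 0 else 0.0 for c in column])
    return [sum(min(a, b) for a, b in zip(normed[w], normed[w + 1]))
            for w in range(n_windows - 1)]


def poorly_overlapping(overlaps, threshold=0.05):
    """Indices i where windows i and i+1 overlap less than threshold (assumed cutoff)."""
    return [i for i, value in enumerate(overlaps) if value < threshold]
```

Typical use would be `with open("histo.xvg") as fh: overlaps = neighbour_overlaps(read_xvg_columns(fh))`; pairs reported by `poorly_overlapping` are candidates for an extra window in between or a softer force constant.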

Thank you so much for clarifying my doubts. That helped a lot. I’ll look into it.