Problem with Umbrella histograms

GROMACS version: 2021
GROMACS modification: Yes/No
Here post your question:

Dear all,

I simulated the permeation of a polymer through a membrane using the steered molecular dynamics and the umbrella sampling method to calculate the potential of mean force (PMF) referring to this process. Regarding the reaction coordinate, I conducted 180 simulations of 100 ns each with 0.1 nm distance intervals along a distance of 15 nm. I used the wham code to generate the PMF and the umbrella histograms, which are attached here, but it seems there are problems in these results. Although the PMF has a reasonable profile, the histogram shows a poor sampling. However, I do not understand why the sampling is poor, because I did select a high number of simulations with time intervals of 0.1 nm along the reaction coordinate. Could anyone help me to understand or suggest some modification? For the wham calculation, I used:

gmx wham -it tpr-files.dat -if pullf-files.dat -o -hist -unit kJ


Firstly, the histogram plot does only show the first of the histograms. You will see all if you use the xmgrace -nxy option, e.g., xmgrace -nxy histo.xvg.

Secondly, your number of histograms sounds reasonable as such. But you will also have to keep an eye on the pull force constant. I would expect that you need quite a high force constant, since you’ve got such a large slope in your PMF. Otherwise the pulled group (the polymer) will not remain in place close to the center of the umbrella. You might be able to see that from the full histogram plot.

Dear MagnusL,

Thank you for your help. Now, I can see all histograms in this plot.


I see that I need to improve the sampling around 4 nm. I’ll run simulation in this part of the reaction coordinate.

Concerning the pull force constant, I applied the value of 1000 kJ mol^-1 nm^-2, as I saw in other papers and in the GROMACS tutorial (US). I am not sure if this such a large slope in the PMF is reasonable, because the energy are high (both energy minima and energy barriers). For instance, the high energy barrier is about 1200 kJ mol⁻1, and it seems that this barrier is not realistic. Should I decrease this pull force constant?

If the polymer is fairly large and hydrophobic I’m afraid that the PMF might be quite reasonable in magnitude.

A pull force constant between 1000 and 5000 would probably be OK. How do you generate the initial configurations for the simulations? If they are generated using a moving umbrella potential, you might not get them evenly distributed, especially not if the PMF is challenging. That might be the reason why you have so many simulations at 2.5 nm and none at 4-4.5 nm It might be worth pulling using a constraint instead, when generating the initial configurations. But perhaps that advice is a bit too late.

You’ll probably want more sampling at ~6 nm as well.

If you believe that the PMF is not correct, it might be due to insufficient equilibration. Check how it is affected if you only include the last 50 (or 25) ns of each simulation, by using the -b option (-b 50000 and -b 75000) when running gmx wham.

I generated these initial configurations from a steered molecular dynamics simulation, where the polymer was pulled through the membrane. Then, I collected 180 frames where the COM (center of mass) distance between polymer and membrane was 0.1 nm along the reaction coordinate.

You explained that I should use a constraint intead. Should it be included to the polymer in the pulling simulation before the umbrella sampling? I’ll think how to define this constraint. Thank you!

Using constraints instead of umbrella is described at https://manual.gromacs.org/current/user-guide/mdp-options.html#mdp-pull-coord1-type. That way, you know that the distance between all 180 frames will be the same. Yes, you can only use it when you pull through to generate the starting configurations. You cannot use it for the actual umbrella sampling.

Thank you for your response. I tried to use constraints instead of umbrella, but it did not work. While the simulation to generate the starting configurations worked with pull_coord1_type = umbrella, the same simulation did not work with pull_coord1_type = constraint. This simulation stopped at the first step (step 0). Do I need to define other flag in the input file in order to use pull_coord1_type = constraint ?

It shouldn’t require anything special. You should probably use pull-coord1-start = yes and you might have to pull reasonably slow (but the pulling speed should not make the simulation crash at step 0). When using constraints the molecule must follow the pull reference coordinate exactly.

In this same simulation, I estimated the errors in the PMF with bootstrap analysis. However, the average energy values were NaN and the standard deviation values were 0. For this calculation, I used the line:

gmx wham -it tpr-files.dat -ix pullx-files.dat -o profile-boot-2.xvg -hist histo-boot-2.xvg -unit kJ -nBootstrap 100 -b 50000 -e 100000


The bsResult file shows NaN values:


@ title “Average and stddev from bootstrapping”
@ xaxis label “\xx\f{} (nm)”
@ yaxis label “E (kJ mol\S-1\N)”
@TYPE xy
@TYPE xydy
3.368214e-02 -nan 0.000000e+00
1.010301e-01 -nan 0.000000e+00
1.683781e-01 -nan 0.000000e+00

However, the PMF profile (profile-boot-2.xvg) does not have NaN values as we can see below:

@ title “Umbrella potential”
@ xaxis label “\xx\f{} (nm)”
@ yaxis label “E (kJ mol\S-1\N)”
@TYPE xy
3.368213e-02 0.000000e+00
1.010301e-01 1.201732e+01
1.683780e-01 1.429036e+01
2.357260e-01 2.999677e+01
3.030740e-01 3.744835e+01

Is there a problem with this bootstrap analysis? I solved the problem of overlap of histrogram, but now I got this problem.

Thank you very much for your help.

That looks strange. You could see if the -vbs option indicates what is going wrong.