Umbrella sampling for 1 lipid in CHARMM-GUI bilayer

GROMACS version: 2023.3 with CUDA 12.3
GROMACS modification: No

Dear all,

I am trying to do umbrella sampling on a particular lipid inside a lipid bilayer generated through CHARMM-GUI. However, as a relatively new GROMACS user, I am having difficulties implementing the process through a .mdp file, which I have been studying based on the Lemkul Lab’s GROMACS Tutorial.

Here are the questions I have:

  1. How do I specify a pull group, particularly the head group of one specific lipid? Would I select a subcomponent of the lipid during .ndx creation as opposed to the entire lipid, and how would I go about doing that?

  2. Assuming we have selected the lipid to pull in the bilayer but it is not at the position I need it to be, is it possible to measure the distance from the lipid to the center of the bilayer and move/pull/place it there before conducting umbrella sampling? i.e., I would like to move the lipid from the upper leaflet to the middle of the bilayer, then begin pulling upward.

  3. Does the following .mdp (modified from Dr. Lemkul’s) for this purpose look appropriate overall?

Here, ‘TEST_LIPID’ would be defined according to our first question:

title = Umbrella pulling simulation
define = -DPOSRES_B
; Run parameters
integrator = md
dt = 0.002
tinit = 0
nsteps = 250000 ; 500 ps
nstcomm = 10
; Output parameters
nstxout = 5000 ; every 10 ps
nstvout = 5000
nstfout = 500
nstxtcout = 500 ; every 1 ps
nstenergy = 500
; Bond parameters
constraint_algorithm = lincs
constraints = all-bonds
continuation = yes ; continuing from NPT
; Single-range cutoff scheme
cutoff-scheme = Verlet
nstlist = 20
ns_type = grid
rlist = 1.4
rcoulomb = 1.4
rvdw = 1.4
; PME electrostatics parameters
coulombtype = PME
fourierspacing = 0.12
fourier_nx = 0
fourier_ny = 0
fourier_nz = 0
pme_order = 4
ewald_rtol = 1e-5
optimize_fft = yes
; Berendsen temperature coupling is on in two groups
Tcoupl = Nose-Hoover
tc_grps = MEMB SOLV
tau_t = 1.0 1.0
ref_t = 310 310
; Pressure coupling is on
Pcoupl = Parrinello-Rahman
pcoupltype = semiisotropic
tau_p = 1.0
compressibility = 4.5e-5
ref_p = 1.0
refcoord_scaling = com
; Generate velocities is off
gen_vel = no
; Periodic boundary conditions are on in all directions
pbc = xyz
; Long-range dispersion correction
DispCorr = EnerPres
; Pull code
pull = yes
pull_ncoords = 1 ; only one reaction coordinate
pull_ngroups = 1 ; one group
pull_group1_name = TEST_LIPID
pull_coord1_type = umbrella ; harmonic potential
pull_coord1_geometry = distance ; simple distance increase
pull_coord1_dim = N N Y
pull_coord1_groups = 1 2
pull_coord1_start = yes ; define initial COM distance > 0
pull_coord1_rate = 0.01 ; 0.01 nm per ps = 10 nm per ns
pull_coord1_k = 1000 ; kJ mol^-1 nm^-2

Thank you in advance!

Dear @riczhu,

  1. Yes. You can take a look at which lipid specifically you want to sample, e.g. its residue number, and the specific atoms in the head group, e.g. the oxygen ones.

  2. yes, this would most likely be the distance between the centre of mass of the membrane and the centre of mass of the lipid head. For umbrella sampling, you need to define a series of windows along a collective variable, in your case a natural choice could be the distance between the latter centre of masses. You will have to steered the lipid along the CV and then extract the starting sampling windows from there as in Lemkul’s tutorial.

  3. In the .mdp file:
    i) what are you restraining with the -DPOSRES_B flag?
    ii) do you need forces, velocities and full precision positions (nstxout, nstvout, and nstfout)? This will generate very large .trr files, also because you are printing in output with high frequency.
    iii) regarding the pull part, you have 1 coordinate, which is going to be the distance between the COM of the lipid and the COM of the membrane, but two groups (pull_ngroups = 2) as you have two groups, that is, the two COMs that define the collective variable. In fact, later you say pull_coord1_groups = 1 2 which is correct, but as you write you have a second group of reference, so you will have to add something like pull_group2_name = membrane.

However, the rest of the mdp seems a bit off. What force field are you using? If you are using CHARMM-GUI to build the system, and as such you are using CHARMM36m as a force field, then most of the parameters are off. Take a look at what is suggested in GROMACS manual. Also, since you are using a modern GROMACS, I would switch to v-rescale and c-rescale as thermostat and barostat, respectively. And, in the barostat section, since you are using semiisotropic coupling (correctly, as you have a membrane), you should specify two values of compressibility and reference pressure, that is

compressibility = 4.5e-5   4.5e-5
ref_p = 1.0   1.0

Hope this helps!

1 Like

Dear @obZehn ,

Many thanks for your help. I am using CHARMM-GUI to build the system with the CHARMM36 force field, and have implemented your suggested changes in the following .mdp file. I have a few follow-up questions:

  1. I appreciate the note about large .trr files – are the output parameters (bolded+italicized in the following .mdp) you mentioned also necessary for generating the .tpr I will need for umbrella sampling, or can I leave them out?

  2. Would it be possible to do two umbrella sampling runs: one from current position to the membrane centre, and a second from current position to outside the membrane, and then combine the PMFs after using gmx wham?

  3. Are there more changes that would make my .mdp appropriate?

title = Umbrella pulling simulation
; Run parameters
integrator = md
dt = 0.002
tinit = 0
nsteps = 250000 ; 500 ps
nstcomm = 10
; Output parameters
nstxout = 5000 ; every 10 ps
nstvout = 5000
nstfout = 500
nstxtcout = 500 ; every 1 ps
nstenergy = 500
; Bond parameters
constraint_algorithm = lincs
constraints = h-bonds
cutoff-scheme = Verlet
vdwtype = cutoff
vdw-modifier = force-switch
rlist = 1.2
rvdw = 1.2
rvdw-switch = 1.0
coulombtype = PME
rcoulomb = 1.2
; no dispersion correction for bilayers
DispCorr = no
; v-rescale for T and c-rescale for P
Tcoupl = v-rescale
tc_grps = MEMB SOLV
tau_t = 1.0 1.0
ref_t = 310 310
; Pressure coupling is on
Pcoupl = c-rescale
pcoupltype = semiisotropic
tau_p = 1.0
compressibility = 4.5e-5 4.5e-5
ref_p = 1.0 1.0
refcoord_scaling = com
; Generate velocities is off
gen_vel = no
; Periodic boundary conditions are on in all directions
pbc = xyz
; Pull code
pull = yes
pull_ncoords = 1 ; only one reaction coordinate
pull_ngroups = 2; two groups define relative pull
pull_group1_name = TEST_LIPID
pull_group2_name = MEMB
pull_coord1_type = umbrella ; harmonic potential
pull_coord1_geometry = distance ; simple distance increase
pull_coord1_dim = N N Y
pull_coord1_groups = 1 2
pull_coord1_start = yes ; define initial COM distance > 0
pull_coord1_rate = 0.1 ; 0.1 nm per ps = 50 nm total
pull_coord1_k = 1000 ; kJ mol^-1 nm^-2

I sincerely appreciate all your suggestions!

Dear @riczhu,

i) So for the .tpr file you just need the files that you pass as input to grompp. What is contained in the .mdp file will only influence your output, both “physics-wise” (NVE? NVT? what temperature? What kind of electrostatics? etc.) and “data-wise” (how often do you print? what do you print? etc.)

For umbrella sampling (supposing you will use WHAM to combine the histograms) you will mainly need the files that are printed by the pull module, which is activated once you state pull = yes, and which are usually called *_pullx.xvg and *_pullf.xvg. You can control their output frequency with pull-nstxout and pull-nstfout mdp options. If you do not set them, then they print every 50 steps, which in my experience is more than enough.

Going back to what you bolded and italicized, I would just set

nstxout = 0
nstvout = 0
nstfout = 0

That is, suppress the output of forces, velocities, and positions. These flags, as you can see here, produce a large .trr binary file. I would never print that much in output, especially velocities and forces, except if I knew I need it. On the other hand, I would keep the flag nstxtcout = 500 to have an .xtc file, that is, a trajectory which stores only the positions. However, I would increase the value to 50000 (1 per ps) or even more if I didn’t need the trajectories for any analysis.

ii) I do not understand very well what do you mean. If you intend to sample several windows and then combine them together, e.g. with WHAM, then yes, the order in which you generate the windows doesn’t matter, in principle. However, there won’t be any dragging during the sampling. You drag the lipid along the reaction coordinate to sample from that trajectory the starting positions for your windows, but you will need several simulations (one per window) to fully sample the reaction coordinate, exactly as done in Lemkul’s tutorial that you cite! :)

iii) The .mdp file looks much better now for CHARMM36m. Regarding the rest, you can increase the tau_p to 5, which should be a good value for most atomistic simulations.

I’ll just close by adding a couple of considerations. First, a major point is that I would rerun previous simulations with these parameters to relax the system, as you can’t switch the electrostatics etc. mid run (e.g. run NVT/NPT with a certain value of DispCorr and then the production with another). Lastly, your pull settings are overall good in terms of meaning (e.g. pulling a lipid across a bilayer) however the speed pull_coord1_rate = 0.1 is huge, and you will most likely break the system or induce large perturbations. Keep in mind that umbrella sampling is an equilibrium technique, that is, you aim at sampling a window while it is at equilibrium with the surrounding. Inducing large perturbations during window generation will force you to have long relaxation times, and as such I would prefer slower pulling to generate the starting windows so that the system starts from a less perturbed condition within the windows. Here I can’t help you as I am not familiar with the challenges of sampling a lipid across a bilayer, but these are quite general rules for umbrella sampling/WHAM. Also keep in mind that a lipid will have quite a ton of conformational freedom, which you should fully sample if you want to have any kind of convergence in your free energy profile (the free energy profile is arguably completely meaningless without convergence). I would expect very long simulations times for your windows, in the order of hundreds of nanoseconds, given how slow are the dynamics of the lipids. Take a look at relevant literature and get some ideas on the simulation time, the number of windows, the convergence criteria etc before setting up simulations, as from experience I know that free energy estimations can be daunting and extremely time consuming, and as such is WAY better to have a rough idea from the beginning of the feasibility of such a task.

1 Like

Dear @obZehn ,

I am very grateful for your quick and detailed response.

I understand your explanations and yes, you were spot on with the second question :)

I will implement the changes you describe and follow your suggested plan to:

  1. rerun previous simulations with these parameters
  2. look into pull speed and likely reduce it from 0.1
  3. look at more relevant literature

Thank you again for all your help!

Glad to have helped! Happy sims :)