Reconfiguring to a smaller domain

GROMACS version: 2023.3 (gmx_mpi mdrun) 2024.2 (gmx editconf)
GROMACS modification: No
Here post your question
I am simulating an intrinsically disordered protein IDP in water using gromacs 2023.3 using a large, triply periodic rhombic dodecahedron domain with explicit water solvent. My IDP has become much more compact, with length much smaller than my large domain. So to save computing resources, I’d like to avoid uselessly computing the behaviour of half a million water molecules.

My 3 attempts are described more below, but I have the following concerns :

Attempt 1 seems the most sensible but failed.

Attempt 2 seems terribly complicated, and perturbs my protein more than I like, but worked.

Attempt 3 is overly complicated because of a bug in the routine that writes *.gro files.

I solicit the advice of more experienced users. Is there are an existing solution that is better than my 3 attempts (perturbs the protein less and easier to implement) ?

PS I will write a separate post reporting the aforementioned bug.

Attempt 1 :

I used gmx editconf

(using gromacs 2024.2 on my laptop)

%gmx editconf -f last_output_big_md.gro -o new.gro -box 8 -bt dodecahedron

This did not behave as expected, keeping all solvant and ions from the original last_output_big_md.gro.
% wc last_output_big_md.gro
595469
% wc new.gro
595469

Attempt 2 :

I resorted to vi and edited the gro « by hand », removing all but the protein, editing line two (total atom numbers) and last line (domain description).

I used my old topol.top (created from the original protein PDB file, before solvating etc.) and was back in business, solvating anew, adding ions, minimizing energy and spinning up with constant volume and pressure ensembles. But this moved the protein around a bit and I fear a persnickety reviewer will take issue.

Attempt 3 :

I wrote a Python3.11 code that attempts to modify the original *.gro file (above « last_output_big_md.gro”) to contain only the molecules that fall within a user specified subdomain. This is painful because of the bug in the gromacs routine that creates these files, with column 3 spilling over and writing over top of column 2 in complicated ways.

Dear @D-Gly

Here are my two cents on how you could achieve what you want. However, if, and eventually to what extent, this procedure can impact your simulations is something I do not know and I can’t say. Finally, it will come down to you and the possible reviewers.

Your best friend here is gmx trjconv, which will give you an output that is already corrected in terms of components. The box will be wrong, but you can fix it by hand.

Now, the problem in my opinion is that if you just extract the protein and resolvate + reequilibrate you will perturb a lot the system, as you say. As such, I feel like the most reasonable way to go could be to extract the protein + everything nearby (say, 3nm) and put that in a new smaller box, and solvate only the shell between the new subsystem and the new box borders. The produce is simple in terms of tools but convoluted in terms of steps. Maybe there is something that does this much better, and in that case I apologize, but I really do not know of such tools.

So, I guess you will have water and/or ions nearby the protein or embedded in some volumes within the protein itself. As said before, I would not touch any of these at all. You can use gmx select to generate a group that contains everything you need nearby the protein, with something like

gmx select -f your_structure.gro -s your_structure.tpr -n your_index.ndx -on water_to_keep.ndx -select 'resname "SOL" and same residue as within 3 of group "Protein"'

The command is just an example, but the idea is that with this you generate an index file, water_to_keep.ndx, that will have the indexes all all the water atoms (SOL) that are within 3nm of your protein. You can do the same with ions. Then with gmx make_ndx you can generate an entry in an index file that contains Protein+water and ions that are within 3nm of you protein. Then, you use that this index file with gmx trjconv, which will produce a structure file that contains only the protein and the nearby water and ions. Take also note of the ions and water molecule numbers, as these will be the starting point for the new topology.

At this point you should have a structure (.gro possibly) file with protein and nearby ions and water molecules. Center it in zero with gmx editconf to make your life easier, also. Here is also where you want to change the box shape and volume. Now, how to embed it in a new box and fill the empty shell between the subsystem and the new box with water?

One way could be that you use gmx solvate to solvate your subsystem. The only thing here is that probably this procedure is going to add water molecules also in the subsystem itself, not only in the empty volumes. To avoid this you can change the radius of the water insertion (flag - scale). This might work but it will also put less water in the empty shell around the subsystem. Another way to go would be to solvate normally and then use the same procedure as before to remove all the additional water molecules. For example, you extract only the protein from your original system and generate with gmx solvate the box with the dimensions you want and use as input ONLY the protein. Then, you remove all the water molecules within 3nm of the protein, and you also remove the protein. This will leave you with a box of water in which a volume is missing, and that volume should be identical to the subsystem you generated before. Then you just literally paste the subsystem .gro file in the new water box (correct number of atoms etc, clearly), and at this point you have you starting system with the new volume.

Here you will have to generate a new topology, it should be easy by looking at the index file of the systems. And with this topology you will have to fix the charge, as you have extracted a subsystem with the ions within 3nm of the protein. This will leave you with i) a net charge most likely different from 0 and ii) if you had it, a concentration of salt that is wrong. If you added salt only to neutralize the system, then just fix the missing change with the corresponding amount of salt. If you had a specific concentration, do the proportions between old and new volume and calculate the new ion numbers (you have the old ones from the old topology). In both cases, remember that you already have some salt nearby the protein and you have to take that into account to ionize properly.

And now, the MD part. At this point I would really put quite a strong position restraint on all heavy atoms of the protein (i.e. non-hydrogen ones), and run a short NPT to thermalize the new water and relax the box side, say, a few ns. And then, IDEALLY, you can remove the restraints from the protein and continue the NPT run with a reduced system.

Again, this procedure will work at reducing your system size. Whether or not the procedure itself of reducing it is okay, that’s up to your judgment. I hope my way doesn’t perturb the system so much, as I avoid to touch ions and water in the vicinity of the protein, and I restrain it in the first relaxation of the new box.

I hope this helps!

Hello obZehn

I really appreciate your taking the time to write such a long and
detailed response. Your remark, that gmx solvate will probably add
molecules within the subsystem frightens me a bit, but I will experiment
and report on the results.

Kind regards,
Rob SCOTT (D-Gly)