Gromacs with protein structure refinement

GROMACS version: 2023
GROMACS modification: No
So pretty recently I ran my first what could be consider long simulation of about 20ns, the reason for this is becasue I got into hommolgy modelling and modeled a somewhat good quality protein, well at less by the tools on this websites standards https://saves.mbi.ucla.edu/?job=1409936 My Ramachandran plot: 94.9% core 4.8% allow 0.1% gener 0.1% disall and Verify3D score percentage of 78.08% and a ERRAT score of 84.1978. But after asking my good two friends chatGPT and Microsoft Bing I found out I could use molcular dynamics and energy minimization to further refine the model to make it even better. So I ran a Normal simulation on it after neurtallizing the system, first the energy minimization, then the equilbrum, and finally the production of the MD simulation (that took forever). Once it was done I opened it up in PyMol and notice the the structure looked very weird compard to when I opened the structure pre simulation but I went ahead and but it through the validation tools metion earlier and got a ERRAT score of 85.1282 and Verify3D percentage of 80.42% (which is a pass by this tools standards) but my Ramachandran plot went down 0.0% core 0.0% allow 0.0% gener 100.0% disall and after visualizing the trajectory in VMD I saw that my protein’s shape was changing a lot, it’s original shape and post simulation shape where visibly different, not at all the expected strcuture. I just have to ask what happend? Why did it’s shape change some much and how do I fix it? Should I just make the simulation longer? If so how long? And is there a way to make it quicker? Or something else? If you need more insight on the simulation details. I used the AMBERSB99-ILDN force filed, and the TIP3P water model, and the .mdp files used were the ones used in the lysozoyme in water tutorial with the only real change being the extended time frame from 1ns to 20ns in the md.mdp file.

Possible failure mode to consider:

  • Box dimension insufficient: the “tail” could cross over the periodic boundary and interact with the rest of the protein across the boundary improperly
  • Consider protonation state (c.f. PROPKA)

Things to check:

  • Did you ignore warnings using -maxwarn ? If so, don’t, and address them. If the conformation change problem is global, this could point to a parameter problem: double check how you setup your simulation.
  • Look at the trajectory: if you have sudden change from initial config to 1st frame, this is likely a minimization/equilibration problem. Also look at the equilibration trajectory.
  • Open the log, and search for LINCS warning during the run (equilibration and production).
  • Look at the temperature, pressure graph over time. Look at the unit cell dimension over time. Are they behaving as expected?
  • If the change is progressive, align the trajectory to the initial structure and look for the area where the conformation change starts: this is a clue as to where the problem originates.

Regarding speed, if you have a GPU, be sure that your simulation is using it (see message in log and on the terminal regarding GPU tasks.)

If what you mean by “tail” as in the Myosin tail. Don’t worry about it my structure only contains 853 residues which make up the the lever arm or “neck” and the motor domain because that’s where all the magic happens

And after googling what a protonation state is, I didn’t think pH would be at play because the solvent in the box is water which usually has a neutral pH

As for warings I didn’t recieve any.

results for EM:
Steepest Descents converged to Fmax < 1000 in 3109 steps
Potential Energy = -1.0033458e+07
Maximum force = 8.3653857e+02 on atom 4745
Norm of force = 7.1612146e+00

As for the equilibrium, yeah everything went as expected. Throughout the NVT and NPT equilibrium I got a still unmoving protein. And graphing the tempertures and presseure nothing to note, everthing when according to plan. Unless this is weird

I didn’t get any LINCS warings either, not in the equilibrium or production

As for the “align the trajectory to the initial structure and look for the area where the conformation change starts” I’m not really sure how to do that. I can really say is after visualizing the first structure (pre simulation) and comparing it to the structure post simualtion the neck looks more deformed, kind of like crumpled. And starts to striaghten out, like myosin kind looks like a golfclub but as time goes on it looks more like a bat, and the neck kind of looks like it’s seperating from the the motor domain but is stopped by the amino acids that bind them together which I think is part the backbone I think it’s called.

This rules out a grossly non-minimized protein (exploding system). If the protein doesn’t grossly denature, then the “100% disall” on the ramachandran plot is likely an input format mismatch for that tool - might want to fiddle around to extract only the protein, or use gmx rama to extract dihedral angle, and then plot them by yourself.

It is expected that equilibration steps result in an unmoving protein, because you have restraints on the position of the atoms. Longer equilibration in steps with progressively lower restraint can sometime help.

Regarding alignement, try something like this. Load an initial structure and align your trajectory to it. Then, by playing the trajectory, you can easily identify area of movement compared to the initial structure, without being affected by translation/rotation.

It may be that your simulation has a more subtle problem (domain knowledge about it will help - I have none on myosin). Protonation is more complex than the pH of the solution: given a protonable residue (Glu, Asp, His), the local environement in which it is found (surrounding residue → surrounding partial charges) can tilt the balance toward othe protonated and deprotonated state. This can affect the dynamics. It is also possible that the conformation change is legitimate.

I already used PyMol to extract only the protein form my system and added the chain information back and even try removing the hydrogens, with no luck. But I will keep trying, but as regards to the longer equilibration with progressively lower restaints how do I implement it into a .mdp file?

Regarding the alignment, it was as I thought, it started in the 853 residue and made it up to the 711 residue which make up the lever arm, when I align these residues I got an RMSD of about 18 and when I align the entire structure I get 12 just about.

And when you say domain knowlegde do you mean deep indept knowledge on on how the protein works? Like everything to it’s function to how the amino acids within it work and how they affect structure and function? And for Protonation I still don’t know much about it, so could you perhaps you could link some resources to learn more about it and how it relates to GROMACS simulations?

And finally if this is a legititmate change what is it suppose to representent? Is it my protein becoming more unstable over time? Or is it going through conformational changes and will eventually return to which I thought was it’s lowest configuration state? Or is my life a lie and this is the lowest energy configuration state?

Regarding the technical questions:

For controllable restraint forces, open posre.itp, and search and replace all instances of 1000 1000 1000 with F_RESTR F_RESTR F_RESTR. Then in the mdp file, modify your define setting line like so:

define                  =  -DF_RESTR=500

You set the restraint force by changing the number (unit kj/mol/nm), and can try additional, lower restraint force equilibration before going to production.

Regarding protonation, you can find more details on tools for pKa determination and how to switch protonation state with pdb2gmx here.

Regarding the more general question:

Both homology modelling and MD are not perfect, and cannot be trusted to return 100% correct results. Hence, to be able to evaluate critically your results, you need to know facts about the system of interest, e.g. when one is interest in property X of a protein, it is a good practice to validate that the simulation reproduces independant properties Y and Z (that have been verified experimentally) to confirm the simulation is not totally off.

If you knew that myosin without ATP or ADP should have a specific configuration of the lever arm, and your homology model/simulation does not show that, then that is ground for trying to find what is causing the mismatch (force field, protonation state, lack of sufficient equilibration or runtime…) If you have a mutant, then checking if the wild type exhibits the same behaviour in simulation can also be a good point of reference.

If you have no prior knowledge on what to expect, it will be difficult for you to be sure your results are correct, because there are subtleties, like the protonation state settings, which you do not know exist, so cannot get right.

So it comes down to what your goal is. If you wanted a practical example to start getting into MD, then continue what you are doing, search on how to analyze the trajectory data you have, or how to refine your simulation protocol, but keeping in mind that the actual results are likely to be incorrect. If you really are interested in the way this particular myosin mutant behaves, with the view to answer a biological question, yet you are just starting practicing MD and do not have knowledge to anchor the simulation to experimental data, then this is not a recipe for good results.

I see, thank you for the insight.