RMSF Analysis of Protein & Ligand MD Simulation

GROMACS version: 2021.3
GROMACS modification: /No

Dear all,

I am new to GROMACS. I have carried out an MD simulation of a protein & ligand complex and I got confused about RMSF analysis. I have conveniently done RMSD analysis of my complex using my command prompt but I am unable to repeat the same calculation for the generation of RMSF plot, in fact, I am not able to figure out whether I did RMSF calculation for my complex or my only using the command. I see several articles making comparisons of RMSF results between their studied proteins and protein & ligand complexes but I could not figure that out, so I could not apply the same procedure to my research. I’ve searched on the web and on GROMACS forum about it and couldn’t find a proper solution.

Thank you in advance.

Dear Harunalcakan,
I’m not completely sure I understand your problem.
What do you mean by:

You can use the gmx rmsf command for each of your trajectories to get the RMSFs. There are a few different options of output files as well. You can get XVG files (suitable for plotting graphs) or as b-factors in PDB files (good if you want to visualize the results projected onto your protein structure). In these files you should be able to see which parts were included in the calculation.

Can you provide some more details about what you tried to do (paste the commands you ran) and the results your were expecting?

Dear Cathrine,

First of all I’m sorry if I couldn’t express my problem properly since I’m not a native English speaker.
What I tried to do is to calculate and to plot the RMSF of the protein and the protein-ligand complex separately, in order to compare the RMSF results between the complex and the protein (if this kind of investigation is possible). I had no problem with generating rmsf.xvg file and plotting a graph using the file. To be more clear, I’d like to share the command pathway that I followed:

Command line:
  gmx rmsf -f md_0_10.xtc -s md_0_10.tpr -o rmsf_residue.xvg -res

Reading file md_0_10.tpr, VERSION 2021.2 (single precision)
Reading file md_0_10.tpr, VERSION 2021.2 (single precision)
Select group(s) for root mean square calculation
Group     0 (         System) has 143213 elements
Group     1 (        Protein) has  9170 elements
Group     2 (      Protein-H) has  4624 elements
Group     3 (        C-alpha) has   581 elements
Group     4 (       Backbone) has  1743 elements
Group     5 (      MainChain) has  2323 elements
Group     6 (   MainChain+Cb) has  2893 elements
Group     7 (    MainChain+H) has  2882 elements
Group     8 (      SideChain) has  6288 elements
Group     9 (    SideChain-H) has  2301 elements
Group    10 (    Prot-Masses) has  9170 elements
Group    11 (    non-Protein) has 134043 elements
Group    12 (          Other) has    72 elements
Group    13 (            LIG) has    58 elements
Group    14 (            SOD) has    14 elements
Group    15 (          Water) has 133971 elements
Group    16 (            SOL) has 133971 elements
Group    17 (      non-Water) has  9242 elements
Select a group: 1
Selected 1: 'Protein'
Reading frame    2000 time 20000.000

GROMACS reminds you: "What if you're wrong about the great Ju Ju at the bottom of the sea?" (Richard Dawkins)

As could be seen, I chose Group 1 and generated an .xvg file to plot the graph. The .xtc file that I used belongs to the trajectory of protein & ligand complex and I do not have a trajectory file containing only protein.
I hope I could explain myself clearly.

No worries! If I understand you correctly, you want to compare the RMSFs of two systems: the apo protein, and the protein+ligand complex? But you also write you only have the trajectory of the protein+ligand complex? Then, I think your problem is that you need to generate the trajectory for the apo protein?

You can of course calculate the RMSFs for the protein and ligand separately or for the whole complex by selecting different groups in gmx rmsf (as you’ve already shown), but you will of course get the same results for the protein regardless of whether you calculate the RMSF with or without the ligand.

1 Like

Yes, that’s right! It’s also true that I only have the XTC file of the complex. But I’m not sure if I’m missing some points. As you stated, we can perform this calculation by selecting the Group 1 for the protein and the Group 13 for the ligand, but which groups do I need to select for the calculation involving the protein+ligand system? For example I can do RMSD calculation by selecting and combining two groups in the command prompt consecutively and I get RMSDs for complex system but I think it’s not possible for the RMSFs.

I think I got confused about this also. I see on several papers that researchers performed RMSFs for their proteins and protein+ligand complexes and it could be seen from their RMSF plots that they got different results. Actually, I’d like to show some examples about it but I’m not sure if I am able to share them here, I don’t want to break the forum rules if there are some :)

Since I am a beginner, these questions may sound weird but I need to clear my mind to go further.

The relevant comparison is between the RMSF of the protein only in each case, if you are interested in the impact of the ligand on the protein’s flexibility. As such, this requires no special use of gmx rmsf or contents of the trajectory. Just analyze the protein in each and you will have what you want. You do not want to consider the protein and ligand simultaneously because if you fit to dissimilar groups (e.g. the protein in the apo case and protein+ligand in the other), you have different points of reference.

I would also recommend performing RMSF analysis only on the Cα or backbone groups, not the whole protein. You will get a lot of irrelevant motions (like sidechains) if you consider everything.


Thank you very much.

Sorry for the confusion! Justin’s answer was a lot clearer! I meant that since you only have one trajectory with the protein+ligand complex and I got the impression you wanted to compare with and without ligand for this trajectory alone, and since RMSF is usually calculated per residue (Cα), you would just see the same RMSFs for your protein regardless of whether you looked at the protein alone or at the protein+ligand (given that the initial alignment was done in a consistent manner, as Justin pointed out).

As Justin wrote, you probably want to generate the apo trajectory and then calculate the RMSFs for the protein alone for each of these cases. I just didn’t want to directly assume this was what you wanted to do.

You got the right impression Cathrine! Thank you very much for your interest and your help. Justin and you made me clear.