GROMACS version: 2020.4
GROMACS modification: No
I’m trying to use gmx cluster to find the centroid of the biggest cluster and use it as the reference structure for RMSD calculations. I wanted to see whether there is a transition in the RMSD value at the end of the simulation, in order to justify the length of a long simulation (that is, to check whether the RMSD has reached an equilibrium).
Below are the steps of my method:
- Use gmx cluster to perform clustering analysis. For my protein, with a cutoff distance of 0.2 nm, 31 clusters were identified. The biggest cluster had 5782 members (72% of the total number of frames) and its medoid was the configuration at 1412.25 ns (as shown in cluster.log).
- Copy the first PDB frame from clusters.pdb and save it as protein_cluster_medoid.pdb. I assumed that the first PDB frame should correspond to the medoid of the biggest cluster (please let me know if this is wrong).
- Use gmx rms, taking protein_cluster_medoid.pdb as the reference structure, to calculate the RMSD values.
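The copy step above can be sketched in Python as follows. This is only a minimal sketch, assuming that clusters.pdb is a multi-model PDB file in which each frame ends with an ENDMDL record. The function name first_pdb_frame is hypothetical, and whether the first frame really is the medoid of the biggest cluster still has to be checked against cluster.log.

```python
def first_pdb_frame(pdb_text: str) -> str:
    """Return the text of the first frame of a multi-model PDB file.

    Assumption: frames are terminated by ENDMDL records. If no ENDMDL
    record is found, the whole text is returned as a single frame.
    """
    frame_lines = []
    for line in pdb_text.splitlines():
        frame_lines.append(line)
        if line.startswith("ENDMDL"):
            break  # stop at the end of the first frame
    return "\n".join(frame_lines) + "\n"


# Example usage (file names follow the post; adjust to your setup):
#
#   with open("clusters.pdb") as handle:
#       medoid = first_pdb_frame(handle.read())
#   with open("protein_cluster_medoid.pdb", "w") as handle:
#       handle.write(medoid)
```

Comparing the time stamp in the header of the extracted frame, if one is present, with the medoid time reported in cluster.log (1412.25 ns here) would be one way to confirm that the right frame was copied.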
As a result, the RMSD values showed no transition, but they reached up to 3.5 nm, which is pretty large. I assumed that the large values were due to the fair number of structures belonging to other clusters (although I still think that 3.5 nm is too large). I then plotted the distribution of the RMSD values and became confused. I expected the distribution to be concentrated at low values, with at least 72% of the data below 0.2 nm, but this is apparently not the case in the figure. I’m wondering if there is something about this method that I misunderstood.
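The expectation about the distribution can be checked numerically rather than by eye. Below is a minimal Python sketch, assuming the gmx rms output is a standard .xvg text file in which lines starting with # or @ are comments or plot directives, the first column is time, and the second column is the RMSD in nm. The function names read_xvg_column and fraction_below are hypothetical helpers, not part of GROMACS.

```python
from typing import List


def read_xvg_column(xvg_text: str, column: int = 1) -> List[float]:
    """Read one numeric column from the text of an .xvg file.

    Lines starting with '#' or '@' (comments and plot directives)
    and empty lines are skipped.
    """
    values = []
    for line in xvg_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#@":
            continue
        values.append(float(stripped.split()[column]))
    return values


def fraction_below(values: List[float], cutoff: float) -> float:
    """Return the fraction of values strictly below the cutoff."""
    if not values:
        return 0.0
    return sum(v < cutoff for v in values) / len(values)


# Example usage (rmsd.xvg is an assumed output file name):
#
#   with open("rmsd.xvg") as handle:
#       rmsd = read_xvg_column(handle.read())
#   print(fraction_below(rmsd, 0.2))
```

If the reported fraction is far from 0.72, it may be worth checking that the clustering and the RMSD calculation used the same atom selections for fitting and for the RMSD itself, since the two numbers are only comparable under that condition.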
Specifically, my questions can be summarized as follows:
- What is the reason for having such large RMSD values when using the medoid of the largest cluster as the reference?
- Why did the histogram show that the majority of the samples had an RMSD of more than 2.0 nm? I expected at least 72% of the data to be below 0.2 nm.
- Is there another way to justify the simulation length besides looking at the RMSD value?