Choosing an appropriate RMSD cutoff for gromos clustering

GROMACS version: 2020.4
GROMACS modification: No

Hi all,

I’ve seen examples of people using the gromos clustering method and varying the RMSD cutoff until a single cluster represents over half the conformations in a trajectory. Essentially, I don’t know what the best approach to choosing the RMSD cutoff is. Is it to use the tightest (smallest) RMSD cutoff that still yields a cluster containing >50% of the conformations?

I’ve clustered using progressively tighter RMSD cutoffs, each time keeping more than 50% of the frames in the largest cluster, and the central structures are identical, which makes me think this is probably a reasonable way to approach this.
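
To make the scan concrete, here is a minimal sketch in plain Python of the greedy scheme that gromos clustering follows: count each frame's neighbours within the cutoff, take the frame with the most neighbours as the cluster centre, remove it together with its neighbours, and repeat. It assumes a pairwise RMSD matrix is already available. The function names (`gromos_clusters`, `scan_cutoffs`), the tie-breaking rule, and the toy 1D "conformations" are all hypothetical and only for illustration; this is not the gmx cluster implementation.

```python
def gromos_clusters(rmsd, cutoff):
    """Greedy GROMOS-style clustering on a symmetric pairwise RMSD matrix.

    Returns a list of (centre_index, member_indices) in the order found.
    """
    pool = set(range(len(rmsd)))
    clusters = []
    while pool:
        # Neighbours of each remaining frame (the frame itself included).
        neighbours = {
            i: [j for j in pool if rmsd[i][j] <= cutoff]
            for i in pool
        }
        # The frame with the most neighbours becomes the cluster centre.
        # Assumption: ties are broken by the lowest frame index.
        centre = min(pool, key=lambda i: (-len(neighbours[i]), i))
        members = sorted(neighbours[centre])
        clusters.append((centre, members))
        pool -= set(members)
    return clusters


def scan_cutoffs(rmsd, cutoffs):
    """For each cutoff: (cutoff, centre of first cluster, its fraction of frames)."""
    n = len(rmsd)
    results = []
    for c in cutoffs:
        # The first cluster found is the most populated one.
        centre, members = gromos_clusters(rmsd, c)[0]
        results.append((c, centre, len(members) / n))
    return results


# Toy data (hypothetical): 1D positions, |x_i - x_j| stands in for RMSD in nm.
coords = [0.00, 0.05, 0.10, 0.12, 0.50, 0.55]
rmsd = [[abs(a - b) for a in coords] for b in coords]

for cutoff, centre, fraction in scan_cutoffs(rmsd, [0.20, 0.11, 0.06]):
    print(f"cutoff={cutoff:.2f} nm  centre=frame {centre}  fraction={fraction:.2f}")
```

In this toy case the centre of the largest cluster changes between the first two cutoffs even though the cluster membership does not, so it is probably worth checking both the centre and the population at each cutoff rather than the population alone.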

Isn’t this sort of the opposite of what clustering is intended to do? If you predetermine that you want 50% of the frames in a cluster, you can rig some value to give you that, but is the cluster truly a representation of similar structures, or is it a convenient outcome? In any case, the RMSD cutoff depends on the system; one would use a very different value for a globular protein with stable tertiary structure than what would be appropriate for an IDP.

There are better methods available than the GROMOS clustering method; there are shortcomings to this approach. It may be better to pursue those if it is not clear how to treat your system.

I see what you’re getting at. Maybe an alternative whilst continuing with gromos would be to use the average RMSD as a cutoff.
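
As a rough sketch of what that could look like, assuming "average RMSD" means the mean over all distinct frame pairs of a precomputed pairwise RMSD matrix (that reading is an assumption, as are the helper names and the toy data):

```python
from itertools import combinations
from statistics import mean


def mean_pairwise_rmsd(rmsd):
    """Average over all distinct frame pairs (upper triangle, diagonal excluded)."""
    return mean(rmsd[i][j] for i, j in combinations(range(len(rmsd)), 2))


def neighbour_counts(rmsd, cutoff):
    """Number of frames within `cutoff` of each frame, the frame itself included."""
    return [sum(1 for d in row if d <= cutoff) for row in rmsd]


# Toy data (hypothetical): 1D positions, |x_i - x_j| stands in for RMSD in nm.
coords = [0.00, 0.05, 0.10, 0.12, 0.50, 0.55]
rmsd = [[abs(a - b) for a in coords] for b in coords]

cutoff = mean_pairwise_rmsd(rmsd)
print(f"mean pairwise RMSD = {cutoff:.3f} nm")
print("neighbours per frame:", neighbour_counts(rmsd, cutoff))
```

One thing this toy example suggests: when the trajectory visits two well-separated states, the mean is pulled up by the between-state pairs, so the resulting cutoff can be much looser than the spread within either state.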

I will read up on other clustering methods; I only used gromos initially because I came across it in the literature.