Understanding the cut in gmx clustsize

GROMACS version: 2018
GROMACS modification: No
Welcome to a new world!

I am trying to use the gmx clustsize tool but can’t work out the basis for selecting the best -cut value, i.e. the largest distance (nm) at which two molecules are still considered part of the same cluster. My command looks as follows:

gmx_mpi clustsize -f …/polymer.nojump.mol.xtc -n polymer.ndx -s polymer.tpr -mol yes -cut 0.35 -pbc yes

I read on a page (https://www.researchgate.net/post/how_can_I_measure_the_cluster_size_between_center_of_mass) that one must measure the average size of the molecules before choosing the “-cut” value for the computation. Kindly advise on the procedure to select the best -cut value. Thank you!

Hi Teslim,
Welcome as well!

Maybe it helps to look at how the clustering works in this case:

First, every atom or molecule (depending on -mol) is assigned its own cluster. Then all clusters are compared against all other clusters, and if any two atoms in two different clusters are closer than the -cut distance, those two clusters are merged.
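
The merging rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the gmx clustsize source code: the function name `cluster_molecules` and the toy coordinates are made up for the example, and periodic boundary conditions are ignored for simplicity.

```python
import itertools
import math


def cluster_molecules(molecules, cut):
    """Group molecules into clusters with a distance cutoff.

    molecules: list of molecules, each a list of (x, y, z) atom positions in nm.
    cut: cutoff in nm; two molecules are joined if any atom pair is closer.
    Assumption: no periodic boundary conditions (plain Euclidean distances).
    """
    # Start with every molecule in its own cluster.
    parent = list(range(len(molecules)))

    def find(i):
        # Follow parent links to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of any two molecules with an atom pair closer than cut.
    for a, b in itertools.combinations(range(len(molecules)), 2):
        if any(math.dist(p, q) < cut for p in molecules[a] for q in molecules[b]):
            parent[find(a)] = find(b)

    clusters = {}
    for i in range(len(molecules)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy system: three two-atom "molecules" along the x axis (coordinates in nm).
toy = [
    [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0)],
    [(0.6, 0.0, 0.0), (0.9, 0.0, 0.0)],
    [(3.0, 0.0, 0.0), (3.3, 0.0, 0.0)],
]
print(cluster_molecules(toy, cut=0.35))
```

In this toy system the first two molecules end up in one cluster at cut=0.35 because their closest atoms are 0.3 nm apart, while the third molecule stays on its own.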

In this way, the input you should choose for this option depends on what you want to define as a cluster. It is reasonable to choose -cut to be roughly your molecule size, because an analysis with this setting will most closely reflect your intuitive understanding of what a cluster should be.

If -cut is very small, you will see as many clusters as molecules, because they will never get merged.

If -cut is much larger than the molecule size, molecules that are more than a molecule size apart will end up in the same cluster, which most of us would intuitively consider too far apart.


Thank you very much for the detailed response. Based on your reply, it is better to set -cut to roughly the molecule size. However, I am not sure how to determine the size of the molecule. Kindly advise.

PS: In the preliminary phase of my analysis, I calculated the end-to-end distance of the polymer backbone. Will that suffice as the molecule size?

Hi Teslim,

The end-to-end distance should be just fine.
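
For reference, the quantity in question is just the distance between the first and last backbone atom, averaged over frames. A minimal sketch follows, assuming the chain has already been made whole (no jumps across periodic boundaries) and using a made-up function name and made-up coordinates:

```python
import math


def mean_end_to_end(frames):
    """Average end-to-end distance in nm.

    frames: list of frames, each a list of (x, y, z) backbone positions in nm,
    ordered from the first to the last backbone atom.
    Assumption: the chain is whole in every frame (no periodic jumps).
    """
    distances = [math.dist(backbone[0], backbone[-1]) for backbone in frames]
    return sum(distances) / len(distances)


# Two hypothetical frames of a three-atom backbone.
frames = [
    [(0.0, 0.0, 0.0), (0.5, 0.2, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.4, 0.4, 0.0), (0.0, 2.0, 0.0)],
]
print(f"mean end-to-end distance: {mean_end_to_end(frames):.2f} nm")
```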

Another strategy would be to run the algorithm a couple of times with different -cut settings to see how they affect your results.
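
To show what such a scan can reveal, here is a small self-contained sketch on toy data. It is not a wrapper around gmx clustsize; in practice each cutoff would be a separate run of the tool. The function name `count_clusters` and the coordinates are hypothetical, and periodic boundaries are again ignored.

```python
import math


def count_clusters(points, cut):
    """Count clusters of points (nm) that are linked by distances below cut."""
    unvisited = set(range(len(points)))
    n_clusters = 0
    while unvisited:
        # Pick any remaining point and collect everything reachable from it.
        stack = [unvisited.pop()]
        n_clusters += 1
        while stack:
            i = stack.pop()
            near = {j for j in unvisited if math.dist(points[i], points[j]) < cut}
            unvisited -= near
            stack.extend(near)
    return n_clusters


# Toy molecule positions along x, in nm.
points = [
    (0.0, 0.0, 0.0),
    (0.4, 0.0, 0.0),
    (0.8, 0.0, 0.0),
    (2.0, 0.0, 0.0),
    (2.5, 0.0, 0.0),
]

# Scan several cutoffs and record the number of clusters for each.
scan = {cut: count_clusters(points, cut) for cut in (0.2, 0.45, 0.6, 1.0, 1.5)}
for cut, n in scan.items():
    print(f"cut = {cut:.2f} nm -> {n} clusters")
```

A range of cutoffs over which the cluster count does not change (here 0.6 to 1.0 nm) suggests the result is not very sensitive to the exact value chosen in that range.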

Big thanks. The available end-to-end value was calculated using only one polymer chain (Nmol = 1), but the system that I wish to run clustsize on has more than one polymer chain (Nmol ranges between 7 and 20). Will the average end-to-end distance calculated from the single polymer chain be okay to use as the cut?

Yes, that should be a good value. If you can, I suggest visualizing the clusters: you will literally see whether the value you set makes sense.

Thank you very much