Understanding the cut in gmx clustsize

GROMACS version: 2018
GROMACS modification: No
Welcome to a new world!

I am trying to use the gmx clustsize tool, but I can't work out the basis for selecting the largest distance (nm) at which molecules are still considered part of the same cluster. I have a command that looks as follows:

gmx_mpi clustsize -f …/polymer.nojump.mol.xtc -n polymer.ndx -s polymer.tpr -mol yes -cut 0.35 -pbc yes

I read on a page (https://www.researchgate.net/post/how_can_I_measure_the_cluster_size_between_center_of_mass) that one should measure the average size of the molecules before choosing the -cut value. Kindly advise on the procedure for selecting the best -cut value. Thank you!

Hi Teslim,
Welcome as well!

Maybe it helps to look at how the clustering works in this case:

First, all atoms or molecules (depending on -mol) are assigned their own cluster, then all clusters are compared against all other clusters and if any two atoms in the two different clusters are closer than the -cut distance, they will be merged.
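To make the merging rule above concrete, here is a minimal Python sketch. It is not the GROMACS implementation: the function name cluster_sizes, the union-find bookkeeping, and the toy coordinates are assumptions chosen for illustration, and periodic boundary conditions are ignored.

```python
from collections import Counter
from itertools import combinations
from math import dist  # Python 3.8+


def cluster_sizes(molecules, cut):
    """Return cluster sizes (in molecules), largest first.

    molecules: list of molecules, each a list of (x, y, z) atom positions in nm.
    cut: distance threshold in nm, playing the role of -cut.
    Illustrative only: no periodic boundary handling.
    """
    parent = list(range(len(molecules)))  # every molecule starts as its own cluster

    def find(i):
        # Follow parent links to the representative of i's cluster.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(molecules)), 2):
        # Merge when any atom pair across the two molecules is closer than cut.
        if any(dist(a, b) < cut for a in molecules[i] for b in molecules[j]):
            parent[find(i)] = find(j)

    counts = Counter(find(i) for i in range(len(molecules)))
    return sorted(counts.values(), reverse=True)


# Hypothetical toy system: three two-atom "molecules" placed along the x axis.
toy = [
    [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)],
    [(0.3, 0.0, 0.0), (0.4, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (2.1, 0.0, 0.0)],
]
print(cluster_sizes(toy, 0.35))
```

With a cut of 0.35 nm the first two toy molecules (closest atoms 0.2 nm apart) end up in one cluster, while the third stays on its own.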

In this way, the input you should choose for this option depends on what you want to define as a cluster. It is reasonable to choose -cut to be roughly your molecule size, because an analysis with this setting will most closely reflect your intuitive understanding of what a cluster should be.

If -cut is very small, you will see as many clusters as molecules, because they will never get merged.

If -cut is much larger than the molecule size, you will see molecules that are more than a molecule size apart in the same cluster, which most of us would intuitively consider too far apart to belong together.

Thank you very much for the detailed response. Based on your reply, it is best to set -cut to roughly the molecule size. However, I am not sure how to determine the size of the molecule. Kindly advise.

PS: In the preliminary phase of my analysis, I calculated the end-to-end distance of the polymer backbone. Will that suffice as the molecule size?

Hi Teslim,

The end-to-end distance should be just fine.
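For reference, the average end-to-end distance over several chains could be computed along these lines. This is only a sketch: the function name mean_end_to_end and the example coordinates are hypothetical, and it assumes each chain is whole (not split across periodic boundaries) with backbone positions listed in order along the chain.

```python
from math import dist  # Python 3.8+
from statistics import mean


def mean_end_to_end(chains):
    """Average distance (nm) between the first and last backbone atom of each chain.

    chains: list of chains, each a list of (x, y, z) backbone positions in nm,
    ordered along the chain. Assumes molecules are whole (no PBC jumps).
    """
    return mean(dist(chain[0], chain[-1]) for chain in chains)


# Hypothetical coordinates for two short chains.
chains = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (3.0, 4.0, 0.0)],
    [(1.0, 0.0, 0.0), (1.0, 0.0, 2.0)],
]
print(mean_end_to_end(chains))
```

In practice the same average would also be taken over trajectory frames, not just over chains in a single frame.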

Another strategy would be to run the algorithm a couple of times with different -cut settings to see how they affect your results.
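The idea of such a scan can be sketched in Python on toy data. This is an illustration rather than a replacement for running gmx clustsize itself: count_clusters and scan_cutoffs are hypothetical helper names, the coordinates are made up, and periodic boundaries are ignored.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def count_clusters(molecules, cut):
    """Number of clusters when molecules with any atom pair closer than cut are merged."""
    labels = list(range(len(molecules)))  # one cluster label per molecule
    for i, j in combinations(range(len(molecules)), 2):
        if any(dist(a, b) < cut for a in molecules[i] for b in molecules[j]):
            old, new = labels[i], labels[j]
            # Relabel everything in i's cluster so it joins j's cluster.
            labels = [new if lab == old else lab for lab in labels]
    return len(set(labels))


def scan_cutoffs(molecules, cuts):
    """Map each trial cutoff (nm) to the resulting number of clusters."""
    return {cut: count_clusters(molecules, cut) for cut in cuts}


# Hypothetical toy system: four single-atom "molecules" along the x axis.
toy = [[(0.0, 0.0, 0.0)], [(0.3, 0.0, 0.0)], [(0.9, 0.0, 0.0)], [(3.0, 0.0, 0.0)]]
for cut, n in scan_cutoffs(toy, [0.2, 0.5, 1.0, 2.5]).items():
    print(f"cut = {cut} nm -> {n} clusters")
```

A very small cut leaves every molecule in its own cluster, and a very large one lumps everything together. The range in between, where the cluster count changes slowly with the cutoff, is usually the interesting one to inspect.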

Big thanks. The available end-to-end value was calculated using only one polymer chain (Nmol = 1), but the system that I wish to run clustsize on has more than one polymer chain (Nmol ranges between 7 and 20). Will the average end-to-end distance calculated from the single polymer chain be okay to use as the -cut value?

Yes, that should be a good value. I suggest that, if you can, you visualize the clusters: you will literally see whether the value you set makes sense.

Thank you very much