# Clustering
When studying nucleation it is often useful to cluster atoms in order to determine how many atoms are in the largest crystalline nucleus. The implementation of this approach in PLUMED is detailed in this paper. A typical input from that paper for calculating the number of atoms in the largest cluster is shown below:
```plumed
# Calculate the coordination numbers of the atoms
lq: COORDINATIONNUMBER SPECIES=1-100 SWITCH={CUBIC D_0=0.45 D_MAX=0.55}
# Calculate the contact matrix for the atoms for which we calculated the coordination numbers
cm: CONTACT_MATRIX GROUP=lq SWITCH={CUBIC D_0=0.45 D_MAX=0.55}
# Do a clustering using the contact matrix above
dfs: DFSCLUSTERING MATRIX=cm
# Sum the coordination numbers for the atoms in the largest cluster
clust1: CLUSTER_PROPERTIES CLUSTERS=dfs ARG=lq CLUSTER=1 SUM
```
This input works, but it is somewhat unwieldy and a little confusing. The problem is that you have to calculate the coordination numbers of all the atoms in order to do the clustering, and (unless you have a deep understanding of the way the code is implemented) it is not clear why. With the new syntax you can achieve the same result as follows:
```plumed
# Calculate the contact matrix. This action computes a 100x100 matrix
cm: CONTACT_MATRIX GROUP=1-100 SWITCH={CUBIC D_0=0.45 D_MAX=0.55}
# Do a clustering using the contact matrix that was computed above as input
# This action returns a 100 dimensional vector. If element i of this vector
# is equal to 5 this means that atom i in the input to the contact matrix above
# is part of the 5th largest cluster.
dfs: DFSCLUSTERING ARG=cm
# This next action returns a vector with 100 elements. If element i is equal to 1 then atom
# i is part of the largest cluster. If it is equal to zero then it is part of some
# other cluster.
c1: CLUSTER_WEIGHTS CLUSTERS=dfs CLUSTER=1
# Now calculate the coordination numbers using the usual matrix multiplication trick
ones: ONES SIZE=100
coords: MATRIX_VECTOR_PRODUCT ARG=cm,ones
# Multiply the coordination numbers by c1. We now have a vector where element i is equal to the
# coordination number of atom i if atom i is part of the largest cluster and zero otherwise.
fcoords: CUSTOM ARG=coords,c1 FUNC=x*y PERIODIC=NO
# And lastly sum the coordination numbers of the atoms in the largest cluster
coordsum: SUM ARG=fcoords PERIODIC=NO
```
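The arithmetic behind the matrix multiplication trick and the masking step can be sketched in plain Python. This is only an illustration on a toy five-atom system, not PLUMED's implementation: the helper names `matrix_vector_product` and `masked_sum` are hypothetical, and the contact matrix is assumed to hold only zeros and ones, whereas a switching function such as CUBIC can also produce values in between.

```python
def matrix_vector_product(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def masked_sum(values, weights):
    """Sum values[i] * weights[i], mimicking CUSTOM FUNC=x*y followed by SUM."""
    return sum(v * w for v, w in zip(values, weights))


# Toy contact matrix (assumption): atoms 0, 1 and 2 are all in contact with
# each other, and atoms 3 and 4 are in contact with each other.
cm = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]

# Multiplying by a vector of ones sums each row, which gives the
# coordination number of each atom.
ones = [1] * len(cm)
coords = matrix_vector_product(cm, ones)

# Weights that select the largest cluster (atoms 0, 1 and 2 here).
c1 = [1, 1, 1, 0, 0]

# Sum of the coordination numbers of the atoms in the largest cluster.
coordsum = masked_sum(coords, c1)

print(coords)    # [2, 2, 2, 1, 1]
print(coordsum)  # 6
```

Each row of the contact matrix lists the neighbours of one atom, so the product with a vector of ones is simply the row sum. Multiplying by the zero/one weights then discards every atom outside the cluster of interest before the final sum.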
This new syntax is much clearer because the clustering operation is performed directly on the CONTACT_MATRIX. The vector returned by the DFSCLUSTERING action then tells you which cluster each atom belongs to. You can thus use simple logical operations on this vector to determine the properties of any of your clusters. Furthermore, you don’t even need to use the coordination numbers. If you simply want to calculate the number of atoms in the largest cluster you can use the following input:
```plumed
# Calculate the contact matrix. This action computes a 100x100 matrix
cm: CONTACT_MATRIX GROUP=1-100 SWITCH={CUBIC D_0=0.45 D_MAX=0.55}
# Do the clustering
dfs: DFSCLUSTERING ARG=cm
# Get a 100 element vector that has ones for those atoms that are part of the largest cluster
c1: CLUSTER_WEIGHTS CLUSTERS=dfs CLUSTER=1
# Sum the vector above to get the number of atoms in the largest cluster
suml: SUM ARG=c1 PERIODIC=NO
```
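To make the role of each vector concrete, the following Python sketch mimics the three steps on a toy six-atom system: a depth first search that labels connected components, a zero/one weight vector for one cluster, and a sum. It is a minimal illustration and not PLUMED's code. The function names `dfs_clustering` and `cluster_weights` are hypothetical, the adjacency matrix is assumed to be binary and symmetric, and the way ties between equally sized clusters are broken (order of first appearance) is an assumption made for this example.

```python
def dfs_clustering(adjacency):
    """Return one label per atom: 1 for the largest cluster, 2 for the next, and so on."""
    n = len(adjacency)
    component = [-1] * n  # -1 means "not visited yet"
    sizes = []            # sizes[c] is the number of atoms in component c

    for start in range(n):
        if component[start] != -1:
            continue
        cid = len(sizes)
        component[start] = cid
        stack = [start]
        size = 0
        # Iterative depth first search over the atoms connected to `start`.
        while stack:
            i = stack.pop()
            size += 1
            for j in range(n):
                if adjacency[i][j] and component[j] == -1:
                    component[j] = cid
                    stack.append(j)
        sizes.append(size)

    # Rank the components by size, largest first (ties keep discovery order).
    order = sorted(range(len(sizes)), key=lambda c: -sizes[c])
    rank = {c: r + 1 for r, c in enumerate(order)}
    return [rank[c] for c in component]


def cluster_weights(labels, cluster):
    """Return 1 for atoms in the requested cluster and 0 for all other atoms."""
    return [1 if label == cluster else 0 for label in labels]


# Toy adjacency matrix (assumption): atoms 0-1 form a pair, atoms 2-3-4 form
# a chain, and atom 5 is isolated.
adjacency = [
    [0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

labels = dfs_clustering(adjacency)
weights = cluster_weights(labels, 1)
largest = sum(weights)

print(labels)   # [2, 2, 1, 1, 1, 3]
print(weights)  # [0, 0, 1, 1, 1, 0]
print(largest)  # 3
```

Selecting a different cluster only changes the second argument of `cluster_weights`, which mirrors changing the CLUSTER keyword in the input above.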