# Clustering

When studying nucleation it is often useful to use a clustering algorithm to determine how many atoms are in the largest crystalline nucleus. The implementation of this approach in PLUMED is detailed in this paper. A typical input from that paper for calculating the number of atoms in the largest cluster is shown below:

    # Calculate the coordination numbers of the atoms
    lq: COORDINATIONNUMBER SPECIES=1-100 SWITCH={CUBIC D_0=0.45 D_MAX=0.55}
    # Calculate the contact matrix for the atoms for which we calculated the coordination numbers
    cm: CONTACT_MATRIX GROUP=lq SWITCH={CUBIC D_0=0.45 D_MAX=0.55}
    # Do a clustering using the contact matrix above
    dfs: DFSCLUSTERING MATRIX=cm
    # Sum the coordination numbers for the atoms in the largest cluster
    clust1: CLUSTER_PROPERTIES CLUSTERS=dfs ARG=lq CLUSTER=1 SUM

This input works, but it is somewhat unwieldy and a little confusing. The problem is that you have to calculate the coordination numbers of all the atoms in order to do the clustering and (unless you have a deep understanding of the way the code is implemented) it is not clear why. With the new syntax you can achieve the same result as follows:

    # Calculate the contact matrix.  This action computes a 100x100 matrix
    cm: CONTACT_MATRIX GROUP=1-100 SWITCH={CUBIC D_0=0.45 D_MAX=0.55}
    # Do a clustering using the contact matrix that was computed above as input
    # This action returns a 100 dimensional vector. If element i of this vector
    # is equal to 5 this means that atom i in the input to the contact matrix above
    # is part of the 5th largest cluster.
    dfs: DFSCLUSTERING ARG=cm
    # This next action returns a vector with 100 elements. If element i is equal to 1 then atom
    # i is part of the largest cluster. If it is equal to zero then it is part of some
    # other cluster.
    c1: CLUSTER_WEIGHTS CLUSTERS=dfs CLUSTER=1
    # Now calculate the coordination numbers using the usual matrix multiplication trick
    ones: ONES SIZE=100
    coords: MATRIX_VECTOR_PRODUCT ARG=cm,ones
    # Multiply the coordination numbers by c1. We now have a vector where element i is equal to the
    # coordination number of atom i if atom i is part of the largest cluster and zero otherwise.
    fcoords: CUSTOM ARG=coords,c1 FUNC=x*y PERIODIC=NO
    # And lastly sum the coordination numbers of the atoms in the largest cluster
    coordsum: SUM ARG=fcoords PERIODIC=NO
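
The arithmetic behind the last four actions in the input above can be illustrated with a minimal NumPy sketch. This is not PLUMED code: the 5x5 matrix `cm` and the weights vector `c1` are made-up values that stand in for the outputs of CONTACT_MATRIX and CLUSTER_WEIGHTS, and the variable names simply mirror the labels used in the input.

```python
import numpy as np

# Hypothetical 5-atom contact matrix (stand-in for the output of CONTACT_MATRIX).
# Atoms 0, 1 and 2 are mutually connected; atoms 3 and 4 form a separate pair.
cm = np.array([
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
])

# Hypothetical cluster weights (stand-in for CLUSTER_WEIGHTS with CLUSTER=1):
# one for atoms in the largest cluster, zero for all the others.
c1 = np.array([1.0, 1.0, 1.0, 0.0, 0.0])

ones = np.ones(cm.shape[0])   # plays the role of ONES
coords = cm @ ones            # row sums of the matrix, i.e. coordination numbers
fcoords = coords * c1         # x*y: keep only atoms in the largest cluster
coordsum = fcoords.sum()      # sum over the largest cluster

print(coords)    # [2. 2. 2. 1. 1.]
print(coordsum)  # 6.0
```

Multiplying the matrix by a vector of ones sums each row, which is why no separate coordination number action is needed once the contact matrix is available.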

This new syntax is much clearer because the clustering operation is performed on the CONTACT_MATRIX directly. The vector returned by the DFSCLUSTERING object then tells you which cluster each atom belongs to. You can thus use simple logical operations on this vector to determine the properties of all your clusters. Furthermore, you don’t even need to use the coordination numbers. If you simply want to calculate the number of atoms in the largest cluster you can use the following input:

    # Calculate the contact matrix.  This action computes a 100x100 matrix
    cm: CONTACT_MATRIX GROUP=1-100 SWITCH={CUBIC D_0=0.45 D_MAX=0.55}
    # Do the clustering
    dfs: DFSCLUSTERING ARG=cm
    # Get a 100 element vector that has ones for those atoms that are part of the largest cluster
    c1: CLUSTER_WEIGHTS CLUSTERS=dfs CLUSTER=1
    # Sum the vector above to get the number of atoms in the largest cluster
    suml: SUM ARG=c1 PERIODIC=NO
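
For readers who want to see what this pipeline computes, the following is a rough NumPy sketch of the same steps. It is an illustration only and not how PLUMED implements them. It assumes that the CUBIC switching function is 1 below `D_0`, 0 above `D_MAX` and `(y-1)^2(1+2y)` with `y=(r-D_0)/(D_MAX-D_0)` in between, that two atoms count as connected whenever their matrix element is nonzero, and that there are no periodic boundary conditions. The function names and the six coordinates are hypothetical.

```python
import numpy as np

def cubic_switch(r, d0=0.45, dmax=0.55):
    """Assumed cubic switching function: 1 below d0, 0 above dmax."""
    y = np.clip((r - d0) / (dmax - d0), 0.0, 1.0)
    return (y - 1.0) ** 2 * (1.0 + 2.0 * y)

def contact_matrix(pos):
    """Pairwise switching-function values with a zero diagonal (no periodic images)."""
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    cm = cubic_switch(dist)
    np.fill_diagonal(cm, 0.0)
    return cm

def dfs_clusters(cm):
    """Label each atom with the size rank of its cluster (1 = largest)."""
    n = cm.shape[0]
    comp = np.full(n, -1)
    ncomp = 0
    for start in range(n):
        if comp[start] >= 0:
            continue
        # Depth-first search from an atom that has not been visited yet
        comp[start] = ncomp
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.nonzero(cm[i] > 0.0)[0]:
                if comp[j] < 0:
                    comp[j] = ncomp
                    stack.append(j)
        ncomp += 1
    # Rank the connected components by size, largest first
    sizes = np.bincount(comp, minlength=ncomp)
    order = np.argsort(-sizes, kind="stable")
    rank = np.empty(ncomp, dtype=int)
    rank[order] = np.arange(1, ncomp + 1)
    return rank[comp]

# Hypothetical coordinates: a chain of three atoms, a separate pair and a lone atom
pos = np.array([
    [0.0, 0.0, 0.0], [0.4, 0.0, 0.0], [0.8, 0.0, 0.0],
    [3.0, 0.0, 0.0], [3.4, 0.0, 0.0],
    [6.0, 0.0, 0.0],
])

cm = contact_matrix(pos)        # plays the role of CONTACT_MATRIX
dfs = dfs_clusters(cm)          # plays the role of DFSCLUSTERING
c1 = (dfs == 1).astype(float)   # plays the role of CLUSTER_WEIGHTS with CLUSTER=1
suml = c1.sum()                 # plays the role of SUM

print(dfs)   # [1 1 1 2 2 3]
print(suml)  # 3.0
```

Replacing `dfs == 1` with `dfs == 2` selects the second largest cluster, which is the kind of simple logical operation on the clustering vector that the new syntax makes possible.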