Class CentroidLinkage
- java.lang.Object
-
- elki.clustering.hierarchical.linkage.CentroidLinkage
-
- All Implemented Interfaces:
GeometricLinkage,Linkage
@Reference(authors="J. C. Gower", title="A comparison of some methods of cluster analysis", booktitle="Biometrics (1967)", url="https://doi.org/10.2307/2528417", bibkey="doi:10.2307/2528417") @Alias({"centroid","upgmc"}) public class CentroidLinkage extends java.lang.Object implements GeometricLinkage
Centroid linkage — Unweighted Pair-Group Method using Centroids (UPGMC).This is closely related to
GroupAverageLinkage(UPGMA), but the resulting distance corresponds to the distance of the cluster centroids when used with squared Euclidean distance.For Lance-Williams, we can then obtain the following recursive definition: \[d_{\text{UPGMC}}(A\cup B,C)=\tfrac{|A|}{|A|+|B|} d(A,C) + \tfrac{|B|}{|A|+|B|} d(B,C) - \tfrac{|A|\cdot|B|}{(|A|+|B|)^2} d(A,B)\]
With squared Euclidean distance, we then get the cluster distance: \[d_{\text{UPGMC}}(A,B)=||\tfrac{1}{|A|}\sum\nolimits_{a\in A} a, \tfrac{1}{|B|}\sum\nolimits_{b\in B} b||^2\] but for other distances, this will not generally be true.
Because the ELKI implementations use Lance-Williams, this linkage should only be used with (squared) Euclidean distance.
While titled "unweighted", this method does take cluster sizes into account when merging clusters with Lance-Williams.
While the idea of this method — at least for squared Euclidean — is compelling (distance of cluster centers), it is not as well behaved as one may think. It can yield so called "inversions", where a later merge has a smaller distance than an early merge, because a cluster center can be closer to a neighboring cluster than any of the individual points. Because of this, the
GroupAverageLinkage(UPGMA) is usually preferable.Reference:
J. C. Gower
A comparison of some methods of cluster analysis
Biometrics (1967): 623-637.- Since:
- 0.6.0
- Author:
- Erich Schubert
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static classCentroidLinkage.ParClass parameterizer.
-
Field Summary
Fields Modifier and Type Field Description static CentroidLinkageSTATICStatic instance of class.
-
Constructor Summary
Constructors Constructor Description CentroidLinkage()Deprecated.use the static instanceSTATICinstead.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description doublecombine(int sizex, double dx, int sizey, double dy, int sizej, double dxy)Compute combined linkage for two clusters.doubledistance(double[] x, int sizex, double[] y, int sizey)Distance of two aggregated clusters.double[]merge(double[] x, int sizex, double[] y, int sizey)Merge the aggregated vectors.
-
-
-
Field Detail
-
STATIC
public static final CentroidLinkage STATIC
Static instance of class.
-
-
Constructor Detail
-
CentroidLinkage
@Deprecated public CentroidLinkage()
Deprecated.use the static instanceSTATICinstead.Constructor.
-
-
Method Detail
-
combine
public double combine(int sizex, double dx, int sizey, double dy, int sizej, double dxy)Description copied from interface:LinkageCompute combined linkage for two clusters.- Specified by:
combinein interfaceLinkage- Parameters:
sizex- Size of first cluster x before mergingdx- Distance of cluster x to j before mergingsizey- Size of second cluster y before mergingdy- Distance of cluster y to j before mergingsizej- Size of candidate cluster jdxy- Distance between clusters x and y before merging- Returns:
- Combined distance
-
merge
public double[] merge(double[] x, int sizex, double[] y, int sizey)Description copied from interface:GeometricLinkageMerge the aggregated vectors.- Specified by:
mergein interfaceGeometricLinkage- Parameters:
x- Center of the first clustersizex- Weight of the first clustery- Center of the second clustersizey- Weight of the second cluster- Returns:
- Combined vector
-
distance
public double distance(double[] x, int sizex, double[] y, int sizey)Description copied from interface:GeometricLinkageDistance of two aggregated clusters.- Specified by:
distancein interfaceGeometricLinkage- Parameters:
x- Center of the first clustersizex- Weight of the first clustery- Center of the second clustersizey- Weight of the second cluster- Returns:
- Distance
-
-