Class CentroidLinkage
- java.lang.Object
-
- elki.clustering.hierarchical.linkage.CentroidLinkage
-
- All Implemented Interfaces:
GeometricLinkage
,Linkage
@Reference(authors="J. C. Gower", title="A comparison of some methods of cluster analysis", booktitle="Biometrics (1967)", url="https://doi.org/10.2307/2528417", bibkey="doi:10.2307/2528417") @Alias({"centroid","upgmc"}) public class CentroidLinkage extends java.lang.Object implements GeometricLinkage
Centroid linkage — Unweighted Pair-Group Method using Centroids (UPGMC).This is closely related to
GroupAverageLinkage
(UPGMA), but the resulting distance corresponds to the distance of the cluster centroids when used with squared Euclidean distance.For Lance-Williams, we can then obtain the following recursive definition: \[d_{\text{UPGMC}}(A\cup B,C)=\tfrac{|A|}{|A|+|B|} d(A,C) + \tfrac{|B|}{|A|+|B|} d(B,C) - \tfrac{|A|\cdot|B|}{(|A|+|B|)^2} d(A,B)\]
With squared Euclidean distance, we then get the cluster distance: \[d_{\text{UPGMC}}(A,B)=||\tfrac{1}{|A|}\sum\nolimits_{a\in A} a, \tfrac{1}{|B|}\sum\nolimits_{b\in B} b||^2\] but for other distances, this will not generally be true.
Because the ELKI implementations use Lance-Williams, this linkage should only be used with (squared) Euclidean distance.
While titled "unweighted", this method does take cluster sizes into account when merging clusters with Lance-Williams.
While the idea of this method — at least for squared Euclidean — is compelling (distance of cluster centers), it is not as well behaved as one may think. It can yield so called "inversions", where a later merge has a smaller distance than an early merge, because a cluster center can be closer to a neighboring cluster than any of the individual points. Because of this, the
GroupAverageLinkage
(UPGMA) is usually preferable.Reference:
J. C. Gower
A comparison of some methods of cluster analysis
Biometrics (1967): 623-637.- Since:
- 0.6.0
- Author:
- Erich Schubert
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
CentroidLinkage.Par
Class parameterizer.
-
Field Summary
Fields Modifier and Type Field Description static CentroidLinkage
STATIC
Static instance of class.
-
Constructor Summary
Constructors Constructor Description CentroidLinkage()
Deprecated.use the static instanceSTATIC
instead.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description double
combine(int sizex, double dx, int sizey, double dy, int sizej, double dxy)
Compute combined linkage for two clusters.double
distance(double[] x, int sizex, double[] y, int sizey)
Distance of two aggregated clusters.double[]
merge(double[] x, int sizex, double[] y, int sizey)
Merge the aggregated vectors.
-
-
-
Field Detail
-
STATIC
public static final CentroidLinkage STATIC
Static instance of class.
-
-
Constructor Detail
-
CentroidLinkage
@Deprecated public CentroidLinkage()
Deprecated.use the static instanceSTATIC
instead.Constructor.
-
-
Method Detail
-
combine
public double combine(int sizex, double dx, int sizey, double dy, int sizej, double dxy)
Description copied from interface:Linkage
Compute combined linkage for two clusters.- Specified by:
combine
in interfaceLinkage
- Parameters:
sizex
- Size of first cluster x before mergingdx
- Distance of cluster x to j before mergingsizey
- Size of second cluster y before mergingdy
- Distance of cluster y to j before mergingsizej
- Size of candidate cluster jdxy
- Distance between clusters x and y before merging- Returns:
- Combined distance
-
merge
public double[] merge(double[] x, int sizex, double[] y, int sizey)
Description copied from interface:GeometricLinkage
Merge the aggregated vectors.- Specified by:
merge
in interfaceGeometricLinkage
- Parameters:
x
- Center of the first clustersizex
- Weight of the first clustery
- Center of the second clustersizey
- Weight of the second cluster- Returns:
- Combined vector
-
distance
public double distance(double[] x, int sizex, double[] y, int sizey)
Description copied from interface:GeometricLinkage
Distance of two aggregated clusters.- Specified by:
distance
in interfaceGeometricLinkage
- Parameters:
x
- Center of the first clustersizex
- Weight of the first clustery
- Center of the second clustersizey
- Weight of the second cluster- Returns:
- Distance
-
-