Class CentroidLinkage

  • All Implemented Interfaces:
    GeometricLinkage, Linkage

    @Reference(authors="J. C. Gower",
               title="A comparison of some methods of cluster analysis",
               booktitle="Biometrics (1967)",
               url="https://doi.org/10.2307/2528417",
               bibkey="doi:10.2307/2528417")
    @Alias({"centroid","upgmc"})
    public class CentroidLinkage
    extends java.lang.Object
    implements GeometricLinkage
    Centroid linkage — Unweighted Pair-Group Method using Centroids (UPGMC).

    This is closely related to GroupAverageLinkage (UPGMA), but the resulting distance corresponds to the distance of the cluster centroids when used with squared Euclidean distance.

    For Lance-Williams, we can then obtain the following recursive definition: \[d_{\text{UPGMC}}(A\cup B,C)=\tfrac{|A|}{|A|+|B|} d(A,C) + \tfrac{|B|}{|A|+|B|} d(B,C) - \tfrac{|A|\cdot|B|}{(|A|+|B|)^2} d(A,B)\]

    With squared Euclidean distance, we then get the cluster distance: \[d_{\text{UPGMC}}(A,B)=||\tfrac{1}{|A|}\sum\nolimits_{a\in A} a, \tfrac{1}{|B|}\sum\nolimits_{b\in B} b||^2\] but for other distances, this will not generally be true.

    Because the ELKI implementations use Lance-Williams, this linkage should only be used with (squared) Euclidean distance.

    While titled "unweighted", this method does take cluster sizes into account when merging clusters with Lance-Williams.

    While the idea of this method — at least for squared Euclidean — is compelling (distance of cluster centers), it is not as well behaved as one may think. It can yield so called "inversions", where a later merge has a smaller distance than an early merge, because a cluster center can be closer to a neighboring cluster than any of the individual points. Because of this, the GroupAverageLinkage (UPGMA) is usually preferable.

    Reference:

    J. C. Gower
    A comparison of some methods of cluster analysis
    Biometrics (1967): 623-637.

    Since:
    0.6.0
    Author:
    Erich Schubert
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  CentroidLinkage.Par
      Class parameterizer.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static CentroidLinkage STATIC
      Static instance of class.
    • Constructor Summary

      Constructors 
      Constructor Description
      CentroidLinkage()
      Deprecated.
      use the static instance STATIC instead.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      double combine​(int sizex, double dx, int sizey, double dy, int sizej, double dxy)
      Compute combined linkage for two clusters.
      double distance​(double[] x, int sizex, double[] y, int sizey)
      Distance of two aggregated clusters.
      double[] merge​(double[] x, int sizex, double[] y, int sizey)
      Merge the aggregated vectors.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • STATIC

        public static final CentroidLinkage STATIC
        Static instance of class.
    • Constructor Detail

      • CentroidLinkage

        @Deprecated
        public CentroidLinkage()
        Deprecated.
        use the static instance STATIC instead.
        Constructor.
    • Method Detail

      • combine

        public double combine​(int sizex,
                              double dx,
                              int sizey,
                              double dy,
                              int sizej,
                              double dxy)
        Description copied from interface: Linkage
        Compute combined linkage for two clusters.
        Specified by:
        combine in interface Linkage
        Parameters:
        sizex - Size of first cluster x before merging
        dx - Distance of cluster x to j before merging
        sizey - Size of second cluster y before merging
        dy - Distance of cluster y to j before merging
        sizej - Size of candidate cluster j
        dxy - Distance between clusters x and y before merging
        Returns:
        Combined distance
      • merge

        public double[] merge​(double[] x,
                              int sizex,
                              double[] y,
                              int sizey)
        Description copied from interface: GeometricLinkage
        Merge the aggregated vectors.
        Specified by:
        merge in interface GeometricLinkage
        Parameters:
        x - Center of the first cluster
        sizex - Weight of the first cluster
        y - Center of the second cluster
        sizey - Weight of the second cluster
        Returns:
        Combined vector
      • distance

        public double distance​(double[] x,
                               int sizex,
                               double[] y,
                               int sizey)
        Description copied from interface: GeometricLinkage
        Distance of two aggregated clusters.
        Specified by:
        distance in interface GeometricLinkage
        Parameters:
        x - Center of the first cluster
        sizex - Weight of the first cluster
        y - Center of the second cluster
        sizey - Weight of the second cluster
        Returns:
        Distance