Class GroupAverageLinkage

  • All Implemented Interfaces:
    Linkage

    @Reference(authors="R. R. Sokal, C. D. Michener",
               title="A statistical method for evaluating systematic relationship",
               booktitle="University of Kansas science bulletin 28",
               url="https://archive.org/details/cbarchive_33927_astatisticalmethodforevaluatin1902",
               bibkey="journals/kansas/SokalM1902")
    @Alias({"upgma","average","average-link","average-linkage","UPGMA"})
    @Priority(201)
    public class GroupAverageLinkage
    extends java.lang.Object
    implements Linkage
    Group-average linkage clustering method (UPGMA).

    This is a good default linkage to use with hierarchical clustering, as it neither exhibits the single-link chaining effect, nor has the strong tendency of complete linkage to split large clusters. It is also easy to understand, and it can be used with arbitrary distances and similarity functions.

    The distances of two clusters is defined as the between-group average distance of two points $a$ and $b$, one from each cluster. It should be noted that this is not the average distance within the resulting cluster, because it does not take within-cluster distances into account.

    The distance of two clusters in this method is: \[d_{\text{UPGMA}}(A,B)=\tfrac{1}{|A|\cdot|B|} \sum\nolimits_{a\in A}\sum\nolimits_{b\in B} d(a,b)\]

    For Lance-Williams, we can then obtain the following recursive definition: \[d_{\text{UPGMA}}(A\cup B,C)=\tfrac{|A|}{|A|+|B|} d(A,C) + \tfrac{|B|}{|A|+|B|} d(B,C)\]

    While the method is also called "Unweighted Pair Group Method with Arithmetic mean", it uses weights in the Lance-Williams formulation that account for the cluster size. It is unweighted in the sense that every point keeps the same weight, whereas in WeightedAverageLinkage (WPGMA), the weight of points effectively depends on the depth in the cluster tree.

    Reference:

    R. R. Sokal, C. D. Michener
    A statistical method for evaluating systematic relationship
    University of Kansas science bulletin, 28, 1409-1438.

    Since:
    0.6.0
    Author:
    Erich Schubert
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  GroupAverageLinkage.Par
      Class parameterizer.
    • Constructor Summary

      Constructors 
      Constructor Description
      GroupAverageLinkage()
      Deprecated.
      use the static instance STATIC instead.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      double combine​(int sizex, double dx, int sizey, double dy, int sizej, double dxy)
      Compute combined linkage for two clusters.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • GroupAverageLinkage

        @Deprecated
        public GroupAverageLinkage()
        Deprecated.
        use the static instance STATIC instead.
        Constructor.
    • Method Detail

      • combine

        public double combine​(int sizex,
                              double dx,
                              int sizey,
                              double dy,
                              int sizej,
                              double dxy)
        Description copied from interface: Linkage
        Compute combined linkage for two clusters.
        Specified by:
        combine in interface Linkage
        Parameters:
        sizex - Size of first cluster x before merging
        dx - Distance of cluster x to j before merging
        sizey - Size of second cluster y before merging
        dy - Distance of cluster y to j before merging
        sizej - Size of candidate cluster j
        dxy - Distance between clusters x and y before merging
        Returns:
        Combined distance