• java.lang.Object
• All Implemented Interfaces:
GeometricLinkage, Linkage

@Reference(authors="J. C. Gower",
title="A comparison of some methods of cluster analysis",
booktitle="Biometrics (1967)",
url="https://doi.org/10.2307/2528417",
bibkey="doi:10.2307/2528417")
@Alias({"centroid","upgmc"})
public class CentroidLinkage
extends java.lang.Object
implements GeometricLinkage
Centroid linkage — Unweighted Pair-Group Method using Centroids (UPGMC).

This is closely related to GroupAverageLinkage (UPGMA), but the resulting distance corresponds to the distance of the cluster centroids when used with squared Euclidean distance.

For Lance-Williams, we can then obtain the following recursive definition: $d_{\text{UPGMC}}(A\cup B,C)=\tfrac{|A|}{|A|+|B|} d(A,C) + \tfrac{|B|}{|A|+|B|} d(B,C) - \tfrac{|A|\cdot|B|}{(|A|+|B|)^2} d(A,B)$
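The recursive update above can be sketched in a few lines of Java. This is a minimal illustration, not the ELKI source: the class name `UpgmcUpdateSketch` is hypothetical, and the parameter list only mirrors the `combine` signature documented on this page (`sizej` is accepted but unused, because the centroid coefficients do not depend on the size of the third cluster).

```java
/** Hypothetical helper for illustration only; not the ELKI implementation. */
final class UpgmcUpdateSketch {
  /**
   * Lance-Williams update for centroid linkage (UPGMC).
   *
   * @param sizex size of cluster x before merging
   * @param dx distance of x to j before merging
   * @param sizey size of cluster y before merging
   * @param dy distance of y to j before merging
   * @param sizej size of cluster j (unused by this linkage)
   * @param dxy distance between x and y before merging
   * @return distance of the merged cluster (x u y) to j
   */
  static double combine(int sizex, double dx, int sizey, double dy, int sizej, double dxy) {
    final double total = sizex + sizey;
    final double wx = sizex / total; // |A| / (|A| + |B|)
    final double wy = sizey / total; // |B| / (|A| + |B|)
    // wx * wy equals |A| * |B| / (|A| + |B|)^2, the coefficient of d(A,B).
    return wx * dx + wy * dy - wx * wy * dxy;
  }

  public static void main(String[] args) {
    // Clusters of size 2 and 3 with d(x,j) = 10, d(y,j) = 5, d(x,y) = 5.
    System.out.println(combine(2, 10.0, 3, 5.0, 1, 5.0));
  }
}
```

The inputs are expected to be squared Euclidean distances; with other distances the result is still defined but loses its geometric meaning.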

With squared Euclidean distance, the cluster distance is then the squared distance between the cluster centroids: $d_{\text{UPGMC}}(A,B)=\left\|\tfrac{1}{|A|}\sum\nolimits_{a\in A} a - \tfrac{1}{|B|}\sum\nolimits_{b\in B} b\right\|^2$. For other distances, this will not generally be true.
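To make the correspondence concrete, the following sketch compares the Lance-Williams result with the squared distance of explicitly computed centroids on a small hand-picked data set. All class and method names are hypothetical and chosen for this example; none of them are ELKI API.

```java
/** Illustrative check only; all names here are hypothetical, not ELKI API. */
final class CentroidDistanceCheck {
  /** Arithmetic mean (centroid) of a non-empty set of points. */
  static double[] centroid(double[][] points) {
    double[] c = new double[points[0].length];
    for (double[] p : points) {
      for (int d = 0; d < c.length; d++) {
        c[d] += p[d] / points.length;
      }
    }
    return c;
  }

  /** Squared Euclidean distance of two vectors of equal length. */
  static double squaredEuclidean(double[] a, double[] b) {
    double sum = 0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return sum;
  }

  /** Cluster distance as the squared distance of the two centroids. */
  static double clusterDistance(double[][] a, double[][] b) {
    return squaredEuclidean(centroid(a), centroid(b));
  }

  /** Lance-Williams update for UPGMC: distance of (A u B) to C. */
  static double lanceWilliams(int sizeA, double dAC, int sizeB, double dBC, double dAB) {
    double total = sizeA + sizeB;
    double wa = sizeA / total, wb = sizeB / total;
    return wa * dAC + wb * dBC - wa * wb * dAB;
  }

  public static void main(String[] args) {
    double[][] a = {{0, 0}, {2, 0}};
    double[][] b = {{4, 2}};
    double[][] c = {{1, 5}};
    double[][] ab = {{0, 0}, {2, 0}, {4, 2}}; // A u B
    double direct = clusterDistance(ab, c);
    double recursive = lanceWilliams(a.length, clusterDistance(a, c),
        b.length, clusterDistance(b, c), clusterDistance(a, b));
    // Both values agree up to floating-point rounding.
    System.out.println(direct + " vs " + recursive);
  }
}
```

If `squaredEuclidean` is swapped for, say, Manhattan distance, the two values generally differ, which is why the recursive definition only has a centroid interpretation for squared Euclidean distance.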

Because the ELKI implementations use Lance-Williams, this linkage should only be used with (squared) Euclidean distance.

While titled "unweighted", this method does take cluster sizes into account when merging clusters with Lance-Williams.

While the idea of this method (the distance of cluster centers, at least for squared Euclidean distance) is compelling, it is not as well behaved as one may think. It can yield so-called "inversions", where a later merge has a smaller distance than an earlier merge, because a cluster center can be closer to a neighboring cluster than any of the individual points. Because of this, GroupAverageLinkage (UPGMA) is usually preferable.
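An inversion can be reproduced with just three points that form a nearly equilateral triangle. The sketch below is an illustration under that assumption; the class `CentroidInversionSketch` and its methods are hypothetical names, and the procedure is a hand-rolled two-step merge rather than any ELKI algorithm.

```java
/** Illustrative only: shows an inversion for three points; names are hypothetical. */
final class CentroidInversionSketch {
  static double squaredEuclidean(double[] a, double[] b) {
    double sum = 0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return sum;
  }

  /**
   * Merge the closest pair of three points, then merge the result with the
   * remaining point, using squared centroid distances.
   *
   * @return {height of first merge, height of second merge}
   */
  static double[] mergeHeights(double[] p, double[] q, double[] r) {
    double[][] pts = {p, q, r};
    int bi = 0, bj = 1;
    double best = Double.POSITIVE_INFINITY;
    for (int i = 0; i < 3; i++) {
      for (int j = i + 1; j < 3; j++) {
        double dist = squaredEuclidean(pts[i], pts[j]);
        if (dist < best) {
          best = dist;
          bi = i;
          bj = j;
        }
      }
    }
    int k = 3 - bi - bj; // index of the point left out of the first merge
    double[] center = new double[p.length];
    for (int d = 0; d < center.length; d++) {
      center[d] = 0.5 * (pts[bi][d] + pts[bj][d]);
    }
    return new double[] {best, squaredEuclidean(center, pts[k])};
  }

  public static void main(String[] args) {
    double[] h = mergeHeights(new double[] {0, 0}, new double[] {2, 0}, new double[] {1, 1.8});
    // The second merge happens at a smaller height than the first one.
    System.out.println("first=" + h[0] + ", second=" + h[1]);
  }
}
```

Here the first merge joins (0,0) and (2,0) at squared distance 4, but their center (1,0) lies at squared distance of about 3.24 from (1,1.8), so the dendrogram heights are not monotone.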

Reference:

J. C. Gower
A comparison of some methods of cluster analysis
Biometrics (1967): 623-637.

Since:
0.6.0
Author:
Erich Schubert
• ### Nested Class Summary

Nested Classes
Modifier and Type Class Description
static class  CentroidLinkage.Par
Class parameterizer.
• ### Field Summary

Fields
Modifier and Type Field Description
static CentroidLinkage STATIC
Static instance of class.
• ### Constructor Summary

Constructors
Constructor Description
CentroidLinkage()
Deprecated.
Use the static instance STATIC instead.
• ### Method Summary

All Methods
Modifier and Type Method Description
double combine​(int sizex, double dx, int sizey, double dy, int sizej, double dxy)
Compute combined linkage for two clusters.
double distance​(double[] x, int sizex, double[] y, int sizey)
Distance of two aggregated clusters.
double[] merge​(double[] x, int sizex, double[] y, int sizey)
Merge the aggregated vectors.
• ### Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

• ### Methods inherited from interface Linkage

initial, restore
• ### Field Detail

• #### STATIC

public static final CentroidLinkage STATIC
Static instance of class.
• ### Constructor Detail

• #### CentroidLinkage

@Deprecated
public CentroidLinkage()
Deprecated.
Use the static instance STATIC instead.
Constructor.
• ### Method Detail

• #### combine

public double combine​(int sizex,
double dx,
int sizey,
double dy,
int sizej,
double dxy)
Description copied from interface: Linkage
Compute combined linkage for two clusters.
Specified by:
combine in interface Linkage
Parameters:
sizex - Size of first cluster x before merging
dx - Distance of cluster x to j before merging
sizey - Size of second cluster y before merging
dy - Distance of cluster y to j before merging
sizej - Size of candidate cluster j
dxy - Distance between clusters x and y before merging
Returns:
Combined distance
• #### merge

public double[] merge​(double[] x,
int sizex,
double[] y,
int sizey)
Description copied from interface: GeometricLinkage
Merge the aggregated vectors.
Specified by:
merge in interface GeometricLinkage
Parameters:
x - Center of the first cluster
sizex - Weight of the first cluster
y - Center of the second cluster
sizey - Weight of the second cluster
Returns:
Combined vector
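
For a centroid-based linkage, a natural reading of "merge the aggregated vectors" is the size-weighted mean of the two centers. The sketch below assumes exactly that; it is not taken from the ELKI source, and the class name `CentroidMergeSketch` is hypothetical.

```java
/** Illustrative only: assumes merging means taking the size-weighted mean. */
final class CentroidMergeSketch {
  /**
   * @param x center of the first cluster
   * @param sizex weight (number of points) of the first cluster
   * @param y center of the second cluster
   * @param sizey weight (number of points) of the second cluster
   * @return center of the merged cluster
   */
  static double[] merge(double[] x, int sizex, double[] y, int sizey) {
    final double total = sizex + sizey;
    final double wx = sizex / total;
    final double wy = sizey / total;
    double[] merged = new double[x.length];
    for (int d = 0; d < merged.length; d++) {
      merged[d] = wx * x[d] + wy * y[d];
    }
    return merged;
  }

  public static void main(String[] args) {
    // Center (1,0) of two points merged with the single point (4,2).
    double[] m = merge(new double[] {1, 0}, 2, new double[] {4, 2}, 1);
    System.out.println(m[0] + ", " + m[1]);
  }
}
```

Weighting by cluster size keeps the merged vector equal to the mean of all member points, so the centers never have to be recomputed from the raw data.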
• #### distance

public double distance​(double[] x,
int sizex,
double[] y,
int sizey)
Description copied from interface: GeometricLinkage
Distance of two aggregated clusters.
Specified by:
distance in interface GeometricLinkage
Parameters:
x - Center of the first cluster
sizex - Weight of the first cluster
y - Center of the second cluster
sizey - Weight of the second cluster
Returns:
Distance