 • java.lang.Object
• All Implemented Interfaces:

@Reference(authors="J. C. Gower",
title="A comparison of some methods of cluster analysis",
booktitle="Biometrics (1967)",
url="https://doi.org/10.2307/2528417",
bibkey="doi:10.2307/2528417")
extends java.lang.Object
Centroid linkage — Unweighted Pair-Group Method using Centroids (UPGMC).

This is closely related to GroupAverageLinkage (UPGMA), but the resulting distance corresponds to the distance of the cluster centroids when used with squared Euclidean distance.

For Lance-Williams, we can then obtain the following recursive definition: $d_{\text{UPGMC}}(A\cup B,C)=\tfrac{|A|}{|A|+|B|} d(A,C) + \tfrac{|B|}{|A|+|B|} d(B,C) - \tfrac{|A|\cdot|B|}{(|A|+|B|)^2} d(A,B)$

With squared Euclidean distance, we then get the cluster distance: $d_{\text{UPGMC}}(A,B)=||\tfrac{1}{|A|}\sum\nolimits_{a\in A} a, \tfrac{1}{|B|}\sum\nolimits_{b\in B} b||^2$ but for other distances, this will not generally be true.

Because the ELKI implementations use Lance-Williams, this linkage should only be used with (squared) Euclidean distance.

While titled "unweighted", this method does take cluster sizes into account when merging clusters with Lance-Williams.

While the idea of this method — at least for squared Euclidean — is compelling (distance of cluster centers), it is not as well behaved as one may think. It can yield so called "inversions", where a later merge has a smaller distance than an early merge, because a cluster center can be closer to a neighboring cluster than any of the individual points. Because of this, the GroupAverageLinkage (UPGMA) is usually preferable.

Reference:

J. C. Gower
A comparison of some methods of cluster analysis
Biometrics (1967): 623-637.

Since:
0.6.0
Author:
Erich Schubert
• Nested Class Summary

Nested Classes
Modifier and Type Class and Description
Class parameterizer.
• Field Summary

Fields
Modifier and Type Field and Description
Static instance of class.
• Constructor Summary

Constructors
Constructor and Description
Deprecated.
use the static instance STATIC instead.
• Method Summary

All Methods
Modifier and Type Method and Description
double combine(int sizex, double dx, int sizey, double dy, int sizej, double dxy)
Compute combined linkage for two clusters.
• Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

initial, restore
• Field Detail

• STATIC

public static final CentroidLinkage STATIC
Static instance of class.
• Constructor Detail

@Deprecated
Deprecated. use the static instance STATIC instead.
Constructor.
• Method Detail

• combine

public double combine(int sizex,
double dx,
int sizey,
double dy,
int sizej,
double dxy)
Description copied from interface: Linkage
Compute combined linkage for two clusters.
Specified by:
combine in interface Linkage
Parameters:
sizex - Size of first cluster x before merging
dx - Distance of cluster x to j before merging
sizey - Size of second cluster y before merging
dy - Distance of cluster y to j before merging
sizej - Size of candidate cluster j
dxy - Distance between clusters x and y before merging
Returns:
Combined distance