Class WardLinkage
- java.lang.Object
-
- elki.clustering.hierarchical.linkage.WardLinkage
-
- All Implemented Interfaces:
GeometricLinkage,Linkage
@Reference(authors="J. H. Ward Jr.",title="Hierarchical grouping to optimize an objective function",booktitle="Journal of the American statistical association 58.301",url="https://doi.org/10.1080/01621459.1963.10500845",bibkey="doi:10.1080/01621459.1963.10500845") @Reference(authors="D. Wishart",title="256. Note: An Algorithm for Hierarchical Classifications",booktitle="Biometrics 25(1)",url="https://doi.org/10.2307/2528688",bibkey="doi:10.2307/2528688") @Alias({"ward","MISSQ"}) @Priority(101) public class WardLinkage extends java.lang.Object implements GeometricLinkage
Ward's method clustering method.This criterion minimizes the increase of squared errors, and should be used with squared Euclidean distance. Usually, ELKI will try to automatically square distances when you combine this with Euclidean distance. For performance reasons, the direct use of squared distances is preferable!
The distance of two clusters in this method is: \[ d_{\text{Ward}}(A,B):=\text{SSE}(A\cup B)-\text{SSE}(A)-\text{SSE}(B) \] where the sum of squared errors is defined as: \[ \text{SSE}(X):=\sum\nolimits_{x\in X} (x-\mu_X)^2 \qquad \text{with } \mu_X=\tfrac{1}{|X|}\sum\nolimits_{x\in X} X \] This objective can be rewritten to \[ d_{\text{Ward}}(A,B):=\tfrac{|A|\cdot|B|}{|A|+|B|} ||\mu_A-\mu_B||^2 = \tfrac{1}{1/|A|+1/|B|} ||\mu_A-\mu_B||^2 \]
For Lance-Williams, we can then obtain the following recursive definition: \[d_{\text{Ward}}(A\cup B,C)=\tfrac{|A|+|C|}{|A|+|B|+|C|} d(A,C) + \tfrac{|B|+|C|}{|A|+|B|+|C|} d(B,C) - \tfrac{|C|}{|A|+|B|+|C|} d(A,B)\]
These transformations rely on properties of the L2-norm, so they cannot be used with arbitrary metrics, unless they are equivalent to the L2-norm in some transformed space.
Because the resulting distances are squared, when used with a non-squared distance, ELKI implementations will apply the square root before returning the final result. This is statistically somewhat questionable, but usually yields more interpretable distances that — roughly — correspond to the increase in standard deviation. With ELKI, you can get both behavior: Either choose squared Euclidean distance, or regular Euclidean distance.
This method is also referred to as "minimize increase of sum of squares" (MISSQ) by Podani.
Reference:
J. H. Ward Jr.
Hierarchical grouping to optimize an objective function
Journal of the American statistical association 58.301The formulation using Lance-Williams equations is due to:
D. Wishart
256. Note: An Algorithm for Hierarchical Classifications
Biometrics 25(1)- Since:
- 0.6.0
- Author:
- Erich Schubert
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static classWardLinkage.ParClass parameterizer.
-
Field Summary
Fields Modifier and Type Field Description static WardLinkageSTATICStatic instance of class.
-
Constructor Summary
Constructors Constructor Description WardLinkage()Deprecated.use the static instanceSTATICinstead.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description doublecombine(int sizex, double dx, int sizey, double dy, int sizej, double dxy)Compute combined linkage for two clusters.doubledistance(double[] x, int sizex, double[] y, int sizey)Distance of two aggregated clusters.doubleinitial(double d, boolean issquare)Initialization of the distance matrix.double[]merge(double[] x, int sizex, double[] y, int sizey)Merge the aggregated vectors.doublerestore(double d, boolean issquare)Restore a distance to the original scale.
-
-
-
Field Detail
-
STATIC
public static final WardLinkage STATIC
Static instance of class.
-
-
Constructor Detail
-
WardLinkage
@Deprecated public WardLinkage()
Deprecated.use the static instanceSTATICinstead.Constructor.
-
-
Method Detail
-
initial
public double initial(double d, boolean issquare)Description copied from interface:LinkageInitialization of the distance matrix.
-
restore
public double restore(double d, boolean issquare)Description copied from interface:LinkageRestore a distance to the original scale.
-
combine
public double combine(int sizex, double dx, int sizey, double dy, int sizej, double dxy)Description copied from interface:LinkageCompute combined linkage for two clusters.- Specified by:
combinein interfaceLinkage- Parameters:
sizex- Size of first cluster x before mergingdx- Distance of cluster x to j before mergingsizey- Size of second cluster y before mergingdy- Distance of cluster y to j before mergingsizej- Size of candidate cluster jdxy- Distance between clusters x and y before merging- Returns:
- Combined distance
-
merge
public double[] merge(double[] x, int sizex, double[] y, int sizey)Description copied from interface:GeometricLinkageMerge the aggregated vectors.- Specified by:
mergein interfaceGeometricLinkage- Parameters:
x- Center of the first clustersizex- Weight of the first clustery- Center of the second clustersizey- Weight of the second cluster- Returns:
- Combined vector
-
distance
public double distance(double[] x, int sizex, double[] y, int sizey)Description copied from interface:GeometricLinkageDistance of two aggregated clusters.- Specified by:
distancein interfaceGeometricLinkage- Parameters:
x- Center of the first clustersizex- Weight of the first clustery- Center of the second clustersizey- Weight of the second cluster- Returns:
- Distance
-
-