java.lang.Object
- elki.clustering.hierarchical.birch.BIRCHLloydKMeans

All Implemented Interfaces:: Algorithm, ClusteringAlgorithm<Clustering<MeanModel>>

@Reference(authors="T. Zhang, R. Ramakrishnan, M. Livny",title="BIRCH: An Efficient Data Clustering Method for Very Large Databases",booktitle="Proc. 1996 ACM SIGMOD International Conference on Management of Data",url="https://doi.org/10.1145/233269.233324",bibkey="DBLP:conf/sigmod/ZhangRL96") @Reference(authors="T. Zhang, R. Ramakrishnan, M. Livny",title="BIRCH: A New Data Clustering Algorithm and Its Applications",booktitle="Data Min. Knowl. Discovery",url="https://doi.org/10.1023/A:1009783824328",bibkey="DBLP:journals/datamine/ZhangRL97")
public class BIRCHLloydKMeans
extends java.lang.Object
implements ClusteringAlgorithm<Clustering<MeanModel>>

BIRCH-based clustering algorithm that simply treats the leafs of the CFTree as clusters.

References:

T. Zhang, R. Ramakrishnan, M. Livny
BIRCH: An Efficient Data Clustering Method for Very Large Databases Proc. 1996 ACM SIGMOD International Conference on Management of Data

T. Zhang, R. Ramakrishnan, M. Livny
BIRCH: A New Data Clustering Algorithm and Its Applications Data. Min. Knowl. Discovery

Since:: 0.7.5
Author:: Erich Schubert

Nested Class Summary

Nested Classes
Modifier and Type Class Description

static class BIRCHLloydKMeans.Par
Parameterization class.
- Nested classes/interfaces inherited from interface elki.Algorithm
  Algorithm.Utils

Field Summary

Fields
Modifier and Type	Field	Description
`(package private) CFTree.Factory`	`cffactory`	CFTree factory.
`(package private) BIRCHKMeansPlusPlus`	`initialization`	k-means++ initialization
`(package private) int`	`k`	Number of cluster centers to initialize.
`private static Logging`	`LOG`	Class logger.
`(package private) int`	`maxiter`	Maximum number of iterations.

Constructor Summary

Constructors
Constructor	Description
`BIRCHLloydKMeans(CFTree.Factory cffactory, int k, int maxiter, BIRCHKMeansPlusPlus initialization)`	Constructor.

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method	Description
`private int`	`assignToNearestCluster(int[] assignment, double[][] means, double[][] cfmeans, ClusteringFeature[] cfs, int[] weights)`	Assign each element to nearest cluster.
`private double[]`	`calculateVariances(int[] assignment, double[][] means, ClusteringFeature[] cfs, int[] weights)`	Calculate variance of clusters based on clustering features.
`protected double`	`distance(double[] x, double[] y)`	Compute a distance (and count the distance computations).
`protected double`	`distance(NumberVector x, double[] y)`	Compute a distance (and count the distance computations).
`TypeInformation[]`	`getInputTypeRestriction()`	Get the input type restriction used for negotiating the data query.
`private double[][]`	`kmeans(double[][] cfmeans, ClusteringFeature[] cfs, int[] assignment, int[] weights)`	Perform k-means clustering.
`private double[][]`	`means(int[] assignment, double[][] means, ClusteringFeature[] cfs, int[] weights)`	Calculate means of clusters.
`Clustering<KMeansModel>`	`run(Relation<NumberVector> relation)`	Run the clustering algorithm.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface elki.clustering.ClusteringAlgorithm
autorun

- Field Detail
  - LOG
```
private static final Logging LOG
```
    Class logger.
  - cffactory
```
CFTree.Factory cffactory
```
    CFTree factory.
  - k
```
int k
```
    Number of cluster centers to initialize.
  - maxiter
```
int maxiter
```
    Maximum number of iterations.
  - initialization
```
BIRCHKMeansPlusPlus initialization
```
    k-means++ initialization
- Constructor Detail
  - BIRCHLloydKMeans
```
public BIRCHLloydKMeans(CFTree.Factory cffactory,
                        int k,
                        int maxiter,
                        BIRCHKMeansPlusPlus initialization)
```
    Constructor.
    
    Parameters:
    
    cffactory - CFTree factory
    
    k - Number of clusters
    
    maxiter - Maximum number of iterations
    
    initialization - Initialization method for k-means
- Method Detail
  - getInputTypeRestriction
```
public TypeInformation[] getInputTypeRestriction()
```
    Description copied from interface: Algorithm
    
    Get the input type restriction used for negotiating the data query.
    
    Specified by:
    
    getInputTypeRestriction in interface Algorithm
    
    Returns:
    
    Type restriction
  - run
```
public Clustering<KMeansModel> run(Relation<NumberVector> relation)
```
    Run the clustering algorithm.
    
    Parameters:
    
    relation - Input data
    
    Returns:
    
    Clustering
  - kmeans
```
private double[][] kmeans(double[][] cfmeans,
                          ClusteringFeature[] cfs,
                          int[] assignment,
                          int[] weights)
```
    Perform k-means clustering.
    
    Parameters:
    
    cfmeans - Cluster feature means
    
    cfs - Cluster features
    
    assignment - Cluster assignment of each CF
    
    weights - Cluster weight output
    
    Returns:
    
    Cluster means
  - means
```
private double[][] means(int[] assignment,
                         double[][] means,
                         ClusteringFeature[] cfs,
                         int[] weights)
```
    Calculate means of clusters.
    
    Parameters:
    
    assignment - Cluster assignment
    
    means - Means of clusters
    
    cfs - Clustering features
    
    Returns:
    
    Means of clusters.
  - assignToNearestCluster
```
private int assignToNearestCluster(int[] assignment,
                                   double[][] means,
                                   double[][] cfmeans,
                                   ClusteringFeature[] cfs,
                                   int[] weights)
```
    Assign each element to nearest cluster.
    
    Parameters:
    
    assignment - Current cluster assignment
    
    means - k-means cluster means
    
    cfmeans - Cluster feature means
    
    cfs - Cluster features
    
    weights - Cluster weights (output)
    
    Returns:
    
    Number of reassigned elements
  - distance
```
protected double distance(NumberVector x,
                          double[] y)
```
    Compute a distance (and count the distance computations).
    
    Parameters:
    
    x - First object
    
    y - Second object
    
    Returns:
    
    Distance
  - distance
```
protected double distance(double[] x,
                          double[] y)
```
    Compute a distance (and count the distance computations).
    
    Parameters:
    
    x - First object
    
    y - Second object
    
    Returns:
    
    Distance
  - calculateVariances
```
private double[] calculateVariances(int[] assignment,
                                    double[][] means,
                                    ClusteringFeature[] cfs,
                                    int[] weights)
```
    Calculate variance of clusters based on clustering features.
    The result is only correct after updating the means!
    
    Parameters:
    
    assignment - Cluster assignment of CFs
    
    means - Cluster means
    
    cfs - CF leaves
    
    weights - Cluster weights
    
    Returns:
    
    Per-cluster variances

Class BIRCHLloydKMeans

Nested Class Summary

Nested classes/interfaces inherited from interface elki.Algorithm

Field Summary

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Methods inherited from interface elki.clustering.ClusteringAlgorithm

Field Detail

LOG

cffactory

k

maxiter

initialization

Constructor Detail

BIRCHLloydKMeans

Method Detail

getInputTypeRestriction

run

kmeans

means

assignToNearestCluster

distance

distance

calculateVariances