Class BIRCHLloydKMeans
- java.lang.Object
-
- elki.clustering.hierarchical.birch.BIRCHLloydKMeans
-
- All Implemented Interfaces:
Algorithm
,ClusteringAlgorithm<Clustering<MeanModel>>
@Reference(authors="T. Zhang, R. Ramakrishnan, M. Livny",title="BIRCH: An Efficient Data Clustering Method for Very Large Databases",booktitle="Proc. 1996 ACM SIGMOD International Conference on Management of Data",url="https://doi.org/10.1145/233269.233324",bibkey="DBLP:conf/sigmod/ZhangRL96") @Reference(authors="T. Zhang, R. Ramakrishnan, M. Livny",title="BIRCH: A New Data Clustering Algorithm and Its Applications",booktitle="Data Min. Knowl. Discovery",url="https://doi.org/10.1023/A:1009783824328",bibkey="DBLP:journals/datamine/ZhangRL97") public class BIRCHLloydKMeans extends java.lang.Object implements ClusteringAlgorithm<Clustering<MeanModel>>
BIRCH-based clustering algorithm that simply treats the leafs of the CFTree as clusters.References:
T. Zhang, R. Ramakrishnan, M. Livny
BIRCH: An Efficient Data Clustering Method for Very Large Databases Proc. 1996 ACM SIGMOD International Conference on Management of DataT. Zhang, R. Ramakrishnan, M. Livny
BIRCH: A New Data Clustering Algorithm and Its Applications Data. Min. Knowl. Discovery- Since:
- 0.7.5
- Author:
- Erich Schubert
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
BIRCHLloydKMeans.Par
Parameterization class.-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Field Summary
Fields Modifier and Type Field Description (package private) CFTree.Factory
cffactory
CFTree factory.(package private) BIRCHKMeansPlusPlus
initialization
k-means++ initialization(package private) int
k
Number of cluster centers to initialize.private static Logging
LOG
Class logger.(package private) int
maxiter
Maximum number of iterations.
-
Constructor Summary
Constructors Constructor Description BIRCHLloydKMeans(CFTree.Factory cffactory, int k, int maxiter, BIRCHKMeansPlusPlus initialization)
Constructor.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description private int
assignToNearestCluster(int[] assignment, double[][] means, double[][] cfmeans, ClusteringFeature[] cfs, int[] weights)
Assign each element to nearest cluster.private double[]
calculateVariances(int[] assignment, double[][] means, ClusteringFeature[] cfs, int[] weights)
Calculate variance of clusters based on clustering features.protected double
distance(double[] x, double[] y)
Compute a distance (and count the distance computations).protected double
distance(NumberVector x, double[] y)
Compute a distance (and count the distance computations).TypeInformation[]
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.private double[][]
kmeans(double[][] cfmeans, ClusteringFeature[] cfs, int[] assignment, int[] weights)
Perform k-means clustering.private double[][]
means(int[] assignment, double[][] means, ClusteringFeature[] cfs, int[] weights)
Calculate means of clusters.Clustering<KMeansModel>
run(Relation<NumberVector> relation)
Run the clustering algorithm.-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.clustering.ClusteringAlgorithm
autorun
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
Class logger.
-
cffactory
CFTree.Factory cffactory
CFTree factory.
-
k
int k
Number of cluster centers to initialize.
-
maxiter
int maxiter
Maximum number of iterations.
-
initialization
BIRCHKMeansPlusPlus initialization
k-means++ initialization
-
-
Constructor Detail
-
BIRCHLloydKMeans
public BIRCHLloydKMeans(CFTree.Factory cffactory, int k, int maxiter, BIRCHKMeansPlusPlus initialization)
Constructor.- Parameters:
cffactory
- CFTree factoryk
- Number of clustersmaxiter
- Maximum number of iterationsinitialization
- Initialization method for k-means
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface:Algorithm
Get the input type restriction used for negotiating the data query.- Specified by:
getInputTypeRestriction
in interfaceAlgorithm
- Returns:
- Type restriction
-
run
public Clustering<KMeansModel> run(Relation<NumberVector> relation)
Run the clustering algorithm.- Parameters:
relation
- Input data- Returns:
- Clustering
-
kmeans
private double[][] kmeans(double[][] cfmeans, ClusteringFeature[] cfs, int[] assignment, int[] weights)
Perform k-means clustering.- Parameters:
cfmeans
- Cluster feature meanscfs
- Cluster featuresassignment
- Cluster assignment of each CFweights
- Cluster weight output- Returns:
- Cluster means
-
means
private double[][] means(int[] assignment, double[][] means, ClusteringFeature[] cfs, int[] weights)
Calculate means of clusters.- Parameters:
assignment
- Cluster assignmentmeans
- Means of clusterscfs
- Clustering features- Returns:
- Means of clusters.
-
assignToNearestCluster
private int assignToNearestCluster(int[] assignment, double[][] means, double[][] cfmeans, ClusteringFeature[] cfs, int[] weights)
Assign each element to nearest cluster.- Parameters:
assignment
- Current cluster assignmentmeans
- k-means cluster meanscfmeans
- Cluster feature meanscfs
- Cluster featuresweights
- Cluster weights (output)- Returns:
- Number of reassigned elements
-
distance
protected double distance(NumberVector x, double[] y)
Compute a distance (and count the distance computations).- Parameters:
x
- First objecty
- Second object- Returns:
- Distance
-
distance
protected double distance(double[] x, double[] y)
Compute a distance (and count the distance computations).- Parameters:
x
- First objecty
- Second object- Returns:
- Distance
-
calculateVariances
private double[] calculateVariances(int[] assignment, double[][] means, ClusteringFeature[] cfs, int[] weights)
Calculate variance of clusters based on clustering features.The result is only correct after updating the means!
- Parameters:
assignment
- Cluster assignment of CFsmeans
- Cluster meanscfs
- CF leavesweights
- Cluster weights- Returns:
- Per-cluster variances
-
-