Class BIRCHLloydKMeans
- java.lang.Object
-
- elki.clustering.hierarchical.birch.BIRCHLloydKMeans
-
- All Implemented Interfaces:
Algorithm,ClusteringAlgorithm<Clustering<MeanModel>>
@Reference(authors="T. Zhang, R. Ramakrishnan, M. Livny",title="BIRCH: An Efficient Data Clustering Method for Very Large Databases",booktitle="Proc. 1996 ACM SIGMOD International Conference on Management of Data",url="https://doi.org/10.1145/233269.233324",bibkey="DBLP:conf/sigmod/ZhangRL96") @Reference(authors="T. Zhang, R. Ramakrishnan, M. Livny",title="BIRCH: A New Data Clustering Algorithm and Its Applications",booktitle="Data Min. Knowl. Discovery",url="https://doi.org/10.1023/A:1009783824328",bibkey="DBLP:journals/datamine/ZhangRL97") public class BIRCHLloydKMeans extends java.lang.Object implements ClusteringAlgorithm<Clustering<MeanModel>>
BIRCH-based clustering algorithm that simply treats the leafs of the CFTree as clusters.References:
T. Zhang, R. Ramakrishnan, M. Livny
BIRCH: An Efficient Data Clustering Method for Very Large Databases Proc. 1996 ACM SIGMOD International Conference on Management of DataT. Zhang, R. Ramakrishnan, M. Livny
BIRCH: A New Data Clustering Algorithm and Its Applications Data. Min. Knowl. Discovery- Since:
- 0.7.5
- Author:
- Erich Schubert
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static classBIRCHLloydKMeans.ParParameterization class.-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Field Summary
Fields Modifier and Type Field Description (package private) CFTree.FactorycffactoryCFTree factory.(package private) BIRCHKMeansPlusPlusinitializationk-means++ initialization(package private) intkNumber of cluster centers to initialize.private static LoggingLOGClass logger.(package private) intmaxiterMaximum number of iterations.
-
Constructor Summary
Constructors Constructor Description BIRCHLloydKMeans(CFTree.Factory cffactory, int k, int maxiter, BIRCHKMeansPlusPlus initialization)Constructor.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description private intassignToNearestCluster(int[] assignment, double[][] means, double[][] cfmeans, ClusteringFeature[] cfs, int[] weights)Assign each element to nearest cluster.private double[]calculateVariances(int[] assignment, double[][] means, ClusteringFeature[] cfs, int[] weights)Calculate variance of clusters based on clustering features.protected doubledistance(double[] x, double[] y)Compute a distance (and count the distance computations).protected doubledistance(NumberVector x, double[] y)Compute a distance (and count the distance computations).TypeInformation[]getInputTypeRestriction()Get the input type restriction used for negotiating the data query.private double[][]kmeans(double[][] cfmeans, ClusteringFeature[] cfs, int[] assignment, int[] weights)Perform k-means clustering.private double[][]means(int[] assignment, double[][] means, ClusteringFeature[] cfs, int[] weights)Calculate means of clusters.Clustering<KMeansModel>run(Relation<NumberVector> relation)Run the clustering algorithm.-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.clustering.ClusteringAlgorithm
autorun
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
Class logger.
-
cffactory
CFTree.Factory cffactory
CFTree factory.
-
k
int k
Number of cluster centers to initialize.
-
maxiter
int maxiter
Maximum number of iterations.
-
initialization
BIRCHKMeansPlusPlus initialization
k-means++ initialization
-
-
Constructor Detail
-
BIRCHLloydKMeans
public BIRCHLloydKMeans(CFTree.Factory cffactory, int k, int maxiter, BIRCHKMeansPlusPlus initialization)
Constructor.- Parameters:
cffactory- CFTree factoryk- Number of clustersmaxiter- Maximum number of iterationsinitialization- Initialization method for k-means
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface:AlgorithmGet the input type restriction used for negotiating the data query.- Specified by:
getInputTypeRestrictionin interfaceAlgorithm- Returns:
- Type restriction
-
run
public Clustering<KMeansModel> run(Relation<NumberVector> relation)
Run the clustering algorithm.- Parameters:
relation- Input data- Returns:
- Clustering
-
kmeans
private double[][] kmeans(double[][] cfmeans, ClusteringFeature[] cfs, int[] assignment, int[] weights)Perform k-means clustering.- Parameters:
cfmeans- Cluster feature meanscfs- Cluster featuresassignment- Cluster assignment of each CFweights- Cluster weight output- Returns:
- Cluster means
-
means
private double[][] means(int[] assignment, double[][] means, ClusteringFeature[] cfs, int[] weights)Calculate means of clusters.- Parameters:
assignment- Cluster assignmentmeans- Means of clusterscfs- Clustering features- Returns:
- Means of clusters.
-
assignToNearestCluster
private int assignToNearestCluster(int[] assignment, double[][] means, double[][] cfmeans, ClusteringFeature[] cfs, int[] weights)Assign each element to nearest cluster.- Parameters:
assignment- Current cluster assignmentmeans- k-means cluster meanscfmeans- Cluster feature meanscfs- Cluster featuresweights- Cluster weights (output)- Returns:
- Number of reassigned elements
-
distance
protected double distance(NumberVector x, double[] y)
Compute a distance (and count the distance computations).- Parameters:
x- First objecty- Second object- Returns:
- Distance
-
distance
protected double distance(double[] x, double[] y)Compute a distance (and count the distance computations).- Parameters:
x- First objecty- Second object- Returns:
- Distance
-
calculateVariances
private double[] calculateVariances(int[] assignment, double[][] means, ClusteringFeature[] cfs, int[] weights)Calculate variance of clusters based on clustering features.The result is only correct after updating the means!
- Parameters:
assignment- Cluster assignment of CFsmeans- Cluster meanscfs- CF leavesweights- Cluster weights- Returns:
- Per-cluster variances
-
-