Package elki.clustering.kmeans
Class BetulaLloydKMeans
- java.lang.Object
- elki.clustering.kmeans.AbstractKMeans<NumberVector,KMeansModel>
- elki.clustering.kmeans.BetulaLloydKMeans
-
- All Implemented Interfaces:
Algorithm, ClusteringAlgorithm<Clustering<KMeansModel>>, KMeans<NumberVector,KMeansModel>
@Reference(authors="Andreas Lang and Erich Schubert", title="BETULA: Fast Clustering of Large Data with Improved BIRCH CF-Trees", booktitle="Information Systems", url="https://doi.org/10.1016/j.is.2021.101918", bibkey="DBLP:journals/is/LangS22")
public class BetulaLloydKMeans extends AbstractKMeans<NumberVector,KMeansModel>
BIRCH/BETULA-based clustering algorithm that simply treats the leaves of the CFTree as clusters.
References:
Andreas Lang and Erich Schubert
BETULA: Fast Clustering of Large Data with Improved BIRCH CF-Trees
Information Systems
- Since:
- 0.8.0
- Author:
- Erich Schubert
-
Nested Class Summary
Nested Classes
static class BetulaLloydKMeans.Par - Parameterization class.
-
Nested classes/interfaces inherited from class elki.clustering.kmeans.AbstractKMeans
AbstractKMeans.Instance
-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Field Summary
Fields
(package private) CFTree.Factory<?> cffactory - CFTree factory.
(package private) long diststat - Number of distance calculations
(package private) boolean ignoreWeight - Ignore weight
(package private) AbstractCFKMeansInitialization initialization - k-means++ initialization
private static Logging LOG - Class logger.
(package private) boolean storeIds - Store ids
-
Fields inherited from class elki.clustering.kmeans.AbstractKMeans
distance, initializer, k, maxiter
-
Fields inherited from interface elki.clustering.kmeans.KMeans
DISTANCE_FUNCTION_ID, INIT_ID, K_ID, MAXITER_ID, SEED_ID, VARSTAT_ID
-
-
Constructor Summary
Constructors
BetulaLloydKMeans(int k, int maxiter, CFTree.Factory<?> cffactory, AbstractCFKMeansInitialization initialization, boolean storeIds, boolean ignoreWeight) - Constructor.
-
Method Summary
Methods
private int assignToNearestCluster(int[] assignment, double[][] means, java.util.ArrayList<? extends ClusterFeature> cfs, int[] weights) - Assign each element to nearest cluster.
protected double[] calculateVariances(int[] assignment, double[][] means, java.util.ArrayList<? extends ClusterFeature> cfs, int[] weights) - Calculate variance of clusters based on clustering features.
private double distance(double[] x, double[] y) - Updates statistics and calculates distance between two Objects based on selected criteria.
private double distance(NumberVector x, double[] y) - Updates statistics and calculates distance between two Objects based on selected criteria.
protected Logging getLogger() - Get the (STATIC) logger for this class.
private double[][] kmeans(java.util.ArrayList<? extends ClusterFeature> cfs, int[] assignment, int[] weights, CFTree<?> tree) - Perform k-means clustering.
private double[][] means(int[] assignment, double[][] means, java.util.ArrayList<? extends ClusterFeature> cfs, int[] weights) - Calculate means of clusters.
Clustering<KMeansModel> run(Relation<NumberVector> relation) - Run the clustering algorithm.
-
Methods inherited from class elki.clustering.kmeans.AbstractKMeans
getDistance, getInputTypeRestriction, incrementalUpdateMean, initialMeans, means, minusEquals, nearestMeans, plusEquals, plusMinusEquals, setDistance, setInitializer, setK
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.clustering.ClusteringAlgorithm
autorun
-
Field Detail
-
LOG
private static final Logging LOG
Class logger.
-
cffactory
CFTree.Factory<?> cffactory
CFTree factory.
-
initialization
AbstractCFKMeansInitialization initialization
k-means++ initialization
-
storeIds
boolean storeIds
Store ids
-
ignoreWeight
boolean ignoreWeight
Ignore weight
-
diststat
long diststat
Number of distance calculations
-
-
Constructor Detail
-
BetulaLloydKMeans
public BetulaLloydKMeans(int k, int maxiter, CFTree.Factory<?> cffactory, AbstractCFKMeansInitialization initialization, boolean storeIds, boolean ignoreWeight)
Constructor.
- Parameters:
k - Number of clusters
maxiter - Maximum number of iterations
cffactory - CFTree factory
initialization - Initialization method for k-means
storeIds - Store IDs to avoid reassignment cost
ignoreWeight - Ignore the leaf weights
-
-
Method Detail
-
run
public Clustering<KMeansModel> run(Relation<NumberVector> relation)
Run the clustering algorithm.
- Parameters:
relation - Input data
- Returns:
- Clustering
-
kmeans
private double[][] kmeans(java.util.ArrayList<? extends ClusterFeature> cfs, int[] assignment, int[] weights, CFTree<?> tree)
Perform k-means clustering.
- Parameters:
cfs - Cluster features
assignment - Cluster assignment of each CF
weights - Cluster weight output
tree - CF tree
- Returns:
- Cluster means
-
means
private double[][] means(int[] assignment, double[][] means, java.util.ArrayList<? extends ClusterFeature> cfs, int[] weights)
Calculate means of clusters.
- Parameters:
assignment - Cluster assignment
means - Means of clusters
cfs - Clustering features
weights - Cluster weights
- Returns:
- Means of clusters.
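The mean update over summarized leaves can be illustrated with a minimal, self-contained sketch. It is not the ELKI implementation: the class `CfMeansSketch`, the method `weightedMeans`, and the flat array representation of each leaf (a centroid plus a point count) are hypothetical stand-ins for `ClusterFeature`. The handling of `ignoreWeight` (each leaf counted once) is an assumption based only on the parameter description, and unlike the method above the sketch recomputes the weights itself.

```java
import java.util.Arrays;

/** Hypothetical sketch of a weighted mean update; not ELKI code. */
final class CfMeansSketch {
  /**
   * Recompute cluster means from summarized leaves.
   *
   * @param centroids centroid of each leaf (assumed representation)
   * @param counts number of points summarized by each leaf
   * @param assignment cluster index of each leaf
   * @param k number of clusters
   * @param ignoreWeight if true, every leaf counts as a single point (assumption)
   * @param weights output: total weight per cluster
   * @return cluster means
   */
  static double[][] weightedMeans(double[][] centroids, int[] counts, int[] assignment, int k, boolean ignoreWeight, int[] weights) {
    int dim = centroids[0].length;
    double[][] means = new double[k][dim];
    Arrays.fill(weights, 0);
    for (int i = 0; i < centroids.length; i++) {
      int c = assignment[i];
      int w = ignoreWeight ? 1 : counts[i];
      weights[c] += w;
      for (int d = 0; d < dim; d++) {
        means[c][d] += w * centroids[i][d]; // accumulate weighted sum
      }
    }
    for (int c = 0; c < k; c++) {
      if (weights[c] == 0) {
        continue; // empty cluster: mean stays at the origin in this sketch
      }
      for (int d = 0; d < dim; d++) {
        means[c][d] /= weights[c];
      }
    }
    return means;
  }
}
```

With leaf weights respected, a leaf summarizing many points pulls the mean proportionally harder than a leaf summarizing one point, which is what lets the iteration run on leaves instead of raw points.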
-
assignToNearestCluster
private int assignToNearestCluster(int[] assignment, double[][] means, java.util.ArrayList<? extends ClusterFeature> cfs, int[] weights)
Assign each element to nearest cluster.
- Parameters:
assignment - Current cluster assignment
means - k-means cluster means
cfs - Cluster features
weights - Cluster weights (output)
- Returns:
- Number of reassigned elements
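As a rough illustration of the assignment step, the following self-contained sketch assigns each leaf centroid to its nearest mean and counts how many leaves changed cluster. The names `CfAssignSketch` and `assignToNearest` are hypothetical, leaves are reduced to plain centroid arrays, and the `weights` output of the method above is not modeled; whether the real method counts reassigned leaves or reassigned points is not stated here, so counting leaves is an assumption.

```java
/** Hypothetical sketch of a nearest-mean assignment step; not ELKI code. */
final class CfAssignSketch {
  /** Squared Euclidean distance between two arrays of equal length. */
  static double squaredDistance(double[] x, double[] y) {
    double sum = 0;
    for (int d = 0; d < x.length; d++) {
      double diff = x[d] - y[d];
      sum += diff * diff;
    }
    return sum;
  }

  /**
   * Assign each leaf centroid to its nearest mean.
   *
   * @param centroids centroid of each leaf (assumed representation)
   * @param means current cluster means
   * @param assignment cluster index per leaf, updated in place
   * @return number of leaves whose assignment changed
   */
  static int assignToNearest(double[][] centroids, double[][] means, int[] assignment) {
    int changed = 0;
    for (int i = 0; i < centroids.length; i++) {
      int best = 0;
      double bestDist = squaredDistance(centroids[i], means[0]);
      for (int c = 1; c < means.length; c++) {
        double dist = squaredDistance(centroids[i], means[c]);
        if (dist < bestDist) {
          bestDist = dist;
          best = c;
        }
      }
      if (assignment[i] != best) {
        assignment[i] = best;
        changed++;
      }
    }
    return changed;
  }
}
```

A return value of zero means the assignment is stable, which is the usual stopping criterion for a Lloyd-style iteration besides the iteration limit.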
-
calculateVariances
protected double[] calculateVariances(int[] assignment, double[][] means, java.util.ArrayList<? extends ClusterFeature> cfs, int[] weights)
Calculate variance of clusters based on clustering features.
The result is only correct after updating the means!
- Parameters:
assignment - Cluster assignment of CFs
means - Cluster means
cfs - CF leaves
weights - Cluster weights
- Returns:
- Per-cluster variances
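One way to obtain per-cluster variances from summaries alone is sketched below. This is an assumption-laden illustration, not the ELKI code: the class `CfVarianceSketch`, the method `clusterVariances`, and the per-leaf triple (count, centroid, sum of squared deviations from the leaf centroid) are hypothetical. For a leaf with count n, centroid c, and within-leaf sum of squared deviations ssd, the squared deviations of its points from a cluster mean m sum to ssd + n * ||c - m||^2. The sketch divides by the cluster weight; whether the class reports a sum or an average is not stated here.

```java
/** Hypothetical sketch of per-cluster variance from leaf summaries; not ELKI code. */
final class CfVarianceSketch {
  /**
   * @param counts number of points per leaf
   * @param centroids centroid per leaf
   * @param ssd sum of squared deviations of each leaf's points from its own centroid
   * @param assignment cluster index per leaf
   * @param means cluster means, assumed already updated for this assignment
   * @return average squared deviation per cluster (0 for empty clusters)
   */
  static double[] clusterVariances(int[] counts, double[][] centroids, double[] ssd, int[] assignment, double[][] means) {
    int k = means.length;
    double[] result = new double[k];
    long[] weight = new long[k];
    for (int i = 0; i < counts.length; i++) {
      int c = assignment[i];
      double dist = 0;
      for (int d = 0; d < means[c].length; d++) {
        double diff = centroids[i][d] - means[c][d];
        dist += diff * diff;
      }
      // within-leaf spread plus the leaf's offset from the cluster mean
      result[c] += ssd[i] + counts[i] * dist;
      weight[c] += counts[i];
    }
    for (int c = 0; c < k; c++) {
      result[c] = weight[c] > 0 ? result[c] / weight[c] : 0.;
    }
    return result;
  }
}
```

For example, a leaf summarizing the points 0 and 2 (count 2, centroid 1, ssd 2) and a leaf summarizing the point 4 (count 1, centroid 4, ssd 0) in one cluster with mean 2 give (2 + 2 * 1 + 0 + 1 * 4) / 3 = 8/3, the same value as computing it from the three raw points.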
-
distance
private double distance(NumberVector x, double[] y)
Updates statistics and calculates distance between two Objects based on selected criteria.
Note: specializing this rather than calling SquaredEuclideanDistance was much faster, as we can avoid wrapping the array.
- Parameters:
x - Point x
y - Point y
- Returns:
- distance
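The idea of a specialized distance that also updates a statistic can be sketched as follows. This is a minimal illustration, assuming a squared Euclidean distance on raw arrays as suggested by the note above; the class `CountingDistanceSketch` is hypothetical, and only the field name `diststat` is taken from this page.

```java
/** Hypothetical sketch of a counting squared Euclidean distance; not ELKI code. */
final class CountingDistanceSketch {
  /** Number of distance calculations performed so far. */
  long diststat = 0;

  /**
   * Squared Euclidean distance on plain arrays, avoiding any wrapper objects.
   *
   * @param x point x
   * @param y point y, same length as x
   * @return squared Euclidean distance
   */
  double distance(double[] x, double[] y) {
    diststat++; // update statistics before computing
    double sum = 0;
    for (int d = 0; d < x.length; d++) {
      double diff = x[d] - y[d];
      sum += diff * diff;
    }
    return sum;
  }
}
```

Working directly on `double[]` avoids allocating a vector wrapper per call, which matters because the distance sits in the innermost loop of the assignment step.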
-
distance
private double distance(double[] x, double[] y)
Updates statistics and calculates distance between two Objects based on selected criteria.
- Parameters:
x - Point x
y - Point y
- Returns:
- distance
-
getLogger
protected Logging getLogger()
Description copied from class: AbstractKMeans
Get the (STATIC) logger for this class.
- Specified by:
getLogger in class AbstractKMeans<NumberVector,KMeansModel>
- Returns:
- the static logger