Package elki.clustering.kmeans
Class BetulaLloydKMeans
- java.lang.Object
- elki.clustering.kmeans.AbstractKMeans<NumberVector,KMeansModel>
- elki.clustering.kmeans.BetulaLloydKMeans
- All Implemented Interfaces:
Algorithm, ClusteringAlgorithm<Clustering<KMeansModel>>, KMeans<NumberVector,KMeansModel>
@Reference(authors="Andreas Lang and Erich Schubert", title="BETULA: Fast Clustering of Large Data with Improved BIRCH CF-Trees", booktitle="Information Systems", url="https://doi.org/10.1016/j.is.2021.101918", bibkey="DBLP:journals/is/LangS22")
public class BetulaLloydKMeans extends AbstractKMeans<NumberVector,KMeansModel>
BIRCH/BETULA-based clustering algorithm that simply treats the leaves of the CFTree as clusters.
References:
Andreas Lang and Erich Schubert
BETULA: Fast Clustering of Large Data with Improved BIRCH CF-Trees
Information Systems
- Since:
- 0.8.0
- Author:
- Erich Schubert
-
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
static class
BetulaLloydKMeans.Par
Parameterization class.
-
Nested classes/interfaces inherited from class elki.clustering.kmeans.AbstractKMeans
AbstractKMeans.Instance
-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Field Summary
Fields
Modifier and Type, Field, Description
(package private) CFTree.Factory<?>
cffactory
CFTree factory.
(package private) long
diststat
Number of distance calculations.
(package private) boolean
ignoreWeight
Ignore weight.
(package private) AbstractCFKMeansInitialization
initialization
k-means++ initialization.
private static Logging
LOG
Class logger.
(package private) boolean
storeIds
Store ids.
-
Fields inherited from class elki.clustering.kmeans.AbstractKMeans
distance, initializer, k, maxiter
-
Fields inherited from interface elki.clustering.kmeans.KMeans
DISTANCE_FUNCTION_ID, INIT_ID, K_ID, MAXITER_ID, SEED_ID, VARSTAT_ID
-
-
Constructor Summary
Constructors
Constructor, Description
BetulaLloydKMeans(int k, int maxiter, CFTree.Factory<?> cffactory, AbstractCFKMeansInitialization initialization, boolean storeIds, boolean ignoreWeight)
Constructor.
-
Method Summary
All Methods, Instance Methods, Concrete Methods
Modifier and Type, Method, Description
private int
assignToNearestCluster(int[] assignment, double[][] means, java.util.ArrayList<? extends ClusterFeature> cfs, int[] weights)
Assign each element to nearest cluster.
protected double[]
calculateVariances(int[] assignment, double[][] means, java.util.ArrayList<? extends ClusterFeature> cfs, int[] weights)
Calculate variance of clusters based on clustering features.
private double
distance(double[] x, double[] y)
Updates statistics and calculates distance between two Objects based on selected criteria.
private double
distance(NumberVector x, double[] y)
Updates statistics and calculates distance between two Objects based on selected criteria.
protected Logging
getLogger()
Get the (STATIC) logger for this class.
private double[][]
kmeans(java.util.ArrayList<? extends ClusterFeature> cfs, int[] assignment, int[] weights, CFTree<?> tree)
Perform k-means clustering.
private double[][]
means(int[] assignment, double[][] means, java.util.ArrayList<? extends ClusterFeature> cfs, int[] weights)
Calculate means of clusters.
Clustering<KMeansModel>
run(Relation<NumberVector> relation)
Run the clustering algorithm.
-
Methods inherited from class elki.clustering.kmeans.AbstractKMeans
getDistance, getInputTypeRestriction, incrementalUpdateMean, initialMeans, means, minusEquals, nearestMeans, plusEquals, plusMinusEquals, setDistance, setInitializer, setK
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.clustering.ClusteringAlgorithm
autorun
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
Class logger.
-
cffactory
CFTree.Factory<?> cffactory
CFTree factory.
-
initialization
AbstractCFKMeansInitialization initialization
k-means++ initialization
-
storeIds
boolean storeIds
Store IDs (to avoid reassignment cost).
-
ignoreWeight
boolean ignoreWeight
Ignore the leaf weights.
-
diststat
long diststat
Number of distance calculations.
-
-
Constructor Detail
-
BetulaLloydKMeans
public BetulaLloydKMeans(int k, int maxiter, CFTree.Factory<?> cffactory, AbstractCFKMeansInitialization initialization, boolean storeIds, boolean ignoreWeight)
Constructor.
- Parameters:
k - Number of clusters
maxiter - Maximum number of iterations
cffactory - CFTree factory
initialization - Initialization method for k-means
storeIds - Store IDs to avoid reassignment cost
ignoreWeight - Ignore the leaf weights
-
-
Method Detail
-
run
public Clustering<KMeansModel> run(Relation<NumberVector> relation)
Run the clustering algorithm.
- Parameters:
relation - Input data
- Returns:
- Clustering
-
kmeans
private double[][] kmeans(java.util.ArrayList<? extends ClusterFeature> cfs, int[] assignment, int[] weights, CFTree<?> tree)
Perform k-means clustering.
- Parameters:
cfs - Cluster features
assignment - Cluster assignment of each CF
weights - Cluster weight output
tree - CF tree
- Returns:
- Cluster means
-
means
private double[][] means(int[] assignment, double[][] means, java.util.ArrayList<? extends ClusterFeature> cfs, int[] weights)
Calculate means of clusters.
- Parameters:
assignment - Cluster assignment
means - Means of clusters
cfs - Clustering features
weights - Cluster weights
- Returns:
- Means of clusters.
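As a rough illustration of such a mean update, the sketch below recomputes each cluster mean as the size-weighted average of the centers of its assigned leaves. It does not use ELKI types: `CfMeansSketch`, `leafCenters`, and `leafSizes` are hypothetical stand-ins for the ClusterFeature list, and keeping the previous mean for a cluster with zero weight is an assumption, not confirmed behavior of this class.

```java
/** Hypothetical, ELKI-independent sketch of a mean update over CF leaves. */
final class CfMeansSketch {
  /**
   * Recompute cluster means from leaf summaries.
   *
   * @param assignment cluster index of each leaf
   * @param means previous means (reused for clusters with zero weight; an assumption)
   * @param leafCenters center (mean vector) of each leaf
   * @param leafSizes number of points summarized by each leaf
   * @param weights total number of points currently assigned to each cluster
   * @return newly allocated array of cluster means
   */
  static double[][] means(int[] assignment, double[][] means, double[][] leafCenters, int[] leafSizes, int[] weights) {
    final int k = means.length, dim = means[0].length;
    double[][] result = new double[k][dim];
    // Accumulate size-weighted sums of the leaf centers per cluster.
    for (int i = 0; i < leafCenters.length; i++) {
      final int c = assignment[i];
      for (int d = 0; d < dim; d++) {
        result[c][d] += leafSizes[i] * leafCenters[i][d];
      }
    }
    // Normalize by the cluster weight; keep the old mean if the cluster is empty.
    for (int c = 0; c < k; c++) {
      if (weights[c] == 0) {
        result[c] = means[c].clone();
        continue;
      }
      for (int d = 0; d < dim; d++) {
        result[c][d] /= weights[c];
      }
    }
    return result;
  }
}
```

Because each leaf contributes its center multiplied by its size, the result equals the mean of the underlying points without touching them individually.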
-
assignToNearestCluster
private int assignToNearestCluster(int[] assignment, double[][] means, java.util.ArrayList<? extends ClusterFeature> cfs, int[] weights)
Assign each element to nearest cluster.
- Parameters:
assignment - Current cluster assignment
means - k-means cluster means
cfs - Cluster features
weights - Cluster weights (output)
- Returns:
- Number of reassigned elements
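The assignment step can be pictured with the following minimal sketch, which is not the ELKI implementation. It assumes squared Euclidean distance, counts reassigned leaves (rather than reassigned points), and uses the hypothetical names `CfAssignSketch`, `leafCenters`, and `leafSizes` in place of the ClusterFeature list.

```java
/** Hypothetical, ELKI-independent sketch of assigning CF leaves to the nearest mean. */
final class CfAssignSketch {
  /** Squared Euclidean distance between two arrays of equal length. */
  static double sqDist(double[] x, double[] y) {
    double sum = 0;
    for (int d = 0; d < x.length; d++) {
      final double diff = x[d] - y[d];
      sum += diff * diff;
    }
    return sum;
  }

  /**
   * Assign each leaf to its nearest mean.
   *
   * @param assignment current assignment, updated in place
   * @param means cluster means
   * @param leafCenters center of each leaf
   * @param leafSizes number of points summarized by each leaf
   * @param weights output: total number of points per cluster
   * @return number of leaves whose assignment changed
   */
  static int assignToNearest(int[] assignment, double[][] means, double[][] leafCenters, int[] leafSizes, int[] weights) {
    java.util.Arrays.fill(weights, 0);
    int changed = 0;
    for (int i = 0; i < leafCenters.length; i++) {
      int best = 0;
      double bestDist = Double.POSITIVE_INFINITY;
      for (int c = 0; c < means.length; c++) {
        final double dist = sqDist(leafCenters[i], means[c]);
        if (dist < bestDist) { // ties go to the lowest cluster index
          bestDist = dist;
          best = c;
        }
      }
      if (assignment[i] != best) {
        assignment[i] = best;
        changed++;
      }
      weights[best] += leafSizes[i];
    }
    return changed;
  }
}
```

A Lloyd-style loop would typically alternate this step with a mean update until the returned count is zero or the iteration limit is reached.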
-
calculateVariances
protected double[] calculateVariances(int[] assignment, double[][] means, java.util.ArrayList<? extends ClusterFeature> cfs, int[] weights)
Calculate variance of clusters based on clustering features.
The result is only correct after updating the means!
- Parameters:
assignment - Cluster assignment of CFs
means - Cluster means
cfs - CF leaves
weights - Cluster weights
- Returns:
- Per-cluster variances
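One way to obtain a cluster's spread from leaf summaries alone is the decomposition sketched below: each leaf contributes its internal sum of squared deviations plus its size times the squared distance from its center to the cluster mean. This is an illustration under assumptions, not the code of this class: `CfVarianceSketch` and the `leafSsd` array (per-leaf sum of squared deviations from the leaf center) are hypothetical, and the sketch returns unnormalized sums of squares. Whether the actual method divides by the cluster weight is not stated on this page.

```java
/** Hypothetical, ELKI-independent sketch of per-cluster sums of squares from CF leaves. */
final class CfVarianceSketch {
  /**
   * Sum of squared deviations of each cluster, computed from leaf summaries.
   *
   * @param assignment cluster index of each leaf
   * @param means up-to-date cluster means
   * @param leafCenters center of each leaf
   * @param leafSizes number of points summarized by each leaf
   * @param leafSsd sum of squared deviations of each leaf's points from its own center
   * @return per-cluster sum of squared deviations from the cluster mean
   */
  static double[] sumOfSquares(int[] assignment, double[][] means, double[][] leafCenters, int[] leafSizes, double[] leafSsd) {
    double[] ss = new double[means.length];
    for (int i = 0; i < leafCenters.length; i++) {
      final int c = assignment[i];
      // Squared distance from the leaf center to the cluster mean.
      double sq = 0;
      for (int d = 0; d < leafCenters[i].length; d++) {
        final double diff = leafCenters[i][d] - means[c][d];
        sq += diff * diff;
      }
      // Within-leaf spread plus the spread of the leaf center around the mean.
      ss[c] += leafSsd[i] + leafSizes[i] * sq;
    }
    return ss;
  }
}
```

For example, a leaf summarizing the points 0 and 2 (center 1, internal sum of squares 2) and a leaf holding the single point 4, both assigned to a cluster with mean 2, yield 2 + 2·1 + 0 + 1·4 = 8, which matches the direct computation 4 + 0 + 4.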
-
distance
private double distance(NumberVector x, double[] y)
Updates statistics and calculates distance between two Objects based on selected criteria.
Note: specializing this rather than calling SquaredEuclideanDistance was much faster, as we can avoid wrapping the array.
- Parameters:
x - Point x
y - Point y
- Returns:
- distance
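The idea of a specialized, counting distance can be sketched as follows. This is a minimal stand-alone example that assumes squared Euclidean distance on raw arrays, as the note above suggests; the class name `CountingSqDistance` is hypothetical, and only the counter name `diststat` is taken from this page.

```java
/** Hypothetical sketch: squared Euclidean distance on raw arrays with a call counter. */
final class CountingSqDistance {
  /** Number of distance computations performed so far. */
  long diststat = 0;

  /** Count the call, then compute the squared Euclidean distance directly on the arrays. */
  double distance(double[] x, double[] y) {
    diststat++;
    double sum = 0;
    for (int d = 0; d < x.length; d++) {
      final double diff = x[d] - y[d];
      sum += diff * diff;
    }
    return sum;
  }
}
```

Working on `double[]` directly avoids allocating a wrapper object per call, which is the kind of overhead the note refers to.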
-
distance
private double distance(double[] x, double[] y)
Updates statistics and calculates distance between two Objects based on selected criteria.
- Parameters:
x - Point x
y - Point y
- Returns:
- distance
-
getLogger
protected Logging getLogger()
Description copied from class: AbstractKMeans
Get the (STATIC) logger for this class.
- Specified by:
getLogger in class AbstractKMeans<NumberVector,KMeansModel>
- Returns:
- the static logger
-
-