Class AbstractKMeansQualityMeasure<O extends NumberVector>
java.lang.Object
    elki.clustering.kmeans.quality.AbstractKMeansQualityMeasure<O>
All Implemented Interfaces:
KMeansQualityMeasure<O>
Direct Known Subclasses:
AkaikeInformationCriterion, AkaikeInformationCriterionXMeans, BayesianInformationCriterion, BayesianInformationCriterionXMeans, BayesianInformationCriterionZhao, WithinClusterVariance
public abstract class AbstractKMeansQualityMeasure<O extends NumberVector> extends java.lang.Object implements KMeansQualityMeasure<O>
Base class for evaluating clusterings by information criteria (such as AIC or BIC). Provides helper functions (e.g., maximum-likelihood calculation) to its subclasses.

References:

The use of information-theoretic criteria for evaluating k-means was popularized by X-means (see BayesianInformationCriterionXMeans):
D. Pelleg, A. Moore
X-means: Extending K-means with Efficient Estimation of the Number of Clusters
Proc. 17th Int. Conf. on Machine Learning (ICML 2000)

A different derivation of the log-likelihood, used by BayesianInformationCriterionZhao, is given in:
Q. Zhao, M. Xu, P. Fränti
Knee Point Detection on Bayesian Information Criterion
20th IEEE International Conference on Tools with Artificial Intelligence

A longer derivation (but with a sign mistake) can be found in:
A. Foglia, B. Hancock
Notes on Bayesian Information Criterion Calculation for X-Means Clustering
https://github.com/bobhancock/goxmeans/blob/master/doc/BIC_notes.pdf

Since:
0.7.0
Author:
Tibor Goldschwendt, Erich Schubert
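The subclasses combine a log-likelihood with a count of free parameters. As a rough illustration only, the following sketch evaluates the textbook definitions of AIC and BIC in their "lower is better" form, plus the X-means style score ln L - (p/2) ln n, which equals -BIC/2 and is maximized. The class `InformationCriteriaSketch` and its methods are hypothetical; ELKI's subclasses may use a different sign convention or scaling.

```java
/**
 * Hypothetical helper illustrating the textbook AIC/BIC formulas.
 * Not part of ELKI; sign conventions in ELKI's subclasses may differ.
 */
final class InformationCriteriaSketch {
  private InformationCriteriaSketch() {}

  /** Akaike information criterion: 2p - 2 ln L (lower is better). */
  static double aic(double logLikelihood, int freeParameters) {
    return 2.0 * freeParameters - 2.0 * logLikelihood;
  }

  /** Bayesian information criterion: p ln n - 2 ln L (lower is better). */
  static double bic(double logLikelihood, int freeParameters, int numPoints) {
    return freeParameters * Math.log(numPoints) - 2.0 * logLikelihood;
  }

  /** X-means style score: ln L - (p / 2) ln n (higher is better), equal to -BIC / 2. */
  static double xmeansScore(double logLikelihood, int freeParameters, int numPoints) {
    return logLikelihood - 0.5 * freeParameters * Math.log(numPoints);
  }

  public static void main(String[] args) {
    double ll = -120.5; // hypothetical log-likelihood of some clustering
    int p = 9;          // hypothetical number of free parameters
    int n = 100;        // hypothetical number of points
    System.out.println("AIC = " + aic(ll, p));
    System.out.println("BIC = " + bic(ll, p, n));
    System.out.println("X-means score = " + xmeansScore(ll, p, n));
  }
}
```

Because the two BIC forms differ only by the factor -1/2, they rank candidate clusterings identically; only the direction of "better" flips.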
Constructor Summary
AbstractKMeansQualityMeasure()
Method Summary
static double logLikelihood(Relation<? extends NumberVector> relation, Clustering<? extends MeanModel> clustering, NumberVectorDistance<?> distance)
    Computes the log-likelihood of an entire clustering.
static int numberOfFreeParameters(Relation<? extends NumberVector> relation, Clustering<? extends MeanModel> clustering)
    Compute the number of free parameters.
static int numPoints(Clustering<? extends MeanModel> clustering)
    Compute the number of points in a given set of clusters (which may be less than the complete data set for X-means!).
static double varianceContributionOfCluster(Cluster<? extends MeanModel> cluster, NumberVectorDistance<?> distance, Relation<? extends NumberVector> relation)
    Variance contribution of a single cluster.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface elki.clustering.kmeans.quality.KMeansQualityMeasure
isBetter, quality
Method Detail
numPoints
public static int numPoints(Clustering<? extends MeanModel> clustering)
Compute the number of points in a given set of clusters (which may be less than the complete data set for X-means!).
Parameters:
clustering - Clustering to analyze
Returns:
Number of points
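The count can be smaller than the relation because, in X-means' local split test, a parent cluster and its candidate children are scored on the parent's points only. A minimal sketch of the counting, assuming clusters are given as plain arrays of point indices rather than ELKI's Cluster and DBID types (`NumPointsSketch` is a hypothetical name):

```java
import java.util.List;

/** Hypothetical sketch: counting the points covered by a set of clusters. */
final class NumPointsSketch {
  private NumPointsSketch() {}

  /** Sums the cluster sizes; each cluster is an array of point indices. */
  static int numPoints(List<int[]> clusters) {
    int n = 0;
    for (int[] members : clusters) {
      n += members.length;
    }
    return n;
  }

  public static void main(String[] args) {
    // Suppose the data set has 10 points (indices 0..9), but only the
    // 4 points of one parent cluster are split into two children.
    List<int[]> childClusters = List.of(new int[] {0, 3}, new int[] {5, 7});
    System.out.println(numPoints(childClusters)); // prints 4
  }
}
```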
varianceContributionOfCluster
public static double varianceContributionOfCluster(Cluster<? extends MeanModel> cluster, NumberVectorDistance<?> distance, Relation<? extends NumberVector> relation)
Variance contribution of a single cluster. If possible, this information is reused from the clustering process (when a KMeansModel is returned).
Parameters:
cluster - Cluster to access
distance - Distance function
relation - Data relation
Returns:
Cluster variance
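Assuming that the "variance contribution" is the sum of squared distances of a cluster's members to its mean (that cluster's share of the k-means objective), a minimal sketch could look like the following. The class name, the `double[]` points, the `ToDoubleBiFunction` distance parameter, and the `OptionalDouble` standing in for a value cached in a KMeansModel are illustrative assumptions, not ELKI's API.

```java
import java.util.OptionalDouble;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch of a per-cluster variance contribution (not ELKI's API). */
final class VarianceContributionSketch {
  private VarianceContributionSketch() {}

  /** Plain (non-squared) Euclidean distance between two vectors of equal length. */
  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  /**
   * Sum of squared distances from each member to the cluster mean.
   * If a value was cached by the clustering step, it is returned unchanged.
   */
  static double varianceContribution(double[][] members, double[] mean,
      ToDoubleBiFunction<double[], double[]> distance, OptionalDouble cached) {
    if (cached.isPresent()) {
      return cached.getAsDouble(); // reuse, analogous to reading it from a KMeansModel
    }
    double sum = 0.0;
    for (double[] x : members) {
      double d = distance.applyAsDouble(mean, x);
      sum += d * d; // the distance function is not squared, so square it here
    }
    return sum;
  }

  public static void main(String[] args) {
    double[][] members = { {0.0, 0.0}, {2.0, 0.0}, {1.0, 3.0} };
    double[] mean = {1.0, 1.0};
    double v = varianceContribution(members, mean,
        VarianceContributionSketch::euclidean, OptionalDouble.empty());
    System.out.println(v); // approximately 8 (2 + 2 + 4), up to rounding
  }
}
```

Squaring inside the loop is only right for a non-squared metric; a squared-Euclidean distance function would already return the squared deviation and must not be squared again.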
logLikelihood
@Reference(authors="A. Foglia, B. Hancock", title="Notes on Bayesian Information Criterion Calculation for X-Means Clustering", booktitle="Online", url="https://github.com/bobhancock/goxmeans/blob/master/doc/BIC_notes.pdf", bibkey="web/FogliaH12")
public static double logLikelihood(Relation<? extends NumberVector> relation, Clustering<? extends MeanModel> clustering, NumberVectorDistance<?> distance)
Computes the log-likelihood of an entire clustering. This version is intended to correct some mistakes in the X-means publication, but experimentally the corrections do not make much of a difference.
Parameters:
relation - Data relation
clustering - Clustering
distance - Distance function
Returns:
Log-likelihood.
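To make the quantity concrete, the following sketch evaluates the log-likelihood of the hard-assignment spherical Gaussian model underlying X-means: a point x in cluster j has density (n_j / n) * (2 pi sigma^2)^(-d/2) * exp(-||x - mu_j||^2 / (2 sigma^2)), with one shared variance sigma^2 = SSE / (n - K). Summing the logarithms gives sum_j n_j ln(n_j / n) - (n d / 2) ln(2 pi sigma^2) - SSE / (2 sigma^2). This is a direct evaluation of that model from cluster sizes and per-cluster squared-error sums, not ELKI's code; it is not guaranteed to match ELKI's logLikelihood or the Foglia-Hancock formula term by term, and `LogLikelihoodSketch` is a hypothetical name.

```java
/** Hypothetical sketch: log-likelihood of a hard clustering under a spherical Gaussian model. */
final class LogLikelihoodSketch {
  private LogLikelihoodSketch() {}

  /**
   * @param sizes cluster sizes n_j
   * @param sse   per-cluster sums of squared distances to the cluster mean
   * @param dim   dimensionality d of the data
   * @return log-likelihood with shared variance sigma^2 = SSE / (n - K)
   */
  static double logLikelihood(int[] sizes, double[] sse, int dim) {
    int k = sizes.length;
    int n = 0;
    double totalSse = 0.0;
    for (int j = 0; j < k; j++) {
      n += sizes[j];
      totalSse += sse[j];
    }
    if (n <= k || totalSse <= 0.0) {
      throw new IllegalArgumentException("need n > K and a positive total squared error");
    }
    double sigma2 = totalSse / (n - k); // pooled variance estimate, as in X-means
    double ll = 0.0;
    for (int j = 0; j < k; j++) {
      if (sizes[j] > 0) {
        ll += sizes[j] * Math.log((double) sizes[j] / n); // mixing weights n_j / n
      }
    }
    ll -= 0.5 * n * dim * Math.log(2.0 * Math.PI * sigma2); // Gaussian normalization
    ll -= totalSse / (2.0 * sigma2); // exponent terms; equals (n - K) / 2
    return ll;
  }

  public static void main(String[] args) {
    int[] sizes = {2, 2};
    double[] sse = {1.0, 1.0};
    System.out.println(logLikelihood(sizes, sse, 1));
  }
}
```

Because sigma^2 = SSE / (n - K), the last term always equals (n - K) / 2, which is why X-means-style formulas can write it as a constant.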
numberOfFreeParameters
public static int numberOfFreeParameters(Relation<? extends NumberVector> relation, Clustering<? extends MeanModel> clustering)
Compute the number of free parameters.
Parameters:
relation - Data relation (for dimensionality)
clustering - Set of clusters
Returns:
Number of free parameters
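The X-means paper counts the free parameters of a model with K clusters in d dimensions as K - 1 mixing probabilities, K * d centroid coordinates, and one shared variance. A minimal sketch of that convention follows; ELKI's numberOfFreeParameters may count the variance terms differently (for example, one per cluster), so treat this as an illustration of the X-means count only. `FreeParametersSketch` is a hypothetical name.

```java
/** Hypothetical sketch of the X-means free-parameter count. */
final class FreeParametersSketch {
  private FreeParametersSketch() {}

  /** (K - 1) mixing weights + K * d centroid coordinates + 1 shared variance. */
  static int numberOfFreeParameters(int numClusters, int dim) {
    return (numClusters - 1) + numClusters * dim + 1;
  }

  public static void main(String[] args) {
    System.out.println(numberOfFreeParameters(3, 2)); // prints 9
  }
}
```

The sum simplifies to K * (d + 1), so each additional cluster costs d + 1 parameters in the penalty term.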