Class BayesianInformationCriterionXMeans

  • All Implemented Interfaces:
    KMeansQualityMeasure<NumberVector>

    @Title("Bayesian Information Criterion (X-means Version)")
    @Reference(authors="D. Pelleg, A. Moore",
               title="X-means: Extending K-means with Efficient Estimation on the Number of Clusters",
               booktitle="Proc. 17th Int. Conf. on Machine Learning (ICML 2000)",
               url="http://www.pelleg.org/shared/hp/download/xmeans.ps",
               bibkey="DBLP:conf/icml/PellegM00")
    public class BayesianInformationCriterionXMeans
    extends AbstractKMeansQualityMeasure<NumberVector>
    Bayesian Information Criterion (BIC), also known as the Schwarz criterion (SBC, SBIC), for evaluating k-means results.

    This version aims to stay close to the formulation used in X-means, although that formulation has been criticized as containing errors.

    Reference:

    D. Pelleg, A. Moore:
    X-means: Extending K-means with Efficient Estimation on the Number of Clusters
    Proc. 17th Int. Conf. on Machine Learning (ICML 2000)

    Since:
    0.7.0
    Author:
    Tibor Goldschwendt, Erich Schubert
    • Constructor Detail

      • BayesianInformationCriterionXMeans

        public BayesianInformationCriterionXMeans()
    • Method Detail

      • quality

        public <V extends NumberVector> double quality(Clustering<? extends MeanModel> clustering,
                                                       NumberVectorDistance<? super V> distance,
                                                       Relation<V> relation)
        Description copied from interface: KMeansQualityMeasure
        Calculates and returns the quality measure.
        Type Parameters:
        V - Actual vector type (may be a subtype of O, the object type of KMeansQualityMeasure; here NumberVector)
        Parameters:
        clustering - Clustering to analyze
        distance - Distance function to use (usually Euclidean or squared Euclidean!)
        relation - Relation for accessing objects
        Returns:
        quality measure
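The quality measure combines a log likelihood with a complexity penalty. The following is a minimal sketch of that combination as described in the X-means paper, not ELKI's actual code: the class name XMeansBicSketch and both method names are hypothetical, and the parameter count (k - 1 cluster priors, k * dim mean coordinates, one shared variance) is taken from the paper rather than from this page.

```java
// Illustrative sketch only; names are hypothetical and this is not ELKI's implementation.
final class XMeansBicSketch {
  /**
   * Number of free parameters, assuming the X-means paper's count:
   * (k - 1) cluster priors + k * dim mean coordinates + 1 shared variance.
   */
  static int freeParameters(int k, int dim) {
    return (k - 1) + k * dim + 1;
  }

  /**
   * BIC score = logLikelihood - (p / 2) * ln(n).
   * In this form a larger value indicates a better model.
   */
  static double bic(double logLikelihood, int k, int dim, int n) {
    return logLikelihood - 0.5 * freeParameters(k, dim) * Math.log(n);
  }
}
```

With the log likelihood held fixed, adding clusters only increases the penalty, so a larger k must improve the likelihood enough to pay for its extra parameters.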
      • logLikelihoodXMeans

        public static double logLikelihoodXMeans(Relation<? extends NumberVector> relation,
                                                 Clustering<? extends MeanModel> clustering,
                                                 NumberVectorDistance<?> distance)
        Computes log likelihood of an entire clustering.

        Version as used in the X-means publication.

        Parameters:
        relation - Data relation
        clustering - Clustering
        distance - Distance function
        Returns:
        Log likelihood of the clustering.
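For orientation, the log likelihood from the X-means publication can be sketched on plain arrays. This is an assumption-laden illustration, not the ELKI method: the class XMeansLogLikelihoodSketch, its parameters, and the use of squared Euclidean distance are all hypothetical choices, and the per-cluster term follows the paper's formula with a single pooled variance estimate sigma^2 = SSQ / (n - k).

```java
// Illustrative sketch only; names are hypothetical and this is not ELKI's implementation.
final class XMeansLogLikelihoodSketch {
  /**
   * @param points     n points of dimension dim
   * @param assignment cluster index (0..k-1) for each point
   * @param means      k cluster means of dimension dim
   * @return log likelihood of the clustering under the spherical Gaussian model
   */
  static double logLikelihood(double[][] points, int[] assignment, double[][] means) {
    final int n = points.length, k = means.length, dim = means[0].length;
    // Guard chosen for this sketch: the variance estimate needs n > k.
    if (n <= k) {
      return Double.NEGATIVE_INFINITY;
    }
    int[] sizes = new int[k];
    double ssq = 0.; // sum of squared Euclidean distances to the assigned mean
    for (int i = 0; i < n; i++) {
      final int c = assignment[i];
      sizes[c]++;
      for (int j = 0; j < dim; j++) {
        final double diff = points[i][j] - means[c][j];
        ssq += diff * diff;
      }
    }
    // Pooled variance estimate shared by all clusters.
    final double logVar = Math.log(ssq / (n - k));
    double ll = 0.;
    for (int c = 0; c < k; c++) {
      final int rn = sizes[c];
      if (rn == 0) {
        continue; // empty clusters contribute nothing
      }
      ll += rn * Math.log(rn) - rn * Math.log(n) // Rn log Rn - Rn log R
          - 0.5 * rn * Math.log(2 * Math.PI) // Rn/2 log(2 pi)
          - 0.5 * rn * dim * logVar // Rn M/2 log(sigma^2)
          - 0.5 * (rn - k); // (Rn - K)/2
    }
    return ll;
  }
}
```

The sketch is degenerate when every point coincides with its mean (SSQ of zero), a case a real implementation would need to handle explicitly.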
      • isBetter

        public boolean isBetter(double currentCost,
                                double bestCost)
        Description copied from interface: KMeansQualityMeasure
        Compare two scores.
        Parameters:
        currentCost - New (candidate) cost/score
        bestCost - Existing best cost/score (may be NaN)
        Returns:
        true when the new score is better, or the old score is NaN.
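The comparison contract described above can be sketched as follows. This sketch assumes that a larger score is better, which holds for a BIC written as log likelihood minus penalty; this page does not state the direction, so treat it as an assumption. The class ScoreComparisonSketch and the helper indexOfBest are hypothetical.

```java
// Illustrative sketch only; names are hypothetical and this is not ELKI's implementation.
final class ScoreComparisonSketch {
  /** True when the new score is better (assumed: larger), or the old score is NaN. */
  static boolean isBetter(double currentScore, double bestScore) {
    return Double.isNaN(bestScore) || currentScore > bestScore;
  }

  /** Index of the best score, starting from NaN as "no best score yet". */
  static int indexOfBest(double[] scores) {
    double best = Double.NaN;
    int bestIndex = -1;
    for (int i = 0; i < scores.length; i++) {
      if (isBetter(scores[i], best)) {
        best = scores[i];
        bestIndex = i;
      }
    }
    return bestIndex;
  }
}
```

Starting the search with NaN means the first candidate is always accepted, which is why the contract treats a NaN best score as "anything is better".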