Class BayesianInformationCriterion

  • All Implemented Interfaces:
    KMeansQualityMeasure<NumberVector>

    @Reference(authors="G. Schwarz",
               title="Estimating the dimension of a model",
               booktitle="The annals of statistics 6.2",
               url="https://doi.org/10.1214/aos/1176344136",
               bibkey="doi:10.1214/aos/1176344136")
    public class BayesianInformationCriterion
    extends AbstractKMeansQualityMeasure<NumberVector>
    Bayesian Information Criterion (BIC), also known as the Schwarz criterion (SBC, SBIC), for evaluating k-means results.

    Reference:

    G. Schwarz
    Estimating the dimension of a model
    The annals of statistics 6.2.

    The use for k-means was popularized by:

    D. Pelleg, A. Moore:
    X-means: Extending K-means with Efficient Estimation of the Number of Clusters
    Proc. 17th Int. Conf. on Machine Learning (ICML 2000)
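
    As a rough illustration of the criterion itself, the sketch below computes a BIC score from a log-likelihood, a free-parameter count, and a sample size. It uses the "larger is better" form given by Pelleg and Moore, log-likelihood minus (p/2) ln n, next to the common textbook form, -2 ln L + p ln n. The class BicSketch and its method names are hypothetical and not part of this library; whether this class uses exactly this form is an assumption, not something stated on this page.

```java
/** Hypothetical helper for illustration only; not part of the library. */
final class BicSketch {
  private BicSketch() {
    // Static utility class, no instances.
  }

  /**
   * BIC in the "larger is better" form: logLikelihood - (p / 2) * ln(n).
   *
   * @param logLikelihood log-likelihood of the data under the model
   * @param numFreeParameters number of free parameters p of the model
   * @param numPoints number of data points n (must be positive)
   */
  static double bicScore(double logLikelihood, int numFreeParameters, int numPoints) {
    if (numPoints <= 0) {
      throw new IllegalArgumentException("numPoints must be positive");
    }
    return logLikelihood - 0.5 * numFreeParameters * Math.log(numPoints);
  }

  /**
   * Common textbook form, "smaller is better": -2 * ln(L) + p * ln(n).
   * This equals -2 times the value returned by bicScore.
   */
  static double bicTextbook(double logLikelihood, int numFreeParameters, int numPoints) {
    return -2.0 * bicScore(logLikelihood, numFreeParameters, numPoints);
  }
}
```

    The two forms rank models identically; they differ only in sign and scale, which matters when deciding whether a larger or a smaller value counts as the better score.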

    Since:
    0.7.0
    Author:
    Tibor Goldschwendt, Erich Schubert
    • Constructor Detail

      • BayesianInformationCriterion

        public BayesianInformationCriterion()
    • Method Detail

      • quality

        public <V extends NumberVector> double quality(Clustering<? extends MeanModel> clustering,
                                                       NumberVectorDistance<? super V> distance,
                                                       Relation<V> relation)
        Description copied from interface: KMeansQualityMeasure
        Calculates and returns the quality measure.
        Type Parameters:
        V - Actual vector type (could be a subtype of O!)
        Parameters:
        clustering - Clustering to analyze
        distance - Distance function to use (usually Euclidean or squared Euclidean!)
        relation - Relation for accessing objects
        Returns:
        quality measure
      • isBetter

        public boolean isBetter(double currentCost,
                                double bestCost)
        Description copied from interface: KMeansQualityMeasure
        Compare two scores.
        Parameters:
        currentCost - New (candidate) cost/score
        bestCost - Existing best cost/score (may be NaN)
        Returns:
        true when the new score is better, or the old score is NaN.
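
        The contract above (accept the candidate when it is better, or when no valid best score exists yet) can be sketched as follows. The comparison direction is an assumption: the sketch treats a larger score as better, which matches the Pelleg and Moore form of BIC but is not stated on this page. IsBetterSketch and bestIndex are hypothetical names used only for illustration.

```java
/** Hypothetical illustration of the isBetter contract; not the library's code. */
final class IsBetterSketch {
  private IsBetterSketch() {
    // Static utility class, no instances.
  }

  /**
   * Returns true when the candidate score beats the best score so far,
   * or when the best score is NaN (nothing has been accepted yet).
   * Assumption: a larger score is better.
   */
  static boolean isBetter(double currentCost, double bestCost) {
    return currentCost > bestCost || Double.isNaN(bestCost);
  }

  /**
   * Picks the index of the best score, starting from NaN so that the
   * first candidate is always accepted. Returns -1 for an empty array.
   */
  static int bestIndex(double[] scores) {
    double best = Double.NaN;
    int bestIdx = -1;
    for (int i = 0; i < scores.length; i++) {
      if (isBetter(scores[i], best)) {
        best = scores[i];
        bestIdx = i;
      }
    }
    return bestIdx;
  }
}
```

        Initializing the best score to NaN avoids a special case for the first candidate, which is presumably why the parameter is documented as "may be NaN".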