Class AbstractKMeansQualityMeasure<O extends NumberVector>

    • Constructor Detail

      • AbstractKMeansQualityMeasure

        public AbstractKMeansQualityMeasure()
    • Method Detail

      • numPoints

        public static int numPoints(Clustering<? extends MeanModel> clustering)
        Computes the number of points in the given set of clusters. This may be fewer than the number of points in the complete data set, e.g., in X-means.
        Parameters:
        clustering - Clustering to analyze
        Returns:
        Number of points
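        As a rough illustration, the count is the sum of the cluster sizes. The sketch below is not the ELKI API: the class NumPointsSketch is hypothetical and represents a clustering as an array of clusters, each an array of d-dimensional points (double[][][]), instead of a Clustering object.

```java
// Hypothetical sketch, not the ELKI API: a clustering is modelled as an array
// of clusters, each cluster being an array of d-dimensional points.
final class NumPointsSketch {
    // Sum the cluster sizes. If the clusters cover only part of the data
    // (e.g. in X-means), the result is smaller than the full data set size.
    static int numPoints(double[][][] clusters) {
        int n = 0;
        for (double[][] cluster : clusters) {
            n += cluster.length;
        }
        return n;
    }

    public static void main(String[] args) {
        double[][][] clusters = {
            {{0, 0}, {1, 0}}, // cluster with two points
            {{5, 5}}          // cluster with one point
        };
        System.out.println(numPoints(clusters)); // prints 3
    }
}
```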
      • varianceContributionOfCluster

        public static double varianceContributionOfCluster(Cluster<? extends MeanModel> cluster,
                                                           NumberVectorDistance<?> distance,
                                                           Relation<? extends NumberVector> relation)
        Computes the variance contribution of a single cluster.

        If possible, this value is reused from the clustering process (when the cluster model is a KMeansModel) instead of being recomputed.

        Parameters:
        cluster - Cluster to access
        distance - Distance function
        relation - Data relation
        Returns:
        Cluster variance
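        A minimal sketch of the computation, assuming that the variance contribution is the sum of squared Euclidean distances from the cluster's points to its mean (the cluster's share of the k-means objective); the library's exact definition may differ. The class VarianceContributionSketch is hypothetical, and its OptionalDouble argument merely stands in for a value stored in a KMeansModel.

```java
import java.util.OptionalDouble;

// Hypothetical sketch, not the ELKI implementation. Assumption: the variance
// contribution is the sum of squared Euclidean distances to the cluster mean.
final class VarianceContributionSketch {
    // Coordinate-wise mean of a non-empty set of points.
    static double[] mean(double[][] points) {
        double[] mu = new double[points[0].length];
        for (double[] p : points) {
            for (int j = 0; j < mu.length; j++) {
                mu[j] += p[j];
            }
        }
        for (int j = 0; j < mu.length; j++) {
            mu[j] /= points.length;
        }
        return mu;
    }

    // Returns the stored value if the clustering run recorded one (playing the
    // role of a KMeansModel); otherwise recomputes it from the data.
    static double varianceContribution(double[][] points, double[] mean, OptionalDouble stored) {
        if (stored.isPresent()) {
            return stored.getAsDouble();
        }
        double sum = 0.0;
        for (double[] p : points) {
            for (int j = 0; j < mean.length; j++) {
                double diff = p[j] - mean[j];
                sum += diff * diff;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] cluster = {{0, 0}, {2, 0}, {0, 2}, {2, 2}};
        double[] mu = mean(cluster); // (1, 1)
        System.out.println(varianceContribution(cluster, mu, OptionalDouble.empty())); // prints 8.0
    }
}
```

Reusing a stored value avoids a second pass over the data, which matters when the measure is evaluated many times, as in X-means.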
      • logLikelihood

        @Reference(authors="A. Foglia, B. Hancock",
                   title="Notes on Bayesian Information Criterion Calculation for X-Means Clustering",
                   booktitle="Online",
                   url="https://github.com/bobhancock/goxmeans/blob/master/doc/BIC_notes.pdf",
                   bibkey="web/FogliaH12")
        public static double logLikelihood(Relation<? extends NumberVector> relation,
                                           Clustering<? extends MeanModel> clustering,
                                           NumberVectorDistance<?> distance)
        Computes the log-likelihood of an entire clustering.

        This version is intended to correct some mistakes in the X-means publication; experimentally, however, the corrections make little difference.

        Parameters:
        relation - Data relation
        clustering - Clustering
        distance - Distance function
        Returns:
        Log-likelihood.
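        One standard way to define such a log-likelihood is sketched here under explicit assumptions: every cluster n is a spherical Gaussian with mixing weight R_n / R and its own maximum-likelihood variance sigma_n^2 = SSE_n / (R_n * M), where R_n is the cluster size, R the total number of points, M the dimensionality and SSE_n the sum of squared distances to the cluster mean; points are hard-assigned. This is neither the exact X-means formula nor the Foglia and Hancock correction, and LogLikelihoodSketch is a hypothetical name. Substituting the ML variance makes the exponent term collapse to R_n * M / 2 per cluster.

```java
// Hypothetical sketch, not ELKI's formula. Assumptions: each cluster n is a
// spherical Gaussian with mixing weight R_n / R and maximum-likelihood variance
// sigma_n^2 = SSE_n / (R_n * M); every point is hard-assigned to one cluster.
final class LogLikelihoodSketch {
    // Sum of squared Euclidean distances from the points to their mean (SSE_n).
    static double sse(double[][] points) {
        int m = points[0].length;
        double[] mu = new double[m];
        for (double[] p : points) {
            for (int j = 0; j < m; j++) {
                mu[j] += p[j];
            }
        }
        for (int j = 0; j < m; j++) {
            mu[j] /= points.length;
        }
        double sum = 0.0;
        for (double[] p : points) {
            for (int j = 0; j < m; j++) {
                double diff = p[j] - mu[j];
                sum += diff * diff;
            }
        }
        return sum;
    }

    static double logLikelihood(double[][][] clusters) {
        int total = 0; // R
        for (double[][] c : clusters) {
            total += c.length;
        }
        double logL = 0.0;
        for (double[][] c : clusters) {
            int rn = c.length;   // R_n
            int m = c[0].length; // M
            double sigma2 = sse(c) / ((double) rn * m);
            if (!(sigma2 > 0.0)) {
                // All points identical: the likelihood is unbounded.
                throw new IllegalArgumentException("cluster with zero variance");
            }
            logL += rn * Math.log((double) rn / total);              // mixing weights
            logL -= 0.5 * rn * m * Math.log(2.0 * Math.PI * sigma2); // normalisation
            logL -= 0.5 * rn * m; // sum of ||x - mu||^2 / (2 sigma^2) at the ML variance
        }
        return logL;
    }

    public static void main(String[] args) {
        double[][][] clusters = {{{0, 0}, {2, 0}, {0, 2}, {2, 2}}};
        System.out.println(logLikelihood(clusters)); // equals -4 * log(2 * pi) - 4
    }
}
```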
      • numberOfFreeParameters

        public static int numberOfFreeParameters(Relation<? extends NumberVector> relation,
                                                 Clustering<? extends MeanModel> clustering)
        Computes the number of free parameters of the clustering model, as required by information criteria such as BIC.
        Parameters:
        relation - Data relation (for dimensionality)
        clustering - Set of clusters
        Returns:
        Number of free parameters
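        For illustration, the X-means paper (Pelleg and Moore) counts K - 1 mixing weights, K * M centroid coordinates and one shared variance, and plugs the count p into BIC = logL - p / 2 * log(R), with R the number of points. The sketch below assumes exactly that parameterisation; the library's own count may differ, and BicSketch with its int-based signatures is hypothetical (the documented method takes a Relation and a Clustering).

```java
// Hypothetical sketch following the X-means paper (Pelleg and Moore), not ELKI:
// parameters are K - 1 mixing weights, K * M centroid coordinates and one
// shared variance; BIC = logL - p / 2 * log(R) is a score to maximise.
final class BicSketch {
    static int numberOfFreeParameters(int k, int dim) {
        return (k - 1) + k * dim + 1;
    }

    static double bic(double logLikelihood, int k, int dim, int numPoints) {
        int p = numberOfFreeParameters(k, dim);
        return logLikelihood - 0.5 * p * Math.log(numPoints);
    }

    public static void main(String[] args) {
        System.out.println(numberOfFreeParameters(3, 2)); // prints 9
    }
}
```

Under this convention a higher score is better, so X-means keeps a split only if the children's BIC exceeds the parent's.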