Class BetulaGMMWeighted

  • All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<EMModel>>

    @Priority(-100)
    @Reference(authors="Andreas Lang and Erich Schubert",
               title="BETULA: Fast Clustering of Large Data with Improved BIRCH CF-Trees",
               booktitle="Information Systems",
               url="https://doi.org/10.1016/j.is.2021.101918",
               bibkey="DBLP:journals/is/LangS22")
    public class BetulaGMMWeighted
    extends BetulaGMM
    Clustering by expectation maximization (EM algorithm), also known as Gaussian Mixture Modeling (GMM), with optional MAP regularization. This variant uses a more complex weighting based on the overlap of Gaussians. It is more expensive than BetulaGMM and, in experiments, did not produce much better results.

    Reference:

    Andreas Lang and Erich Schubert
    BETULA: Fast Clustering of Large Data with Improved BIRCH CF-Trees
    Information Systems

    Since:
    0.8.0
    Author:
    Andreas Lang
    • Constructor Detail

      • BetulaGMMWeighted

        public BetulaGMMWeighted(CFTree.Factory<?> cffactory,
                                 double delta,
                                 int k,
                                 int maxiter,
                                 boolean soft,
                                 BetulaClusterModelFactory<?> initialization,
                                 double prior)
        Constructor.
        Parameters:
        cffactory - CFTree factory
        delta - Delta parameter (convergence threshold)
        k - Number of clusters
        maxiter - Maximum number of iterations
        soft - Include soft assignments
        initialization - Initialization method
        prior - MAP prior
    • Method Detail

      • assignProbabilitiesToInstances

        public double assignProbabilitiesToInstances(java.util.ArrayList<? extends ClusterFeature> cfs,
                                                     java.util.List<? extends BetulaClusterModel> models,
                                                     java.util.Map<ClusterFeature,double[]> probClusterIGivenX)
        Description copied from class: BetulaGMM
        Assigns the current probability values to the instances in the database and computes the expectation value of the current mixture of distributions.

        Computed as the sum of the logarithms of the prior probability of each instance.

        Overrides:
        assignProbabilitiesToInstances in class BetulaGMM
        Parameters:
        cfs - the cluster features to evaluate
        models - Cluster models
        probClusterIGivenX - Output storage for cluster probabilities
        Returns:
        the expectation value of the current mixture of distributions
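
The expectation step described above can be illustrated with a minimal, self-contained sketch. It is not the ELKI implementation: the class EStepSketch, the method eStep, and the array-based layout are hypothetical, the data is one-dimensional, and each cluster feature is reduced to its mean and its point count. The overlap-based weighting of this class and the MAP prior are left out. The sketch only shows how per-cluster probabilities and a summed log-likelihood can be computed together.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of ELKI.
class EStepSketch {
  /** Log density of a one-dimensional Gaussian at x. */
  static double logNormal(double x, double mean, double var) {
    double d = x - mean;
    return -0.5 * (Math.log(2 * Math.PI * var) + d * d / var);
  }

  /**
   * Fills resp[j][i] with P(cluster i | summary j) and returns the
   * log-likelihood, with each summary counted cfWeight[j] times.
   * Assumption: a summary is evaluated at its mean only.
   */
  static double eStep(double[] cfMean, double[] cfWeight, double[] w,
      double[] mean, double[] var, double[][] resp) {
    double loglik = 0;
    for (int j = 0; j < cfMean.length; j++) {
      double[] logp = new double[w.length];
      double max = Double.NEGATIVE_INFINITY;
      for (int i = 0; i < w.length; i++) {
        // log of (cluster weight * density)
        logp[i] = Math.log(w[i]) + logNormal(cfMean[j], mean[i], var[i]);
        max = Math.max(max, logp[i]);
      }
      // log-sum-exp for numerical stability
      double sum = 0;
      for (int i = 0; i < w.length; i++) {
        sum += Math.exp(logp[i] - max);
      }
      double logsum = max + Math.log(sum);
      for (int i = 0; i < w.length; i++) {
        resp[j][i] = Math.exp(logp[i] - logsum);
      }
      loglik += cfWeight[j] * logsum;
    }
    return loglik;
  }

  public static void main(String[] args) {
    double[] cfMean = { 0.0, 0.2, 5.0 };   // summary means
    double[] cfWeight = { 10, 5, 8 };      // points per summary
    double[] w = { 0.5, 0.5 };             // mixture weights
    double[] mean = { 0.0, 5.0 };          // component means
    double[] var = { 1.0, 1.0 };           // component variances
    double[][] resp = new double[cfMean.length][w.length];
    double loglik = eStep(cfMean, cfWeight, w, mean, var, resp);
    System.out.println("log-likelihood: " + loglik);
    for (double[] row : resp) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```

In this sketch the returned value plays the role of the expectation value that the EM loop would compare between iterations, and resp corresponds loosely to the probClusterIGivenX output storage.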