Class KDTreeEM

  • All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<EMModel>>

    @Description("Gaussian mixture modeling accelerated using a kd-tree")
    @Reference(authors="Andrew W. Moore",
               booktitle="Advances in Neural Information Processing Systems 11 (NIPS 1998)",
               title="Very Fast EM-based Mixture Model Clustering using Multiresolution kd-trees",
    public class KDTreeEM
    extends java.lang.Object
    implements ClusteringAlgorithm<Clustering<EMModel>>
    Clustering by expectation maximization (EM-Algorithm), also known as Gaussian Mixture Modeling (GMM), calculated on a kd-tree. If supported, tries to prune during calculation.


    A. W. Moore:
    Very Fast EM-based Mixture Model Clustering using Multiresolution kd-trees.
    Neural Information Processing Systems (NIPS 1998)

    Robert Gehde
    • Field Detail

      • LOG

        private static final Logging LOG
        Logging object
      • soft

        private boolean soft
        Retain soft assignments.
      • delta

        private double delta
        Delta parameter
      • k

        private int k
        number of models
      • mbw

        private double mbw
        minimum leaf size
      • tau

        private double tau
        tau, low for precise, high for fast results.
      • tauClass

        private double tauClass
        Drop one class if the maximum weight of a class in the bounding box is lower than tauClass * wmin_max, where wmin_max is the maximum minimum weight of all classes
      • miniter

        private int miniter
        minimum amount of iterations
      • maxiter

        private int maxiter
        maximum amount of iterations
      • ipiPow

        private double ipiPow
        Gaussian scaling factor for likelihood.
      • wsum

        private double[] wsum
        Cluster weights
      • exactAssign

        protected boolean exactAssign
        Perform exact cluster assignments
    • Constructor Detail

      • KDTreeEM

        public KDTreeEM​(int k,
                        double mbw,
                        double tau,
                        double tauclass,
                        double delta,
                        TextbookMultivariateGaussianModelFactory mfactory,
                        int miniter,
                        int maxiter,
                        boolean soft,
                        boolean exactAssign)
        k - number of classes
        mbw - minimum relative size of leaf nodes
        tau - pruning parameter
        tauclass - pruning parameter for single classes
        delta - delta parameter
        mfactory - EM cluster model factory
        miniter - Minimum number of iterations
        maxiter - Maximum number of iterations
        soft - Include soft assignments
        exactAssign - Perform exact assignments at the end
    • Method Detail

      • run

        public Clustering<EMModel> run​(Relation<? extends NumberVector> relation)
        Calculates the EM Clustering with the given values by calling makeStats and calculation the new models from the given results
        relation - Data Relation
        Clustering KDTreeEM Clustering
      • analyseDimWidth

        private double[] analyseDimWidth​(Relation<? extends NumberVector> relation)
        Helper method to retrieve the widths of all data in all dimensions.
        relation - Relation to analyze
        width of each dimension
      • checkStoppingCondition

        private int[] checkStoppingCondition​(KDTreeEM.KDTree node,
                                             int[] indices)
        This methods checks the different stopping conditions given in the paper, thus calculating the Dimensions, that will be considered for child-trees. If this method returns a non-empty subset of the input dimension set, it means that missing dimensions are dropped because their weight was too small. If it returns a null array it means that the expected error of all remaining models is small enough to consider this node a leaf node.
        node - kd tree node
        indices - list of indices to check
        indices that are not pruned, null if everything was pruned
      • calculateModelLimits

        private void calculateModelLimits​(KDTreeEM.KDTree node,
                                          TextbookMultivariateGaussianModel model,
                                          double[] minpnt,
                                          double[] maxpnt,
                                          double[] ret)
        Calculates the model limits inside this node by translating the Gaussian model into a squared function.
        model - model to calculate the limits for
        minpnt - result array for argmin
        maxpnt - result array for argmax
        ret - Return array
      • makeStats

        private double makeStats​(KDTreeEM.KDTree node,
                                 int[] indices,
                                 WritableDataStore<double[]> probs)
        Calculates the statistics on the kd-tree needed for the calculation of the new models
        node - next node
        indices - list of indices to use in calculation, initially all
        probs - cluster assignment
        log likelihood of the model
      • getInputTypeRestriction

        public TypeInformation[] getInputTypeRestriction()
        Description copied from interface: Algorithm
        Get the input type restriction used for negotiating the data query.
        Specified by:
        getInputTypeRestriction in interface Algorithm
        Type restriction