Class BetulaLloydKMeans

    • Field Detail

      • LOG

        private static final Logging LOG
        Class logger.
      • storeIds

        boolean storeIds
        Store IDs to avoid reassignment cost
      • ignoreWeight

        boolean ignoreWeight
        Ignore weight
      • diststat

        long diststat
        Number of distance calculations
    • Constructor Detail

      • BetulaLloydKMeans

        public BetulaLloydKMeans(int k,
                                 int maxiter,
                                 CFTree.Factory<?> cffactory,
                                 AbstractCFKMeansInitialization initialization,
                                 boolean storeIds,
                                 boolean ignoreWeight)
        Constructor.
        Parameters:
        k - Number of clusters
        maxiter - Maximum number of iterations
        cffactory - CFTree factory
        initialization - Initialization method for k-means
        storeIds - Store IDs to avoid reassignment cost
        ignoreWeight - Ignore the leaf weights
    • Method Detail

      • kmeans

        private double[][] kmeans(java.util.ArrayList<? extends ClusterFeature> cfs,
                                  int[] assignment,
                                  int[] weights,
                                  CFTree<?> tree)
        Perform k-means clustering.
        Parameters:
        cfs - Cluster features
        assignment - Cluster assignment of each CF
        weights - Cluster weight output
        tree - CF tree
        Returns:
        Cluster means
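As an illustration of what `kmeans` does with the leaf CFs, here is a minimal sketch of Lloyd's algorithm run over cluster-feature summaries instead of raw points. It assumes each CF can be reduced to an integer weight and a centroid (passed as parallel arrays); the class `CfLloydSketch` and its parameter layout are hypothetical, and this is not ELKI's implementation. It stops when no CF changes cluster or after `maxiter` rounds, and an empty cluster keeps its previous mean, which is an assumption of the sketch.

```java
import java.util.Arrays;

/** Hypothetical sketch: Lloyd's k-means over CF summaries (weight + centroid). */
final class CfLloydSketch {
  /**
   * Runs at most maxiter rounds of assignment and mean update.
   * centroids[i] and cfWeight[i] describe CF i; assignment receives the final
   * cluster of each CF. Returns the cluster means.
   */
  static double[][] kmeans(double[][] centroids, int[] cfWeight, double[][] init,
                           int[] assignment, int maxiter) {
    final int k = init.length, dim = init[0].length;
    double[][] means = new double[k][];
    for (int c = 0; c < k; c++) means[c] = init[c].clone();
    Arrays.fill(assignment, -1); // nothing assigned yet
    for (int iter = 0; iter < maxiter; iter++) {
      // Assignment step: each CF centroid goes to its nearest mean.
      int changed = 0;
      for (int i = 0; i < centroids.length; i++) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < k; c++) {
          double d = 0;
          for (int j = 0; j < dim; j++) {
            double diff = centroids[i][j] - means[c][j];
            d += diff * diff;
          }
          if (d < bestDist) { bestDist = d; best = c; }
        }
        if (assignment[i] != best) { assignment[i] = best; changed++; }
      }
      if (changed == 0) break; // converged: no CF changed cluster
      // Update step: weighted mean of the CF centroids in each cluster.
      double[][] sums = new double[k][dim];
      long[] w = new long[k];
      for (int i = 0; i < centroids.length; i++) {
        int c = assignment[i];
        w[c] += cfWeight[i];
        for (int j = 0; j < dim; j++) sums[c][j] += cfWeight[i] * centroids[i][j];
      }
      for (int c = 0; c < k; c++) {
        if (w[c] == 0) continue; // empty cluster keeps its previous mean (assumption)
        for (int j = 0; j < dim; j++) means[c][j] = sums[c][j] / w[c];
      }
    }
    return means;
  }

  public static void main(String[] args) {
    int[] assignment = new int[4];
    double[][] means = kmeans(new double[][] {{0}, {1}, {10}, {11}}, new int[] {3, 1, 1, 1},
        new double[][] {{0}, {1}}, assignment, 10);
    System.out.println(Arrays.deepToString(means)); // prints [[0.25], [10.5]]
  }
}
```

Because each CF enters the update with its weight, the heavy CF at 0 pulls the first mean to 0.25 rather than 0.5.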
      • means

        private double[][] means(int[] assignment,
                                 double[][] means,
                                 java.util.ArrayList<? extends ClusterFeature> cfs,
                                 int[] weights)
        Calculate means of clusters.
        Parameters:
        assignment - Cluster assignment
        means - Means of clusters
        cfs - Cluster features
        weights - Cluster weights
        Returns:
        Means of clusters.
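The mean update can be sketched as a weighted average of the CF centroids assigned to each cluster, with `weights` supplying the per-cluster totals produced by the assignment step. The nested `Cf` class is a hypothetical stand-in for `ClusterFeature`. Reading `ignoreWeight` as "every CF counts as weight 1" and keeping the old mean for an empty cluster are both assumptions made for this sketch.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the mean update over leaf CFs. */
final class CfMeansSketch {
  /** Hypothetical stand-in for a leaf cluster feature: weight and centroid. */
  static final class Cf {
    final int weight;
    final double[] centroid;

    Cf(int weight, double... centroid) {
      this.weight = weight;
      this.centroid = centroid;
    }
  }

  /**
   * Weighted mean of the CF centroids assigned to each cluster. weights[c] is
   * the total weight of cluster c from the assignment step. With ignoreWeight,
   * every CF counts as weight 1 (assumed meaning), so weights[c] must then hold
   * CF counts. Empty clusters keep their old mean (assumption).
   */
  static double[][] means(int[] assignment, double[][] oldMeans, List<Cf> cfs,
                          int[] weights, boolean ignoreWeight) {
    final int k = oldMeans.length, dim = oldMeans[0].length;
    double[][] sums = new double[k][dim];
    for (int i = 0; i < cfs.size(); i++) {
      Cf cf = cfs.get(i);
      int w = ignoreWeight ? 1 : cf.weight;
      double[] sum = sums[assignment[i]];
      for (int j = 0; j < dim; j++) sum[j] += w * cf.centroid[j];
    }
    double[][] result = new double[k][];
    for (int c = 0; c < k; c++) {
      if (weights[c] == 0) { result[c] = oldMeans[c].clone(); continue; }
      result[c] = new double[dim];
      for (int j = 0; j < dim; j++) result[c][j] = sums[c][j] / weights[c];
    }
    return result;
  }

  public static void main(String[] args) {
    List<Cf> cfs = List.of(new Cf(3, 0, 0), new Cf(1, 4, 4), new Cf(1, 10, 10));
    double[][] old = {{0, 0}, {5, 5}, {7, 7}};
    double[][] m = means(new int[] {0, 0, 1}, old, cfs, new int[] {4, 1, 0}, false);
    System.out.println(Arrays.deepToString(m)); // prints [[1.0, 1.0], [10.0, 10.0], [7.0, 7.0]]
  }
}
```

Passing the old means in is what lets an empty cluster fall back to its previous position instead of dividing by zero.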
      • assignToNearestCluster

        private int assignToNearestCluster(int[] assignment,
                                           double[][] means,
                                           java.util.ArrayList<? extends ClusterFeature> cfs,
                                           int[] weights)
        Assign each element to nearest cluster.
        Parameters:
        assignment - Current cluster assignment
        means - k-means cluster means
        cfs - Cluster features
        weights - Cluster weights (output)
        Returns:
        Number of reassigned elements
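A minimal sketch of the assignment step, assuming each CF is represented by its centroid and integer weight (the parallel arrays `centroids` and `cfWeights` are hypothetical). It moves each CF to the nearest mean under squared Euclidean distance, fills the per-cluster weight output, and returns how many CFs changed cluster, which a caller can use as a convergence test.

```java
import java.util.Arrays;

/** Hypothetical sketch of the assignment step over CF centroids. */
final class CfAssignSketch {
  /**
   * Assigns each CF centroid to its nearest mean (squared Euclidean).
   * cfWeights[i] is the weight of CF i; weights[] is overwritten with the total
   * weight per cluster. Returns the number of CFs that changed cluster.
   */
  static int assignToNearestCluster(int[] assignment, double[][] means,
                                    double[][] centroids, int[] cfWeights, int[] weights) {
    Arrays.fill(weights, 0);
    int changed = 0;
    for (int i = 0; i < centroids.length; i++) {
      int best = 0;
      double bestDist = Double.POSITIVE_INFINITY;
      for (int c = 0; c < means.length; c++) {
        double d = 0;
        for (int j = 0; j < centroids[i].length; j++) {
          double diff = centroids[i][j] - means[c][j];
          d += diff * diff;
        }
        if (d < bestDist) { bestDist = d; best = c; }
      }
      if (assignment[i] != best) { assignment[i] = best; changed++; }
      weights[best] += cfWeights[i];
    }
    return changed;
  }

  public static void main(String[] args) {
    int[] assignment = {0, 1, 1};
    int[] weights = new int[2];
    int changed = assignToNearestCluster(assignment, new double[][] {{0}, {10}},
        new double[][] {{0}, {1}, {10}}, new int[] {2, 3, 4}, weights);
    // prints 1 [0, 0, 1] [5, 4]
    System.out.println(changed + " " + Arrays.toString(assignment) + " " + Arrays.toString(weights));
  }
}
```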
      • calculateVariances

        protected double[] calculateVariances(int[] assignment,
                                              double[][] means,
                                              java.util.ArrayList<? extends ClusterFeature> cfs,
                                              int[] weights)
        Calculate variance of clusters based on clustering features.

        The result is only correct after the means have been updated, because deviations are measured from the means that are passed in.

        Parameters:
        assignment - Cluster assignment of CFs
        means - Cluster means
        cfs - CF leaves
        weights - Cluster weights
        Returns:
        Per-cluster variances
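Variances can be computed from CFs without revisiting the points. If a CF has weight n_i, centroid mu_i and sum of squared deviations S_i from mu_i, its contribution to the sum of squares around a cluster mean m is S_i + n_i * ||mu_i - m||^2. The sketch below assumes that (weight, centroid, deviation sum) layout and divides by the cluster weight; both the layout and the normalization are assumptions, and `CfVarianceSketch` is a hypothetical name.

```java
/** Hypothetical sketch: per-cluster variance from CF summaries. */
final class CfVarianceSketch {
  /**
   * CF i is described by cfWeight[i] (n_i), centroid[i] (mu_i) and ssd[i], the
   * sum of squared deviations of its points from mu_i (assumed layout).
   * Returns SSE / weight per cluster, or 0 for an empty cluster.
   */
  static double[] calculateVariances(int[] assignment, double[][] means, int[] cfWeight,
                                     double[][] centroid, double[] ssd, int[] weights) {
    double[] sse = new double[means.length];
    for (int i = 0; i < assignment.length; i++) {
      int c = assignment[i];
      double d = 0;
      for (int j = 0; j < means[c].length; j++) {
        double diff = centroid[i][j] - means[c][j];
        d += diff * diff;
      }
      // Shift CF i's deviations from its own centroid to the cluster mean.
      sse[c] += ssd[i] + cfWeight[i] * d;
    }
    for (int c = 0; c < sse.length; c++) {
      sse[c] = weights[c] > 0 ? sse[c] / weights[c] : 0;
    }
    return sse;
  }

  public static void main(String[] args) {
    // CF A holds points {0, 2}: n=2, mu=1, ssd=2. CF B holds {4}: n=1, mu=4, ssd=0.
    double[] v = calculateVariances(new int[] {0, 0}, new double[][] {{2}}, new int[] {2, 1},
        new double[][] {{1}, {4}}, new double[] {2, 0}, new int[] {3});
    System.out.println(v[0]); // equals the direct variance of {0, 2, 4} around mean 2, i.e. 8 / 3
  }
}
```

If `means` were stale, the same formula would still return a sum of squares, but around the wrong center, which overstates the true variance.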
      • distance

        private double distance(NumberVector x,
                                double[] y)
        Increments the distance-computation counter and calculates the squared Euclidean distance between a vector and an array.

        Note: specializing this rather than calling SquaredEuclideanDistance was much faster, as we can avoid wrapping the array.

        Parameters:
        x - Point x
        y - Point y
        Returns:
        distance
      • distance

        private double distance(double[] x,
                                double[] y)
        Increments the distance-computation counter and calculates the squared Euclidean distance between two arrays.
        Parameters:
        x - Point x
        y - Point y
        Returns:
        distance
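The two `distance` overloads can be sketched as a squared Euclidean distance that also bumps a counter, as the `diststat` field suggests. Reading coordinates straight from the vector avoids wrapping the `double[]` in a vector object, which is what the performance note on the first overload describes. `Vec` is a hypothetical stand-in for `NumberVector`, and the sketch assumes both arguments have the same dimensionality.

```java
/** Hypothetical sketch of the two counted distance overloads. */
final class DistanceSketch {
  /** Hypothetical stand-in for NumberVector: coordinate access only. */
  interface Vec {
    double doubleValue(int dim);
  }

  /** Counts distance computations, like the diststat field. */
  long diststat = 0;

  /** Wraps an array as a Vec (for the example only). */
  static Vec of(double... values) {
    return dim -> values[dim];
  }

  /** Squared Euclidean distance; assumes x has y.length dimensions. */
  double distance(Vec x, double[] y) {
    ++diststat;
    double v = 0;
    for (int d = 0; d < y.length; d++) {
      double diff = x.doubleValue(d) - y[d];
      v += diff * diff;
    }
    return v;
  }

  /** Squared Euclidean distance between two arrays of equal length. */
  double distance(double[] x, double[] y) {
    ++diststat;
    double v = 0;
    for (int d = 0; d < x.length; d++) {
      double diff = x[d] - y[d];
      v += diff * diff;
    }
    return v;
  }

  public static void main(String[] args) {
    DistanceSketch s = new DistanceSketch();
    double a = s.distance(new double[] {0, 0}, new double[] {3, 4});
    double b = s.distance(of(1, 1), new double[] {4, 5});
    System.out.println(a + " " + b + " " + s.diststat); // prints 25.0 25.0 2
  }
}
```

Skipping the square root is safe for nearest-mean assignment, since it does not change which mean is closest.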