Class CFKPlusPlusLeaves

  • Direct Known Subclasses:
    CFKPlusPlusTrunk

    @Alias("leaves")
    @Reference(authors="Andreas Lang and Erich Schubert",
               title="BETULA: Fast Clustering of Large Data with Improved BIRCH CF-Trees",
               booktitle="Information Systems",
               url="https://doi.org/10.1016/j.is.2021.101918",
               bibkey="DBLP:journals/is/LangS22")
    public class CFKPlusPlusLeaves
    extends AbstractCFKMeansInitialization
    K-means++-like initialization for BETULA k-means that treats the leaf clustering features as a flat list; the publication calls this variant "leaves". To initialize regular k-means, use KMeansPlusPlus instead.
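
    A leaf clustering feature summarizes a group of points by aggregate statistics, which is what allows the leaves to stand in for the data during initialization. The sketch below assumes BETULA's numerically stable layout of count, mean, and sum of squared deviations (instead of BIRCH's linear sum and sum of squares). The class MeanSsdCF and its methods are hypothetical and not part of ELKI. It shows that the total squared distance of all summarized points to any center c follows from the summary alone, as ssd + n * |mean - c|^2.

```java
/** Hypothetical leaf CF storing count, mean and sum of squared deviations (not ELKI's class). */
final class MeanSsdCF {
  int n = 0;            // number of summarized points
  final double[] mean;  // centroid of the summarized points
  double ssd = 0;       // sum of squared deviations from the centroid

  MeanSsdCF(int dim) {
    mean = new double[dim];
  }

  /** Adds one point using a Welford-style incremental update of mean and ssd. */
  void add(double[] x) {
    n++;
    double s = 0;
    for (int d = 0; d < x.length; d++) {
      double delta = x[d] - mean[d]; // deviation from the old mean
      mean[d] += delta / n;
      s += delta * (x[d] - mean[d]); // times deviation from the new mean
    }
    ssd += s;
  }

  /** Sum of squared distances of all summarized points to c: ssd + n * |mean - c|^2. */
  double sseTo(double[] c) {
    double s = 0;
    for (int d = 0; d < c.length; d++) {
      double diff = mean[d] - c[d];
      s += diff * diff;
    }
    return ssd + n * s;
  }
}
```

For the points (0, 0) and (2, 0), the summary is n = 2, mean = (1, 0), ssd = 2, and sseTo((0, 0)) gives 4, matching 0 + 4 computed point by point.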

    References:

    Andreas Lang and Erich Schubert
    BETULA: Fast Clustering of Large Data with Improved BIRCH CF-Trees
    Information Systems

    Since:
    0.8.0
    Author:
    Andreas Lang
    • Field Detail

      • distance

        protected CFInitWeight distance
        Distance function
      • firstUniform

        protected boolean firstUniform
        Choose the first center uniformly from the leaves.
    • Constructor Detail

      • CFKPlusPlusLeaves

        public CFKPlusPlusLeaves(CFInitWeight dist,
                                 boolean firstUniform,
                                 RandomFactory rf)
        Constructor.
        Parameters:
        dist - distance function
        firstUniform - choose the first center uniformly from leaves
        rf - random generator
    • Method Detail

      • chooseInitialMeans

        public double[][] chooseInitialMeans(CFTree<?> tree,
                                             java.util.List<? extends ClusterFeature> cfs,
                                             int k)
        Description copied from class: AbstractCFKMeansInitialization
        Build the initial models.
        Specified by:
        chooseInitialMeans in class AbstractCFKMeansInitialization
        Parameters:
        tree - CF tree
        cfs - Cluster features of the tree (may be ignored for tree-based initializations, should be an array list for efficiency)
        k - Number of clusters.
        Returns:
        initial cluster means
      • run

        public double[][] run(CFTree<?> tree,
                              java.util.List<? extends ClusterFeature> cfs,
                              int k)
        Perform k-means++ initialization.
        Parameters:
        tree - CF tree
        cfs - Cluster features
        k - Number of clusters
        Returns:
        Initial cluster centers
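        A minimal sketch of k-means++ seeding over a flat list of leaves, under stated assumptions: each leaf is reduced to a hypothetical Leaf record (count and centroid), the first center is drawn uniformly (the firstUniform = true case), and the hypothetical weight n * |mean - c|^2 stands in for CFInitWeight, whose actual formula may differ. Each further center is drawn with probability proportional to a leaf's weight to its nearest chosen center. It assumes k >= 1 and a non-empty leaf list; LeafKMeansPP is not an ELKI class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of k-means++ seeding over a flat list of leaf CFs (not ELKI code). */
final class LeafKMeansPP {
  /** Leaf summary: point count n and centroid. */
  record Leaf(int n, double[] mean) {}

  /** Assumed stand-in for CFInitWeight: n times squared distance of the centroid to the center. */
  static double weight(Leaf leaf, double[] center) {
    double s = 0;
    for (int d = 0; d < center.length; d++) {
      double diff = leaf.mean()[d] - center[d];
      s += diff * diff;
    }
    return leaf.n() * s;
  }

  static double[][] run(List<Leaf> leaves, int k, Random rnd) {
    int m = leaves.size();
    double[][] centers = new double[k][];
    // First center: a leaf chosen uniformly at random (the firstUniform = true case).
    centers[0] = leaves.get(rnd.nextInt(m)).mean().clone();
    double[] w = new double[m];
    double sum = 0;
    for (int i = 0; i < m; i++) {
      w[i] = weight(leaves.get(i), centers[0]);
      sum += w[i];
    }
    for (int c = 1; c < k; c++) {
      if (sum <= 0) {
        // Every leaf coincides with a chosen center; stop early with fewer centers.
        return Arrays.copyOf(centers, c);
      }
      // Draw a leaf with probability proportional to its current weight.
      double r = rnd.nextDouble() * sum;
      int pick = -1;
      for (int i = 0; i < m; i++) {
        if (w[i] <= 0) continue;
        pick = i; // last positive-weight leaf doubles as a rounding fallback
        r -= w[i];
        if (r < 0) break;
      }
      centers[c] = leaves.get(pick).mean().clone();
      // Keep, for each leaf, the weight to its nearest chosen center.
      sum = 0;
      for (int i = 0; i < m; i++) {
        w[i] = Math.min(w[i], weight(leaves.get(i), centers[c]));
        sum += w[i];
      }
    }
    return centers;
  }
}
```

If every remaining weight is zero (for example, when there are fewer distinct leaves than k), this sketch returns fewer than k centers instead of duplicating one; the actual implementation may treat that case differently.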
      • sampleFirst

        private ClusterFeature sampleFirst(ClusterFeature root,
                                           java.util.List<? extends AsClusterFeature> cfs,
                                           java.util.Random rnd)
        Sample the first cluster center.
        Parameters:
        root - Root node of the tree
        cfs - Cluster features to sample from
        rnd - Random generator
        Returns:
        Selected cluster feature
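        The sketch below illustrates the two sampling modes controlled by firstUniform. The uniform branch picks any leaf with equal probability. For the non-uniform branch it assumes, as one plausible reading of the root parameter, that leaves are drawn with probability proportional to their weight relative to the root CF (the summary of the whole tree); this rule, the hypothetical Leaf record, and the weight n * |mean - c|^2 are assumptions, not confirmed ELKI behavior. For brevity it returns a leaf index instead of a ClusterFeature.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of choosing the first center (not ELKI code). */
final class FirstCenterSketch {
  record Leaf(int n, double[] mean) {}

  /** Assumed weight: n times squared distance of the leaf centroid to the reference point. */
  static double weight(Leaf leaf, double[] ref) {
    double s = 0;
    for (int d = 0; d < ref.length; d++) {
      double diff = leaf.mean()[d] - ref[d];
      s += diff * diff;
    }
    return leaf.n() * s;
  }

  /** Returns the index of the leaf selected as first center. */
  static int sampleFirst(Leaf root, List<Leaf> leaves, Random rnd, boolean firstUniform) {
    int m = leaves.size();
    if (firstUniform) {
      return rnd.nextInt(m); // every leaf equally likely, regardless of its size
    }
    // Assumed non-uniform rule: probability proportional to the weight relative to the root CF.
    double[] w = new double[m];
    double sum = 0;
    for (int i = 0; i < m; i++) {
      w[i] = weight(leaves.get(i), root.mean());
      sum += w[i];
    }
    if (sum <= 0) {
      return rnd.nextInt(m); // all leaves sit at the root mean; fall back to uniform
    }
    double r = rnd.nextDouble() * sum;
    int pick = -1;
    for (int i = 0; i < m; i++) {
      if (w[i] <= 0) continue;
      pick = i; // last positive-weight leaf doubles as a rounding fallback
      r -= w[i];
      if (r < 0) break;
    }
    return pick;
  }
}
```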
      • initialWeights

        private double initialWeights(ClusterFeature first,
                                      java.util.List<? extends AsClusterFeature> cfs,
                                      double[] weights)
        Initialize the weight list.
        Parameters:
        first - First cluster center
        cfs - Cluster features
        weights - Weights output
        Returns:
        Sum of weights
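        A sketch of the weight initialization, assuming a hypothetical Leaf record and the hypothetical weight n * |mean - c|^2 as a stand-in for CFInitWeight: each entry receives the weight of one leaf relative to the first center, and the returned sum serves as the normalizer for the next sampling step. InitialWeightsSketch is not an ELKI class.

```java
import java.util.List;

/** Hypothetical sketch of k-means++ weight initialization over leaves (not ELKI code). */
final class InitialWeightsSketch {
  record Leaf(int n, double[] mean) {}

  /** Assumed weight: n times squared distance of the leaf centroid to the center. */
  static double weight(Leaf leaf, double[] center) {
    double s = 0;
    for (int d = 0; d < center.length; d++) {
      double diff = leaf.mean()[d] - center[d];
      s += diff * diff;
    }
    return leaf.n() * s;
  }

  /** Fills weights[i] with the weight of leaf i relative to the first center; returns the total. */
  static double initialWeights(double[] first, List<Leaf> leaves, double[] weights) {
    double sum = 0;
    for (int i = 0; i < leaves.size(); i++) {
      weights[i] = weight(leaves.get(i), first);
      sum += weights[i];
    }
    return sum;
  }
}
```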
      • updateWeights

        private double updateWeights(ClusterFeature latest,
                                     java.util.List<? extends AsClusterFeature> cfs,
                                     double[] weights)
        Update the weight list.
        Parameters:
        latest - Latest center
        cfs - Cluster features
        weights - Weights, updated in place
        Returns:
        Weight sum
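        The standard k-means++ update keeps, for every leaf, the smaller of its current weight and its weight to the newest center, so each entry always reflects the nearest chosen center. A sketch under the same assumptions as above: a hypothetical Leaf record and the hypothetical weight n * |mean - c|^2 in place of CFInitWeight. UpdateWeightsSketch is not an ELKI class.

```java
import java.util.List;

/** Hypothetical sketch of the k-means++ weight update over leaves (not ELKI code). */
final class UpdateWeightsSketch {
  record Leaf(int n, double[] mean) {}

  /** Assumed weight: n times squared distance of the leaf centroid to the center. */
  static double weight(Leaf leaf, double[] center) {
    double s = 0;
    for (int d = 0; d < center.length; d++) {
      double diff = leaf.mean()[d] - center[d];
      s += diff * diff;
    }
    return leaf.n() * s;
  }

  /** Lowers each weight to the weight relative to the latest center if that is smaller; returns the new total. */
  static double updateWeights(double[] latest, List<Leaf> leaves, double[] weights) {
    double sum = 0;
    for (int i = 0; i < leaves.size(); i++) {
      weights[i] = Math.min(weights[i], weight(leaves.get(i), latest));
      sum += weights[i];
    }
    return sum;
  }
}
```

Continuing from weights {0, 2, 4} for the leaves at (0, 0), (1, 0) with n = 2, and (0, 2), choosing (0, 2) as the latest center lowers the weights to {0, 2, 0} with sum 2.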