Class CFKPlusPlusTree


  • @Alias("tree")
    @Reference(authors="Andreas Lang and Erich Schubert",
               title="BETULA: Fast Clustering of Large Data with Improved BIRCH CF-Trees",
               booktitle="Information Systems",
               url="https://doi.org/10.1016/j.is.2021.101918",
               bibkey="DBLP:journals/is/LangS22")
    public class CFKPlusPlusTree
    extends AbstractCFKMeansInitialization
    Initialize K-means by following tree paths weighted by their variance contribution. This is the strategy denoted "tree" in the reference.

    References:

    Andreas Lang and Erich Schubert
    BETULA: Fast Clustering of Large Data with Improved BIRCH CF-Trees
    Information Systems

    Since:
    0.8.0
    Author:
    Andreas Lang
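
The idea of following a tree path with weighted random choices can be illustrated with a minimal, self-contained sketch. It is not the ELKI implementation: the `Node` class, its precomputed `weight` field (standing in for a variance contribution), and the `descend` method are hypothetical names introduced only for this illustration. Starting at the root, each step picks one child with probability proportional to its weight and stops at a leaf or once `maxdepth` levels have been descended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class TreeDescentSketch {
  /** Hypothetical tree node carrying a precomputed, non-negative weight. */
  static final class Node {
    final double weight;
    final List<Node> children = new ArrayList<>();

    Node(double weight) {
      this.weight = weight;
    }
  }

  /**
   * Follow one path from the root, choosing each child with probability
   * proportional to its weight. Stops at a leaf or after maxdepth steps.
   */
  static Node descend(Node root, int maxdepth, Random rnd) {
    Node cur = root;
    for (int depth = 0; depth < maxdepth && !cur.children.isEmpty(); depth++) {
      double total = 0;
      for (Node c : cur.children) {
        total += c.weight;
      }
      // Random threshold in [0, total); subtract weights until it is crossed.
      double r = rnd.nextDouble() * total;
      // Fallback: last child, used if all weights are zero or on rounding error.
      Node next = cur.children.get(cur.children.size() - 1);
      for (Node c : cur.children) {
        r -= c.weight;
        if (r < 0) {
          next = c;
          break;
        }
      }
      cur = next;
    }
    return cur;
  }

  public static void main(String[] args) {
    Node root = new Node(0);
    Node left = new Node(1), right = new Node(4);
    root.children.add(left);
    root.children.add(right);
    right.children.add(new Node(2));
    right.children.add(new Node(2));
    Node chosen = descend(root, 2, new Random(0L));
    System.out.println("chosen node weight: " + chosen.weight);
  }
}
```

In this sketch a child with weight zero is never selected while a sibling has positive weight, and a depth limit of 0 returns the root itself.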
    • Field Detail

      • dist

        CFInitWeight dist
        Distance (weight) function used to choose the initial means.
      • firstUniform

        boolean firstUniform
        Choose the first center uniformly from the cluster features.
      • maxdepth

        int maxdepth
        Maximum tree depth at which a center may be chosen.
    • Constructor Detail

      • CFKPlusPlusTree

        public CFKPlusPlusTree(CFInitWeight dist,
                               boolean firstUniform,
                               int maxdepth,
                               RandomFactory rf)
        Constructor.
        Parameters:
        dist - distance function
        firstUniform - choose first center uniformly from the leaves
        maxdepth - maximum depth
        rf - random generator
    • Method Detail

      • chooseInitialMeans

        public double[][] chooseInitialMeans(CFTree<?> tree,
                                             java.util.List<? extends ClusterFeature> cfs,
                                             int k)
        Description copied from class: AbstractCFKMeansInitialization
        Build the initial models.
        Specified by:
        chooseInitialMeans in class AbstractCFKMeansInitialization
        Parameters:
        tree - CF tree
        cfs - Cluster features of the tree (may be ignored by tree-based initializations; should be an array list for efficiency)
        k - Number of clusters.
        Returns:
        initial cluster means
      • chooseNextNode

        private AsClusterFeature chooseNextNode(CFNode<?> current,
                                                java.util.List<? extends ClusterFeature> ccs,
                                                java.util.Random rnd)
        Choose a child of the current node.
        Parameters:
        current - Current node
        ccs - Currently chosen cluster centers
        rnd - Random generator
        Returns:
        New cluster center
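
How the currently chosen centers can influence the choice of a child may be sketched as follows. This is an assumption-laden illustration in the spirit of k-means++, not the weighting that any particular `CFInitWeight` implementation uses: each candidate is summarized by a hypothetical mean vector and point count, and its weight is the count times the squared Euclidean distance to the nearest center chosen so far. The class and method names are invented for this example.

```java
class CenterWeightSketch {
  /** Squared Euclidean distance between two vectors of equal length. */
  static double sqDist(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /**
   * Weight of each candidate: its point count times the squared distance
   * from its mean to the nearest already chosen center.
   * Assumes at least one center has been chosen.
   */
  static double[] weights(double[][] means, int[] counts, double[][] centers) {
    double[] w = new double[means.length];
    for (int i = 0; i < means.length; i++) {
      double min = Double.POSITIVE_INFINITY;
      for (double[] c : centers) {
        min = Math.min(min, sqDist(means[i], c));
      }
      w[i] = counts[i] * min;
    }
    return w;
  }

  public static void main(String[] args) {
    double[][] means = { { 0, 0 }, { 3, 4 } };
    int[] counts = { 2, 3 };
    double[][] centers = { { 0, 0 } };
    double[] w = weights(means, counts, centers);
    for (double v : w) {
      System.out.println(v);
    }
  }
}
```

A candidate whose mean coincides with an existing center gets weight zero, so a weighted random choice over these values favors large candidates far from all centers chosen so far.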