Class CFTree


  • @Reference(authors="T. Zhang, R. Ramakrishnan, M. Livny",title="BIRCH: An Efficient Data Clustering Method for Very Large Databases",booktitle="Proc. 1996 ACM SIGMOD International Conference on Management of Data",url="https://doi.org/10.1145/233269.233324",bibkey="DBLP:conf/sigmod/ZhangRL96") @Reference(authors="T. Zhang, R. Ramakrishnan, M. Livny",title="BIRCH: A New Data Clustering Algorithm and Its Applications",booktitle="Data Min. Knowl. Discovery",url="https://doi.org/10.1023/A:1009783824328",bibkey="DBLP:journals/datamine/ZhangRL97")
    public class CFTree
    extends java.lang.Object
    Partial implementation of the CFTree as used by BIRCH.

    Important differences:

    1. Leaf nodes and directory nodes have the same capacity
    2. Condensing and memory limits are not implemented
    3. Merging refinement (merge-resplit) is not implemented
    Because we want to be able to track the cluster assignments of all data points easily, we need to store the point IDs, and it is not possible to implement the originally proposed page size management at the same time.

    Condensing and merging refinement are possible, and improvements to this code are welcome - please send a pull request!

    References:

    T. Zhang, R. Ramakrishnan, M. Livny
    BIRCH: An Efficient Data Clustering Method for Very Large Databases Proc. 1996 ACM SIGMOD International Conference on Management of Data

    T. Zhang, R. Ramakrishnan, M. Livny
    BIRCH: A New Data Clustering Algorithm and Its Applications
    Data Min. Knowl. Discovery

    Since:
    0.7.5
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        public static final Logging LOG
        Class logger.
      • thresholdsq

        double thresholdsq
        Squared maximum radius threshold of a clustering feature.
      • capacity

        int capacity
        Capacity of a node.
      • leaves

        int leaves
        Leaf node counter.
    • Constructor Detail

      • CFTree

        public CFTree​(BIRCHDistance distance,
                      BIRCHAbsorptionCriterion absorption,
                      double threshold,
                      int capacity)
        Constructor.
        Parameters:
        distance - Distance function to use
        absorption - Absorption criterion
        threshold - Threshold
        capacity - Capacity
    • Method Detail

      • insert

        public void insert​(NumberVector nv)
        Insert a data point into the tree.
        Parameters:
        nv - Object data
      • rebuildTree

        protected void rebuildTree()
        Rebuild the CFTree to condense it to approximately half the size.
      • estimateThreshold

        private double estimateThreshold​(CFTree.TreeNode current)
      • add

        private boolean add​(ClusteringFeature[] children,
                            ClusteringFeature child)
        Add a node to the first unused slot.
        Parameters:
        children - Children list
        child - Child to add
        Returns:
        false if node is full
      • leafIterator

        public CFTree.LeafIterator leafIterator()
        Get an iterator over the leaf nodes.
        Returns:
        Leaf node iterator.
      • printDebug

        protected java.lang.StringBuilder printDebug​(java.lang.StringBuilder buf,
                                                     ClusteringFeature n,
                                                     int d)
        Utility function for debugging.
        Parameters:
        buf - Output buffer
        n - Current node
        d - Depth
        Returns:
        Output buffer