Class CFTree
- java.lang.Object
-
- elki.clustering.hierarchical.birch.CFTree
-
@Reference(authors="T. Zhang, R. Ramakrishnan, M. Livny",title="BIRCH: An Efficient Data Clustering Method for Very Large Databases",booktitle="Proc. 1996 ACM SIGMOD International Conference on Management of Data",url="https://doi.org/10.1145/233269.233324",bibkey="DBLP:conf/sigmod/ZhangRL96") @Reference(authors="T. Zhang, R. Ramakrishnan, M. Livny",title="BIRCH: A New Data Clustering Algorithm and Its Applications",booktitle="Data Min. Knowl. Discovery",url="https://doi.org/10.1023/A:1009783824328",bibkey="DBLP:journals/datamine/ZhangRL97") public class CFTree extends java.lang.Object
Partial implementation of the CFTree as used by BIRCH.Important differences:
- Leaf nodes and directory nodes have the same capacity
- Condensing and memory limits are not implemented
- Merging refinement (merge-resplit) is not implemented
Condensing and merging refinement are possible, and improvements to this code are welcome - please send a pull request!
References:
T. Zhang, R. Ramakrishnan, M. Livny
BIRCH: An Efficient Data Clustering Method for Very Large Databases Proc. 1996 ACM SIGMOD International Conference on Management of DataT. Zhang, R. Ramakrishnan, M. Livny
BIRCH: A New Data Clustering Algorithm and Its Applications
Data Min. Knowl. Discovery- Since:
- 0.7.5
- Author:
- Erich Schubert
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static classCFTree.FactoryCF-Tree Factory.static classCFTree.LeafIteratorIterator over leaf nodes.
-
Field Summary
Fields Modifier and Type Field Description (package private) BIRCHAbsorptionCriterionabsorptionCriterion for absorbing points.(package private) intcapacityCapacity of a node.(package private) BIRCHDistancedistanceDistance function to use.(package private) intleavesLeaf node counter.static LoggingLOGClass logger.(package private) CFTree.TreeNoderootCurrent root node.(package private) doublethresholdsqSquared maximum radius threshold of a clustering feature.
-
Constructor Summary
Constructors Constructor Description CFTree(BIRCHDistance distance, BIRCHAbsorptionCriterion absorption, double threshold, int capacity)Constructor.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description private booleanadd(ClusteringFeature[] children, ClusteringFeature child)Add a node to the first unused slot.private doubleestimateThreshold(CFTree.TreeNode current)private ClusteringFeaturefindLeaf(CFTree.TreeNode node, NumberVector nv)Find the leaf of a cluster, to get the final cluster assignment.ClusteringFeaturefindLeaf(NumberVector nv)Find the leaf of a cluster, to get the final cluster assignment.private CFTree.TreeNodeinsert(CFTree.TreeNode node, ClusteringFeature nleaf)Recursive insertion.private CFTree.TreeNodeinsert(CFTree.TreeNode node, NumberVector nv)Recursive insertion.voidinsert(NumberVector nv)Insert a data point into the tree.CFTree.LeafIteratorleafIterator()Get an iterator over the leaf nodes.protected java.lang.StringBuilderprintDebug(java.lang.StringBuilder buf, ClusteringFeature n, int d)Utility function for debugging.protected voidrebuildTree()Rebuild the CFTree to condense it to approximately half the size.private CFTree.TreeNodesplit(CFTree.TreeNode node, ClusteringFeature newchild)Split an overfull node.
-
-
-
Field Detail
-
LOG
public static final Logging LOG
Class logger.
-
distance
BIRCHDistance distance
Distance function to use.
-
absorption
BIRCHAbsorptionCriterion absorption
Criterion for absorbing points.
-
thresholdsq
double thresholdsq
Squared maximum radius threshold of a clustering feature.
-
capacity
int capacity
Capacity of a node.
-
root
CFTree.TreeNode root
Current root node.
-
leaves
int leaves
Leaf node counter.
-
-
Constructor Detail
-
CFTree
public CFTree(BIRCHDistance distance, BIRCHAbsorptionCriterion absorption, double threshold, int capacity)
Constructor.- Parameters:
distance- Distance function to useabsorption- Absorption criterionthreshold- Thresholdcapacity- Capacity
-
-
Method Detail
-
insert
public void insert(NumberVector nv)
Insert a data point into the tree.- Parameters:
nv- Object data
-
rebuildTree
protected void rebuildTree()
Rebuild the CFTree to condense it to approximately half the size.
-
estimateThreshold
private double estimateThreshold(CFTree.TreeNode current)
-
insert
private CFTree.TreeNode insert(CFTree.TreeNode node, NumberVector nv)
Recursive insertion.- Parameters:
node- Current nodenv- Object data- Returns:
- New sibling, if the node was split.
-
findLeaf
public ClusteringFeature findLeaf(NumberVector nv)
Find the leaf of a cluster, to get the final cluster assignment.In contrast to
insert(elki.data.NumberVector), this does not modify the tree.- Parameters:
nv- Object data- Returns:
- Leaf this vector should be assigned to
-
findLeaf
private ClusteringFeature findLeaf(CFTree.TreeNode node, NumberVector nv)
Find the leaf of a cluster, to get the final cluster assignment.In contrast to
insert(elki.data.NumberVector), this does not modify the tree.TODO: allow "outliers"?
- Parameters:
node- Current nodenv- Object data- Returns:
- Leaf this vector should be assigned to
-
split
private CFTree.TreeNode split(CFTree.TreeNode node, ClusteringFeature newchild)
Split an overfull node.- Parameters:
node- Node to splitnewchild- Additional child- Returns:
- New sibling of
node
-
insert
private CFTree.TreeNode insert(CFTree.TreeNode node, ClusteringFeature nleaf)
Recursive insertion.- Parameters:
node- Current nodenleaf- Leaf entry to add.- Returns:
- New sibling, if the node was split.
-
add
private boolean add(ClusteringFeature[] children, ClusteringFeature child)
Add a node to the first unused slot.- Parameters:
children- Children listchild- Child to add- Returns:
falseif node is full
-
leafIterator
public CFTree.LeafIterator leafIterator()
Get an iterator over the leaf nodes.- Returns:
- Leaf node iterator.
-
printDebug
protected java.lang.StringBuilder printDebug(java.lang.StringBuilder buf, ClusteringFeature n, int d)Utility function for debugging.- Parameters:
buf- Output buffern- Current noded- Depth- Returns:
- Output buffer
-
-