Class CFTree
- java.lang.Object
-
- elki.clustering.hierarchical.birch.CFTree
-
@Reference(authors="T. Zhang, R. Ramakrishnan, M. Livny",title="BIRCH: An Efficient Data Clustering Method for Very Large Databases",booktitle="Proc. 1996 ACM SIGMOD International Conference on Management of Data",url="https://doi.org/10.1145/233269.233324",bibkey="DBLP:conf/sigmod/ZhangRL96") @Reference(authors="T. Zhang, R. Ramakrishnan, M. Livny",title="BIRCH: A New Data Clustering Algorithm and Its Applications",booktitle="Data Min. Knowl. Discovery",url="https://doi.org/10.1023/A:1009783824328",bibkey="DBLP:journals/datamine/ZhangRL97") public class CFTree extends java.lang.Object
Partial implementation of the CFTree as used by BIRCH.Important differences:
- Leaf nodes and directory nodes have the same capacity
- Condensing and memory limits are not implemented
- Merging refinement (merge-resplit) is not implemented
Condensing and merging refinement are possible, and improvements to this code are welcome - please send a pull request!
References:
T. Zhang, R. Ramakrishnan, M. Livny
BIRCH: An Efficient Data Clustering Method for Very Large Databases Proc. 1996 ACM SIGMOD International Conference on Management of DataT. Zhang, R. Ramakrishnan, M. Livny
BIRCH: A New Data Clustering Algorithm and Its Applications
Data Min. Knowl. Discovery- Since:
- 0.7.5
- Author:
- Erich Schubert
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
CFTree.Factory
CF-Tree Factory.static class
CFTree.LeafIterator
Iterator over leaf nodes.
-
Field Summary
Fields Modifier and Type Field Description (package private) BIRCHAbsorptionCriterion
absorption
Criterion for absorbing points.(package private) int
capacity
Capacity of a node.(package private) BIRCHDistance
distance
Distance function to use.(package private) int
leaves
Leaf node counter.static Logging
LOG
Class logger.(package private) CFTree.TreeNode
root
Current root node.(package private) double
thresholdsq
Squared maximum radius threshold of a clustering feature.
-
Constructor Summary
Constructors Constructor Description CFTree(BIRCHDistance distance, BIRCHAbsorptionCriterion absorption, double threshold, int capacity)
Constructor.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description private boolean
add(ClusteringFeature[] children, ClusteringFeature child)
Add a node to the first unused slot.private double
estimateThreshold(CFTree.TreeNode current)
private ClusteringFeature
findLeaf(CFTree.TreeNode node, NumberVector nv)
Find the leaf of a cluster, to get the final cluster assignment.ClusteringFeature
findLeaf(NumberVector nv)
Find the leaf of a cluster, to get the final cluster assignment.private CFTree.TreeNode
insert(CFTree.TreeNode node, ClusteringFeature nleaf)
Recursive insertion.private CFTree.TreeNode
insert(CFTree.TreeNode node, NumberVector nv)
Recursive insertion.void
insert(NumberVector nv)
Insert a data point into the tree.CFTree.LeafIterator
leafIterator()
Get an iterator over the leaf nodes.protected java.lang.StringBuilder
printDebug(java.lang.StringBuilder buf, ClusteringFeature n, int d)
Utility function for debugging.protected void
rebuildTree()
Rebuild the CFTree to condense it to approximately half the size.private CFTree.TreeNode
split(CFTree.TreeNode node, ClusteringFeature newchild)
Split an overfull node.
-
-
-
Field Detail
-
LOG
public static final Logging LOG
Class logger.
-
distance
BIRCHDistance distance
Distance function to use.
-
absorption
BIRCHAbsorptionCriterion absorption
Criterion for absorbing points.
-
thresholdsq
double thresholdsq
Squared maximum radius threshold of a clustering feature.
-
capacity
int capacity
Capacity of a node.
-
root
CFTree.TreeNode root
Current root node.
-
leaves
int leaves
Leaf node counter.
-
-
Constructor Detail
-
CFTree
public CFTree(BIRCHDistance distance, BIRCHAbsorptionCriterion absorption, double threshold, int capacity)
Constructor.- Parameters:
distance
- Distance function to useabsorption
- Absorption criterionthreshold
- Thresholdcapacity
- Capacity
-
-
Method Detail
-
insert
public void insert(NumberVector nv)
Insert a data point into the tree.- Parameters:
nv
- Object data
-
rebuildTree
protected void rebuildTree()
Rebuild the CFTree to condense it to approximately half the size.
-
estimateThreshold
private double estimateThreshold(CFTree.TreeNode current)
-
insert
private CFTree.TreeNode insert(CFTree.TreeNode node, NumberVector nv)
Recursive insertion.- Parameters:
node
- Current nodenv
- Object data- Returns:
- New sibling, if the node was split.
-
findLeaf
public ClusteringFeature findLeaf(NumberVector nv)
Find the leaf of a cluster, to get the final cluster assignment.In contrast to
insert(elki.data.NumberVector)
, this does not modify the tree.- Parameters:
nv
- Object data- Returns:
- Leaf this vector should be assigned to
-
findLeaf
private ClusteringFeature findLeaf(CFTree.TreeNode node, NumberVector nv)
Find the leaf of a cluster, to get the final cluster assignment.In contrast to
insert(elki.data.NumberVector)
, this does not modify the tree.TODO: allow "outliers"?
- Parameters:
node
- Current nodenv
- Object data- Returns:
- Leaf this vector should be assigned to
-
split
private CFTree.TreeNode split(CFTree.TreeNode node, ClusteringFeature newchild)
Split an overfull node.- Parameters:
node
- Node to splitnewchild
- Additional child- Returns:
- New sibling of
node
-
insert
private CFTree.TreeNode insert(CFTree.TreeNode node, ClusteringFeature nleaf)
Recursive insertion.- Parameters:
node
- Current nodenleaf
- Leaf entry to add.- Returns:
- New sibling, if the node was split.
-
add
private boolean add(ClusteringFeature[] children, ClusteringFeature child)
Add a node to the first unused slot.- Parameters:
children
- Children listchild
- Child to add- Returns:
false
if node is full
-
leafIterator
public CFTree.LeafIterator leafIterator()
Get an iterator over the leaf nodes.- Returns:
- Leaf node iterator.
-
printDebug
protected java.lang.StringBuilder printDebug(java.lang.StringBuilder buf, ClusteringFeature n, int d)
Utility function for debugging.- Parameters:
buf
- Output buffern
- Current noded
- Depth- Returns:
- Output buffer
-
-