@Reference(authors="R. J. G. B. Campello, D. Moulavi, J. Sander", title="Density-Based Clustering Based on Hierarchical Density Estimates", booktitle="Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining (PAKDD)", url="https://doi.org/10.1007/978-3-642-37456-2_14", bibkey="DBLP:conf/pakdd/CampelloMS13") public class SLINKHDBSCANLinearMemory<O> extends AbstractHDBSCAN<O,PointerDensityHierarchyRepresentationResult> implements HierarchicalClusteringAlgorithm
By not building a distance matrix, we can reduce memory usage to linear memory only; but at the cost of roughly double the runtime (unless using indexes) as we first need to compute all kNN distances (for core sizes), then recompute distances when building the spanning tree.
This version uses the SLINK algorithm to directly produce the pointer representation expected by the extraction methods. The SLINK algorithm is closely related to Prim's minimum spanning tree, but produces the more compact pointer representation instead of an edges list.
 This implementation does not include the cluster extraction
 discussed as Step 4. This functionality should however already be provided by
 HDBSCANHierarchyExtraction
 . For this reason, we also do not include self-edges.
 
Reference:
 R. J. G. B. Campello, D. Moulavi, J. Sander
 Density-Based Clustering Based on Hierarchical Density Estimates
 Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining (PAKDD)
| Modifier and Type | Class and Description | 
|---|---|
static class  | 
SLINKHDBSCANLinearMemory.Parameterizer<O>
Parameterization class 
 | 
AbstractHDBSCAN.HDBSCANAdapter, AbstractHDBSCAN.HeapMSTCollector| Modifier and Type | Field and Description | 
|---|---|
private static Logging | 
LOG
Class logger. 
 | 
minPtsALGORITHM_IDDISTANCE_FUNCTION_ID| Constructor and Description | 
|---|
SLINKHDBSCANLinearMemory(DistanceFunction<? super O> distanceFunction,
                        int minPts)
Constructor. 
 | 
| Modifier and Type | Method and Description | 
|---|---|
TypeInformation[] | 
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query. 
 | 
protected Logging | 
getLogger()
Get the (STATIC) logger for this class. 
 | 
PointerDensityHierarchyRepresentationResult | 
run(Database db,
   Relation<O> relation)
Run the algorithm 
 | 
private void | 
step1(DBIDRef id,
     WritableDBIDDataStore pi,
     WritableDoubleDataStore lambda)
First step: Initialize P(id) = id, L(id) = infinity. 
 | 
private void | 
step2(DBIDRef id,
     DBIDs processedIDs,
     DistanceQuery<? super O> distQuery,
     DoubleDataStore coredists,
     WritableDoubleDataStore m)
Second step: Determine the pairwise distances from all objects in the
 pointer representation to the new object with the specified id. 
 | 
private void | 
step3(DBIDRef id,
     WritableDBIDDataStore pi,
     WritableDoubleDataStore lambda,
     DBIDs processedIDs,
     WritableDoubleDataStore m)
Third step: Determine the values for P and L 
 | 
private void | 
step4(DBIDRef id,
     WritableDBIDDataStore pi,
     WritableDoubleDataStore lambda,
     DBIDs processedIDs)
Fourth step: Actualize the clusters if necessary 
 | 
computeCoreDists, convertToPointerRepresentationgetDistanceFunctionrunclone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitrunprivate static final Logging LOG
public SLINKHDBSCANLinearMemory(DistanceFunction<? super O> distanceFunction, int minPts)
distanceFunction - Distance functionminPts - Minimum number of points for densitypublic PointerDensityHierarchyRepresentationResult run(Database db, Relation<O> relation)
db - Databaserelation - Relationprivate void step1(DBIDRef id, WritableDBIDDataStore pi, WritableDoubleDataStore lambda)
id - the id of the object to be inserted into the pointer
        representationpi - Pi data storelambda - Lambda data storeprivate void step2(DBIDRef id, DBIDs processedIDs, DistanceQuery<? super O> distQuery, DoubleDataStore coredists, WritableDoubleDataStore m)
id - the id of the object to be inserted into the pointer
        representationprocessedIDs - the already processed idsdistQuery - Distance querym - Data storeprivate void step3(DBIDRef id, WritableDBIDDataStore pi, WritableDoubleDataStore lambda, DBIDs processedIDs, WritableDoubleDataStore m)
id - the id of the object to be inserted into the pointer
        representationpi - Pi data storelambda - Lambda data storeprocessedIDs - the already processed idsm - Data storeprivate void step4(DBIDRef id, WritableDBIDDataStore pi, WritableDoubleDataStore lambda, DBIDs processedIDs)
id - the id of the current objectpi - Pi data storelambda - Lambda data storeprocessedIDs - the already processed idspublic TypeInformation[] getInputTypeRestriction()
AbstractAlgorithmgetInputTypeRestriction in interface AlgorithmgetInputTypeRestriction in class AbstractHDBSCAN<O,PointerDensityHierarchyRepresentationResult>protected Logging getLogger()
AbstractAlgorithmgetLogger in class AbstractAlgorithm<PointerDensityHierarchyRepresentationResult>Copyright © 2019 ELKI Development Team. License information.