Class SLINKHDBSCANLinearMemory<O>

  • All Implemented Interfaces:
    Algorithm, HierarchicalClusteringAlgorithm

    @Reference(authors="R. J. G. B. Campello, D. Moulavi, J. Sander",
               title="Density-Based Clustering Based on Hierarchical Density Estimates",
               booktitle="Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining (PAKDD)",
               url="https://doi.org/10.1007/978-3-642-37456-2_14",
               bibkey="DBLP:conf/pakdd/CampelloMS13")
    public class SLINKHDBSCANLinearMemory<O>
    extends AbstractHDBSCAN<O>
    implements HierarchicalClusteringAlgorithm
    Linear memory implementation of HDBSCAN clustering based on SLINK.

    By not building a distance matrix, we can reduce memory usage to linear memory only; but at the cost of roughly double the runtime (unless using indexes) as we first need to compute all kNN distances (for core sizes), then recompute distances when building the spanning tree.

    This version uses the SLINK algorithm to directly produce the pointer representation expected by the extraction methods. The SLINK algorithm is closely related to Prim's minimum spanning tree, but produces the more compact pointer representation instead of an edges list.

    This implementation does not include the cluster extraction discussed as Step 4. This functionality should however already be provided by HDBSCANHierarchyExtraction. For this reason, we also do not include self-edges.

    Reference:

    R. J. G. B. Campello, D. Moulavi, J. Sander
    Density-Based Clustering Based on Hierarchical Density Estimates
    Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining (PAKDD)

    Since:
    0.7.0
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        Class logger.
    • Constructor Detail

      • SLINKHDBSCANLinearMemory

        public SLINKHDBSCANLinearMemory​(Distance<? super O> distance,
                                        int minPts)
        Constructor.
        Parameters:
        distance - Distance function
        minPts - Minimum number of points for coredists