Class ParallelGeneralizedDBSCAN

  • All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<Model>>

    @Reference(prefix="closely related",
               authors="M. Patwary, D. Palsetia, A. Agrawal, W. K. Liao, F. Manne, A. Choudhary",
               title="A new scalable parallel DBSCAN algorithm using the disjoint-set data structure",
               booktitle="IEEE Int. Conf. for High Performance Computing, Networking, Storage and Analysis (SC)",
               url="https://doi.org/10.1109/SC.2012.9",
               bibkey="DBLP:conf/sc/PatwaryPALMC12")
    public class ParallelGeneralizedDBSCAN
    extends java.lang.Object
    implements ClusteringAlgorithm<Clustering<Model>>
    Parallel version of DBSCAN clustering.

    This is the archetype of a non-linear shared-memory DBSCAN that does not sequentially expand a cluster, but processes points in arbitrary order and merges clusters when neighboring core points occur.

    Because of synchronization when labeling points, the speedup will only be sublinear in the number of cores. But in particular without an index and on large data, the majority of the work is finding the neighbors; not in labeling the points.

    Reference:

    Please cite the latest ELKI version.

    Related is the following publication, whose "disjoint set data structure" appears to be a similar union-find approach to ours, and whose DSDBSCAN appears rather similar. The main benefit of our approach is that we avoid using the union-find data structure for every object, but only use it for merging clusters.

    M. Patwary, D. Palsetia, A. Agrawal, W. K. Liao, F. Manne, A. Choudhary
    A new scalable parallel DBSCAN algorithm using the disjoint-set data structure
    In IEEE Int. Conf. for High Performance Computing, Networking, Storage and Analysis (SC)

    Since:
    0.7.5
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        Get a logger for this algorithm
      • corepred

        protected CorePredicate<?> corepred
        The core predicate factory.
      • coremodel

        protected boolean coremodel
        Track which objects are "core" objects.
    • Constructor Detail

      • ParallelGeneralizedDBSCAN

        public ParallelGeneralizedDBSCAN​(NeighborPredicate<?> npred,
                                         CorePredicate<?> corepred,
                                         boolean coremodel)
        Constructor for parameterized algorithm.
        Parameters:
        npred - Neighbor predicate.
        corepred - Core point predicate.
        coremodel - Keep track of core points.