Class GriDBSCAN.Instance<V extends NumberVector>

  • Type Parameters:
    V - Vector type
    Enclosing class:
    GriDBSCAN<V extends NumberVector>

    protected static class GriDBSCAN.Instance<V extends NumberVector>
    extends java.lang.Object
    Instance, for a single run.
    Author:
    Erich Schubert
    • Field Detail

      • epsilon

        protected double epsilon
        Holds the epsilon radius threshold.
      • minpts

        protected int minpts
        Holds the minPts threshold: the minimum number of points in the epsilon neighborhood of a core point.
      • gridwidth

        protected double gridwidth
        Width of the grid cells. Must be at least 2 * epsilon.
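The width constraint can be illustrated with a small guard. This is a minimal sketch, not the library's code: `GridWidthSketch` and `effectiveGridWidth` are hypothetical names, and clamping a too-small width up to 2 * epsilon is an assumption about how a caller might enforce the rule (rejecting the value would be equally valid).

```java
final class GridWidthSketch {
  /**
   * Hypothetical helper: returns a grid width that satisfies
   * gridwidth >= 2 * epsilon, raising too-small values to the minimum.
   */
  static double effectiveGridWidth(double epsilon, double gridwidth) {
    if (!(epsilon > 0)) {
      throw new IllegalArgumentException("epsilon must be positive");
    }
    // Assumption: clamp rather than fail when the width is too small.
    return Math.max(gridwidth, 2 * epsilon);
  }
}
```

With cells at least 2 * epsilon wide, a point can lie within epsilon of at most one cell border per dimension, which is consistent with the 2^d bound stated for insertIntoGrid.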
      • domain

        protected double[][] domain
        Value domain.
      • dim

        protected int dim
        Dimensionality.
      • offset

        protected double[] offset
        Grid offset.
      • cells

        protected int[] cells
        Number of cells per dimension.
      • grid

        it.unimi.dsi.fastutil.longs.Long2ObjectOpenHashMap<ModifiableDBIDs> grid
        Data grid partitioning.
      • cores

        private Core[] cores
        Core identifier objects (shared to conserve memory).
      • borders

        private Border[] borders
        Border identifier objects (shared to conserve memory).
      • overflown

        private boolean overflown
        Indicates that the number of grid cells has overflown.
    • Constructor Detail

      • Instance

        public Instance​(Distance<? super V> distance,
                        double epsilon,
                        int minpts,
                        double gridwidth)
        Constructor.
        Parameters:
        distance - Distance function
        epsilon - Epsilon
        minpts - MinPts
        gridwidth - Grid width
    • Method Detail

      • run

        public Clustering<Model> run​(Relation<V> relation)
        Runs the GriDBSCAN algorithm on the given relation.
        Parameters:
        relation - Relation to process
      • updateCoreBorderObjects

        private void updateCoreBorderObjects​(int clusterid)
        Update the shared arrays of core and border identifier objects (to conserve memory).
        Parameters:
        clusterid - Number of clusters
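Sharing one identifier object per cluster is a flyweight-style optimization. The sketch below shows one way such shared arrays could be grown on demand. The class `SharedAssignmentsSketch`, the fields of `Core` and `Border`, and the growth strategy are all assumptions made for illustration; the actual private implementation is not documented here.

```java
import java.util.Arrays;

final class SharedAssignmentsSketch {
  /** Hypothetical stand-in for a core-point marker of one cluster. */
  static final class Core {
    final int num;
    Core(int num) { this.num = num; }
  }

  /** Hypothetical stand-in for a border-point marker of one cluster. */
  static final class Border {
    final int num;
    Border(int num) { this.num = num; }
  }

  Core[] cores = new Core[0];
  Border[] borders = new Border[0];

  /** Ensure one shared Core and Border object exists for each of the first clusterid clusters. */
  void update(int clusterid) {
    final int old = cores.length;
    if (clusterid <= old) {
      return; // already large enough, keep existing instances
    }
    cores = Arrays.copyOf(cores, clusterid);
    borders = Arrays.copyOf(borders, clusterid);
    for (int i = old; i < clusterid; i++) {
      cores[i] = new Core(i);
      borders[i] = new Border(i);
    }
  }
}
```

Every point of a cluster can then reference the same `Core` or `Border` instance instead of allocating its own.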
      • computeGridBaseOffsets

        private long computeGridBaseOffsets​(int size)
        Compute the grid base offset.
        Parameters:
        size - Data set size
        Returns:
        Total number of grid cells
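To make the relationship between domain, gridwidth, cells, offset and overflown concrete, here is a minimal sketch of a grid layout computation. It assumes `domain[0]` holds the per-dimension minima and `domain[1]` the maxima, and that the grid is centered over the data range; both are assumptions, as is the class name `GridLayoutSketch`. The role of the `size` parameter of the documented method is not modeled.

```java
final class GridLayoutSketch {
  final int[] cells;       // number of cells per dimension
  final double[] offset;   // lower grid boundary per dimension
  final long total;        // total number of cells (Long.MAX_VALUE on overflow)
  final boolean overflown; // true if the product of cell counts overflowed

  GridLayoutSketch(double[][] domain, double gridwidth) {
    final int dim = domain[0].length;
    cells = new int[dim];
    offset = new double[dim];
    long t = 1L;
    boolean over = false;
    for (int d = 0; d < dim; d++) {
      final double extent = domain[1][d] - domain[0][d];
      // At least one cell, enough cells to cover the whole extent.
      final int c = Math.max(1, (int) Math.ceil(extent / gridwidth));
      cells[d] = c;
      // Assumption: spread the unused margin evenly on both sides.
      offset[d] = domain[0][d] - 0.5 * (c * gridwidth - extent);
      if (!over) {
        try {
          t = Math.multiplyExact(t, (long) c);
        } catch (ArithmeticException e) {
          over = true; // the total no longer fits into a long
        }
      }
    }
    total = over ? Long.MAX_VALUE : t;
    overflown = over;
  }
}
```

Detecting the overflow explicitly matters because the cell index of a point is later packed into a single long key.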
      • buildGrid

        protected void buildGrid​(Relation<V> relation,
                                 int numcells,
                                 double[] offset)
        Build the data grid.
        Parameters:
        relation - Data relation
        numcells - Total number of cells
        offset - Offset
      • insertIntoGrid

        private void insertIntoGrid​(DBIDRef id,
                                    V obj,
                                    int d,
                                    int v)
        Insert a single object into the grid; potentially into multiple cells (at most 2^d) via recursion.
        Parameters:
        id - Object ID
        obj - Object
        d - Current dimension
        v - Current cell value
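The recursion over dimensions can be sketched as follows. This is an illustration under stated assumptions, not the library's code: a plain `HashMap` stands in for the fastutil map, object IDs are plain integers, the running cell value is a `long` rather than the documented `int`, and a point is copied into a neighboring cell when it lies strictly closer than epsilon to the shared border. The class name `GridInsertSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GridInsertSketch {
  final double epsilon, gridwidth;
  final double[] offset;
  final int[] cells;
  /** Cell key to member IDs; stand-in for the real grid map. */
  final Map<Long, List<Integer>> grid = new HashMap<>();

  GridInsertSketch(double epsilon, double gridwidth, double[] offset, int[] cells) {
    this.epsilon = epsilon;
    this.gridwidth = gridwidth;
    this.offset = offset;
    this.cells = cells;
  }

  void insert(int id, double[] obj) {
    insert(id, obj, 0, 0L);
  }

  private void insert(int id, double[] obj, int d, long v) {
    if (d == offset.length) { // all dimensions processed: v is the cell key
      grid.computeIfAbsent(v, k -> new ArrayList<>()).add(id);
      return;
    }
    final double rel = obj[d] - offset[d];
    int c = (int) Math.floor(rel / gridwidth);
    c = Math.max(0, Math.min(cells[d] - 1, c)); // clamp boundary values
    final long base = v * cells[d];
    insert(id, obj, d + 1, base + c); // home cell in this dimension
    final double lower = c * gridwidth;
    if (c > 0 && rel - lower < epsilon) { // close to the lower border
      insert(id, obj, d + 1, base + c - 1);
    }
    if (c + 1 < cells[d] && lower + gridwidth - rel < epsilon) { // close to the upper border
      insert(id, obj, d + 1, base + c + 1);
    }
  }
}
```

Because gridwidth is at least 2 * epsilon, at most one of the two border conditions can hold in each dimension, so the recursion branches at most twice per dimension and a point reaches at most 2^d cells.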
      • checkGridCellSizes

        protected int checkGridCellSizes​(int size,
                                         long numcell)
        Perform some sanity checks on the grid cells.
        Parameters:
        size - Relation size
        numcell - Number of cells
        Returns:
        Number of cells with at least minPts points
      • expandCluster

        protected int expandCluster​(DBIDRef seed,
                                    int clusterid,
                                    WritableIntegerDataStore clusterids,
                                    ModifiableDoubleDBIDList neighbors,
                                    ArrayModifiableDBIDs activeSet,
                                    RangeSearcher<DBIDRef> rq,
                                    FiniteProgress pprog)
        Set-based expand cluster implementation.
        Parameters:
        seed - Seed point to expand the cluster from
        clusterid - ID of the current cluster.
        clusterids - Current object to cluster mapping.
        neighbors - Neighbors acquired by initial getNeighbors call.
        activeSet - Set to manage active candidates.
        rq - Range query
        pprog - Object progress
        Returns:
        Cluster size
      • processCorePoint

        protected int processCorePoint​(DBIDRef seed,
                                       DoubleDBIDList newneighbors,
                                       int clusterid,
                                       WritableIntegerDataStore clusterids,
                                       ArrayModifiableDBIDs activeSet)
        Process a single core point.
        Parameters:
        seed - Point to process
        newneighbors - New neighbors
        clusterid - Cluster to add to
        clusterids - Cluster assignment storage.
        activeSet - Active set of cluster seeds
        Returns:
        Number of new points added to cluster
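The interplay of expandCluster and processCorePoint can be sketched on a toy data set. The code below is a simplified, self-contained illustration and not the library's implementation: it works on one-dimensional `double` values with a brute-force range query, uses an `int[]` for the cluster assignment and an `ArrayDeque` as the active set, and performs the initial neighbor query inside `expandCluster` instead of receiving it as a parameter. The constants `UNPROCESSED` and `NOISE` are assumed encodings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class ExpandClusterSketch {
  static final int UNPROCESSED = 0, NOISE = -1;

  /** Brute-force epsilon range query on 1-d data (includes the query point). */
  static List<Integer> rangeQuery(double[] data, int q, double eps) {
    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < data.length; i++) {
      if (Math.abs(data[i] - data[q]) <= eps) {
        result.add(i);
      }
    }
    return result;
  }

  /** Assign the neighbors of a core point; returns the number of points newly added. */
  static int processCorePoint(List<Integer> neighbors, int clusterid, int[] clusterids, Deque<Integer> activeSet) {
    int added = 0;
    for (int n : neighbors) {
      if (clusterids[n] == UNPROCESSED) {
        clusterids[n] = clusterid;
        activeSet.add(n); // still needs its own range query
        added++;
      } else if (clusterids[n] == NOISE) {
        clusterids[n] = clusterid; // former noise becomes a border point
        added++;
      }
    }
    return added;
  }

  /** Grow a cluster from seed; returns the cluster size, or 0 if seed is not a core point. */
  static int expandCluster(double[] data, int seed, int clusterid, int[] clusterids, double eps, int minpts) {
    List<Integer> neighbors = rangeQuery(data, seed, eps);
    if (neighbors.size() < minpts) {
      clusterids[seed] = NOISE;
      return 0;
    }
    Deque<Integer> activeSet = new ArrayDeque<>();
    clusterids[seed] = clusterid;
    int size = 1 + processCorePoint(neighbors, clusterid, clusterids, activeSet);
    while (!activeSet.isEmpty()) {
      int cur = activeSet.poll();
      List<Integer> nn = rangeQuery(data, cur, eps);
      if (nn.size() >= minpts) { // only core points expand the cluster further
        size += processCorePoint(nn, clusterid, clusterids, activeSet);
      }
    }
    return size;
  }
}
```

Keeping the candidates in an explicit active set avoids recursion and ensures each point is queried at most once per cluster expansion.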
      • buildResult

        protected Clustering<Model> buildResult​(DBIDs ids,
                                                int clusterid)
        Assemble the clustering result.
        Parameters:
        ids - Object IDs
        clusterid - Largest valid cluster number
        Returns:
        Clustering
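Assembling the result amounts to grouping object IDs by their cluster number. The following sketch shows the idea with plain Java collections. It is an assumption-laden simplification: the real method returns a `Clustering<Model>`, whereas the hypothetical `BuildResultSketch.buildResult` returns a list of ID lists, treats numbers `1..clusterid` as clusters, and collects everything else as noise at index 0.

```java
import java.util.ArrayList;
import java.util.List;

final class BuildResultSketch {
  /**
   * Group object indexes by cluster number.
   * Index 0 of the result holds noise, index c holds cluster c.
   */
  static List<List<Integer>> buildResult(int[] clusterids, int clusterid) {
    List<List<Integer>> result = new ArrayList<>();
    for (int c = 0; c <= clusterid; c++) {
      result.add(new ArrayList<>());
    }
    for (int i = 0; i < clusterids.length; i++) {
      final int c = clusterids[i];
      // Assumption: anything outside 1..clusterid counts as noise.
      result.get(c >= 1 && c <= clusterid ? c : 0).add(i);
    }
    return result;
  }
}
```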