java.lang.Object
- elki.clustering.subspace.DOC

All Implemented Interfaces:: Algorithm, ClusteringAlgorithm<Clustering<SubspaceModel>>, SubspaceClusteringAlgorithm<SubspaceModel>

Direct Known Subclasses:: FastDOC

@Title("DOC: Density-based Optimal projective Clustering")
@Reference(authors="C. M. Procopiuc, M. Jones, P. K. Agarwal, T. M. Murali",
           title="A Monte Carlo algorithm for fast projective clustering",
           booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD \'02)",
           url="https://doi.org/10.1145/564691.564739",
           bibkey="DBLP:conf/sigmod/ProcopiucJAM02")
public class DOC
extends java.lang.Object
implements SubspaceClusteringAlgorithm<SubspaceModel>

DOC is a sampling based subspace clustering algorithm.

Reference:

C. M. Procopiuc, M. Jones, P. K. Agarwal, T. M. Murali
A Monte Carlo algorithm for fast projective clustering
In: Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '02).

Since:: 0.6.0
Author:: Florian Nuecke

Nested Class Summary

Nested Classes
Modifier and Type Class Description

static class DOC.Par
Parameterization class.
- Nested classes/interfaces inherited from interface elki.Algorithm
  Algorithm.Utils

Field Summary

Fields
Modifier and Type	Field	Description
`protected double`	`alpha`	Relative density threshold parameter alpha.
`protected double`	`beta`	Balancing parameter for importance of points vs. dimensions
`private static Logging`	`LOG`	The logger for this class.
`protected RandomFactory`	`rnd`	Randomizer used internally for sampling points.
`protected double`	`w`	Half width parameter.

Constructor Summary

Constructors
Constructor Description

DOC(double alpha, double beta, double w, RandomFactory random)
Constructor.

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method	Description
`protected double`	`computeClusterQuality(int clusterSize, int numRelevantDimensions)`	Computes the quality of a cluster based on its size and number of relevant attributes, as described via the μ-function from the paper.
`protected boolean`	`dimensionIsRelevant(int dimension, Relation<? extends NumberVector> relation, DBIDs points)`	Utility method to test if a given dimension is relevant as determined via a set of reference points (i.e. if the variance along the attribute is lower than the threshold).
`protected DBIDs`	`findNeighbors(DBIDRef q, long[] nD, ArrayModifiableDBIDs S, Relation<? extends NumberVector> relation)`	Find the neighbors of point q in the given subspace
`TypeInformation[]`	`getInputTypeRestriction()`	Get the input type restriction used for negotiating the data query.
`protected Cluster<SubspaceModel>`	`makeCluster(Relation<? extends NumberVector> relation, DBIDs C, long[] D)`	Utility method to create a subspace cluster from a list of DBIDs and the relevant attributes.
`Clustering<SubspaceModel>`	`run(Relation<? extends NumberVector> relation)`	Performs the DOC or FastDOC (as configured) algorithm.
`protected Cluster<SubspaceModel>`	`runDOC(Relation<? extends NumberVector> relation, ArrayModifiableDBIDs S, int d, int n, int m, int r, int minClusterSize)`	Performs a single run of DOC, finding a single cluster.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface elki.clustering.ClusteringAlgorithm
autorun

- Field Detail
  - LOG
```
private static final Logging LOG
```
    The logger for this class.
  - alpha
```
protected double alpha
```
    Relative density threshold parameter alpha.
  - beta
```
protected double beta
```
    Balancing parameter for importance of points vs. dimensions
  - w
```
protected double w
```
    Half width parameter.
  - rnd
```
protected RandomFactory rnd
```
    Randomizer used internally for sampling points.
- Constructor Detail
  - DOC
```
public DOC(double alpha,
           double beta,
           double w,
           RandomFactory random)
```
    Constructor.
    
    Parameters:
    
    alpha - α relative density threshold.
    
    beta - β balancing parameter for size vs. dimensionality.
    
    w - half width parameter.
    
    random - Random factory
- Method Detail
  - getInputTypeRestriction
```
public TypeInformation[] getInputTypeRestriction()
```
    Description copied from interface: Algorithm
    
    Get the input type restriction used for negotiating the data query.
    
    Specified by:
    
    getInputTypeRestriction in interface Algorithm
    
    Returns:
    
    Type restriction
  - run
```
public Clustering<SubspaceModel> run(Relation<? extends NumberVector> relation)
```
    Performs the DOC or FastDOC (as configured) algorithm.
    This will run exhaustively, i.e. run DOC until no clusters are found anymore / the database size has shrunk below the threshold for minimum cluster size.
    
    Parameters:
    
    relation - Data relation
  - runDOC
```
protected Cluster<SubspaceModel> runDOC(Relation<? extends NumberVector> relation,
                                        ArrayModifiableDBIDs S,
                                        int d,
                                        int n,
                                        int m,
                                        int r,
                                        int minClusterSize)
```
    Performs a single run of DOC, finding a single cluster.
    
    Parameters:
    
    relation - used to get actual values for DBIDs.
    
    S - The set of points we're working on.
    
    d - Dimensionality of the data set we're currently working on.
    
    r - Size of random samples.
    
    m - Number of inner iterations (per seed point).
    
    n - Number of outer iterations (seed points).
    
    minClusterSize - Minimum size a cluster must have to be accepted.
    
    Returns:
    
    a cluster, if one is found, else null.
  - findNeighbors
```
protected DBIDs findNeighbors(DBIDRef q,
                              long[] nD,
                              ArrayModifiableDBIDs S,
                              Relation<? extends NumberVector> relation)
```
    Find the neighbors of point q in the given subspace
    
    Parameters:
    
    q - Query point
    
    nD - Subspace mask
    
    S - Remaining data points
    
    relation - Data relation
    
    Returns:
    
    Neighbors
  - dimensionIsRelevant
```
protected boolean dimensionIsRelevant(int dimension,
                                      Relation<? extends NumberVector> relation,
                                      DBIDs points)
```
    Utility method to test if a given dimension is relevant as determined via a set of reference points (i.e. if the variance along the attribute is lower than the threshold).
    
    Parameters:
    
    dimension - the dimension to test.
    
    relation - used to get actual values for DBIDs.
    
    points - the points to test.
    
    Returns:
    
    true if the dimension is relevant.
  - makeCluster
```
protected Cluster<SubspaceModel> makeCluster(Relation<? extends NumberVector> relation,
                                             DBIDs C,
                                             long[] D)
```
    Utility method to create a subspace cluster from a list of DBIDs and the relevant attributes.
    
    Parameters:
    
    relation - to compute a centroid.
    
    C - the cluster points.
    
    D - the relevant dimensions.
    
    Returns:
    
    an object representing the subspace cluster.
  - computeClusterQuality
```
protected double computeClusterQuality(int clusterSize,
                                       int numRelevantDimensions)
```
    Computes the quality of a cluster based on its size and number of relevant attributes, as described via the μ-function from the paper.
    
    Parameters:
    
    clusterSize - the size of the cluster.
    
    numRelevantDimensions - the number of dimensions relevant to the cluster.
    
    Returns:
    
    a quality measure (only use this to compare the quality to that other clusters).

Class DOC

Nested Class Summary

Nested classes/interfaces inherited from interface elki.Algorithm

Field Summary

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Methods inherited from interface elki.clustering.ClusteringAlgorithm

Field Detail

LOG

alpha

beta

w

rnd

Constructor Detail

DOC

Method Detail

getInputTypeRestriction

run

runDOC

findNeighbors

dimensionIsRelevant

makeCluster

computeClusterQuality