DOC (ELKI: Environment for DeveLoping KDD-Applications Supported by Index-Structures)

java.lang.Object
- de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<Clustering<SubspaceModel>>
- - de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.DOC<V>

Type Parameters:: V - the type of NumberVector handled by this Algorithm.

All Implemented Interfaces:: Algorithm, ClusteringAlgorithm<Clustering<SubspaceModel>>, SubspaceClusteringAlgorithm<SubspaceModel>

Direct Known Subclasses:: FastDOC

@Title(value="DOC: Density-based Optimal projective Clustering")
 @Reference(authors="C. M. Procopiuc, M. Jones, P. K. Agarwal, T. M. Murali",
           title="A Monte Carlo algorithm for fast projective clustering",
           booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD \'02)",
           url="https://doi.org/10.1145/564691.564739",
           bibkey="DBLP:conf/sigmod/ProcopiucJAM02")
public class DOC<V extends NumberVector>
extends AbstractAlgorithm<Clustering<SubspaceModel>>
implements SubspaceClusteringAlgorithm<SubspaceModel>

DOC is a sampling based subspace clustering algorithm.

Reference:

C. M. Procopiuc, M. Jones, P. K. Agarwal, T. M. Murali
A Monte Carlo algorithm for fast projective clustering
In: Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '02).

Since:: 0.6.0
Author:: Florian Nuecke

Nested Class Summary

Nested Classes
Modifier and Type Class and Description

static class DOC.Parameterizer<V extends NumberVector>
Parameterization class.

Nested Classes
Modifier and Type	Class and Description
`static class`	`DOC.Parameterizer<V extends NumberVector>` Parameterization class.

Field Summary

Fields
Modifier and Type	Field and Description
`protected double`	`alpha` Relative density threshold parameter alpha.
`protected double`	`beta` Balancing parameter for importance of points vs. dimensions
`private static Logging`	`LOG` The logger for this class.
`protected RandomFactory`	`rnd` Randomizer used internally for sampling points.
`protected double`	`w` Half width parameter.

Fields inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
ALGORITHM_ID

Constructor Summary

Constructors
Constructor and Description

DOC(double alpha, double beta, double w, RandomFactory random)
Constructor.

Constructors
Constructor and Description
`DOC(double alpha, double beta, double w, RandomFactory random)` Constructor.

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`protected double`	`computeClusterQuality(int clusterSize, int numRelevantDimensions)` Computes the quality of a cluster based on its size and number of relevant attributes, as described via the μ-function from the paper.
`protected boolean`	`dimensionIsRelevant(int dimension, Relation<V> relation, DBIDs points)` Utility method to test if a given dimension is relevant as determined via a set of reference points (i.e. if the variance along the attribute is lower than the threshold).
`protected DBIDs`	`findNeighbors(DBIDRef q, long[] nD, ArrayModifiableDBIDs S, Relation<V> relation)` Find the neighbors of point q in the given subspace
`TypeInformation[]`	`getInputTypeRestriction()` Get the input type restriction used for negotiating the data query.
`protected Logging`	`getLogger()` Get the (STATIC) logger for this class.
`protected Cluster<SubspaceModel>`	`makeCluster(Relation<V> relation, DBIDs C, long[] D)` Utility method to create a subspace cluster from a list of DBIDs and the relevant attributes.
`Clustering<SubspaceModel>`	`run(Database database, Relation<V> relation)` Performs the DOC or FastDOC (as configured) algorithm on the given Database.
`protected Cluster<SubspaceModel>`	`runDOC(Database database, Relation<V> relation, ArrayModifiableDBIDs S, int d, int n, int m, int r, int minClusterSize)` Performs a single run of DOC, finding a single cluster.

Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
run

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm
run

- Field Detail
  - LOG
```
private static final Logging LOG
```
    The logger for this class.
  - alpha
```
protected double alpha
```
    Relative density threshold parameter alpha.
  - beta
```
protected double beta
```
    Balancing parameter for importance of points vs. dimensions
  - w
```
protected double w
```
    Half width parameter.
  - rnd
```
protected RandomFactory rnd
```
    Randomizer used internally for sampling points.
- Constructor Detail
  - DOC
```
public DOC(double alpha,
           double beta,
           double w,
           RandomFactory random)
```
    Constructor.
    
    Parameters:
    
    alpha - α relative density threshold.
    
    beta - β balancing parameter for size vs. dimensionality.
    
    w - half width parameter.
    
    random - Random factory
- Method Detail
  - run
```
public Clustering<SubspaceModel> run(Database database,
                                     Relation<V> relation)
```
    Performs the DOC or FastDOC (as configured) algorithm on the given Database. This will run exhaustively, i.e. run DOC until no clusters are found anymore / the database size has shrunk below the threshold for minimum cluster size.
    
    Parameters:
    
    database - Database
    
    relation - Data relation
  - runDOC
```
protected Cluster<SubspaceModel> runDOC(Database database,
                                        Relation<V> relation,
                                        ArrayModifiableDBIDs S,
                                        int d,
                                        int n,
                                        int m,
                                        int r,
                                        int minClusterSize)
```
    Performs a single run of DOC, finding a single cluster.
    
    Parameters:
    
    database - Database context
    
    relation - used to get actual values for DBIDs.
    
    S - The set of points we're working on.
    
    d - Dimensionality of the data set we're currently working on.
    
    r - Size of random samples.
    
    m - Number of inner iterations (per seed point).
    
    n - Number of outer iterations (seed points).
    
    minClusterSize - Minimum size a cluster must have to be accepted.
    
    Returns:
    
    a cluster, if one is found, else null.
  - findNeighbors
```
protected DBIDs findNeighbors(DBIDRef q,
                              long[] nD,
                              ArrayModifiableDBIDs S,
                              Relation<V> relation)
```
    Find the neighbors of point q in the given subspace
    
    Parameters:
    
    q - Query point
    
    nD - Subspace mask
    
    S - Remaining data points
    
    relation - Data relation
    
    Returns:
    
    Neighbors
  - dimensionIsRelevant
```
protected boolean dimensionIsRelevant(int dimension,
                                      Relation<V> relation,
                                      DBIDs points)
```
    Utility method to test if a given dimension is relevant as determined via a set of reference points (i.e. if the variance along the attribute is lower than the threshold).
    
    Parameters:
    
    dimension - the dimension to test.
    
    relation - used to get actual values for DBIDs.
    
    points - the points to test.
    
    Returns:
    
    true if the dimension is relevant.
  - makeCluster
```
protected Cluster<SubspaceModel> makeCluster(Relation<V> relation,
                                             DBIDs C,
                                             long[] D)
```
    Utility method to create a subspace cluster from a list of DBIDs and the relevant attributes.
    
    Parameters:
    
    relation - to compute a centroid.
    
    C - the cluster points.
    
    D - the relevant dimensions.
    
    Returns:
    
    an object representing the subspace cluster.
  - computeClusterQuality
```
protected double computeClusterQuality(int clusterSize,
                                       int numRelevantDimensions)
```
    Computes the quality of a cluster based on its size and number of relevant attributes, as described via the μ-function from the paper.
    
    Parameters:
    
    clusterSize - the size of the cluster.
    
    numRelevantDimensions - the number of dimensions relevant to the cluster.
    
    Returns:
    
    a quality measure (only use this to compare the quality to that other clusters).
  - getInputTypeRestriction
```
public TypeInformation[] getInputTypeRestriction()
```
    Description copied from class: AbstractAlgorithm
    
    Get the input type restriction used for negotiating the data query.
    
    Specified by:
    
    getInputTypeRestriction in interface Algorithm
    
    Specified by:
    
    getInputTypeRestriction in class AbstractAlgorithm<Clustering<SubspaceModel>>
    
    Returns:
    
    Type restriction
  - getLogger
```
protected Logging getLogger()
```
    Description copied from class: AbstractAlgorithm
    
    Get the (STATIC) logger for this class.
    
    Specified by:
    
    getLogger in class AbstractAlgorithm<Clustering<SubspaceModel>>
    
    Returns:
    
    the static logger

Class DOC<V extends NumberVector>

Nested Class Summary

Field Summary

Fields inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm

Constructor Summary

Method Summary

Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm

Methods inherited from class java.lang.Object

Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm

Field Detail

LOG

alpha

beta

w

rnd

Constructor Detail

DOC

Method Detail

run

runDOC

findNeighbors

dimensionIsRelevant

makeCluster

computeClusterQuality

getInputTypeRestriction

getLogger