@Title(value="DOC: Density-based Optimal projective Clustering")
@Reference(authors="C. M. Procopiuc, M. Jones, P. K. Agarwal, T. M. Murali", title="A Monte Carlo algorithm for fast projective clustering", booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '02)", url="https://doi.org/10.1145/564691.564739", bibkey="DBLP:conf/sigmod/ProcopiucJAM02")
public class DOC<V extends NumberVector>
extends AbstractAlgorithm<Clustering<SubspaceModel>>
implements SubspaceClusteringAlgorithm<SubspaceModel>
Type Parameters:
V - the type of NumberVector handled by this Algorithm.
Reference:
 C. M. Procopiuc, M. Jones, P. K. Agarwal, T. M. Murali
 A Monte Carlo algorithm for fast projective clustering
 In: Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '02).
| Modifier and Type | Class and Description |
|---|---|
| static class | DOC.Parameterizer<V extends NumberVector>: Parameterization class. |
| Modifier and Type | Field and Description |
|---|---|
| protected double | alpha: Relative density threshold parameter alpha. |
| protected double | beta: Balancing parameter for importance of points vs. dimensions. |
| private static Logging | LOG: The logger for this class. |
| protected RandomFactory | rnd: Randomizer used internally for sampling points. |
| protected double | w: Half width parameter. |
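The fields alpha and beta, together with the data dimensionality d, fix the sampling budget of the Monte Carlo search. The DOC paper uses samples of size r = log(2d) / log(1/(2β)), m = (2/α)^r · ln 4 inner iterations per seed point, and 2/α outer iterations (seed points), and it only accepts a cluster C with |C| ≥ α·|S|. The helper class DocBudget below is hypothetical (it is not part of ELKI) and simply evaluates these formulas; rounding every value up with Math.ceil is an assumption of this sketch, and ELKI may round differently. The formula for r only makes sense for 0 < β < 1/2, because log(1/(2β)) must be positive.

```java
/** Hypothetical helper (not ELKI API): DOC sampling budget from alpha, beta and d. */
final class DocBudget {
  final int r; // size of each random sample X
  final int m; // inner iterations per seed point
  final int n; // outer iterations, i.e. number of seed points

  DocBudget(double alpha, double beta, int d) {
    if (!(alpha > 0 && alpha < 1)) throw new IllegalArgumentException("alpha must lie in (0, 1)");
    // log(1 / (2 * beta)) must be positive, hence beta < 0.5
    if (!(beta > 0 && beta < 0.5)) throw new IllegalArgumentException("beta must lie in (0, 0.5)");
    if (d < 1) throw new IllegalArgumentException("d must be positive");
    // Rounding up with Math.ceil is an assumption of this sketch.
    this.r = (int) Math.ceil(Math.log(2.0 * d) / Math.log(1.0 / (2.0 * beta)));
    this.m = (int) Math.ceil(Math.pow(2.0 / alpha, r) * Math.log(4.0));
    this.n = (int) Math.ceil(2.0 / alpha);
  }

  /** Smallest accepted cluster size: alpha * |S|, rounded up (assumption). */
  static int minClusterSize(double alpha, int numPoints) {
    return (int) Math.ceil(alpha * numPoints);
  }

  public static void main(String[] args) {
    DocBudget b = new DocBudget(0.25, 0.1, 10);
    System.out.println("r=" + b.r + " m=" + b.m + " n=" + b.n); // prints "r=2 m=89 n=8"
    System.out.println(minClusterSize(0.25, 1000)); // prints 250
  }
}
```

With α = 0.25, β = 0.1 and d = 10 this gives r = 2, m = 89 and n = 8. Since m grows like (2/α)^r, every additional sample point multiplies the inner loop count by 2/α, which is why β (through r) has a large effect on run time.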
Fields inherited from implemented interfaces: ALGORITHM_ID

| Constructor and Description |
|---|
| DOC(double alpha, double beta, double w, RandomFactory random): Constructor. |
| Modifier and Type | Method and Description |
|---|---|
| protected double | computeClusterQuality(int clusterSize, int numRelevantDimensions): Computes the quality of a cluster based on its size and number of relevant attributes, as described via the μ-function from the paper. |
| protected boolean | dimensionIsRelevant(int dimension, Relation<V> relation, DBIDs points): Utility method to test if a given dimension is relevant as determined via a set of reference points (i.e., if the variance along the attribute is lower than the threshold). |
| protected DBIDs | findNeighbors(DBIDRef q, long[] nD, ArrayModifiableDBIDs S, Relation<V> relation): Find the neighbors of point q in the given subspace. |
| TypeInformation[] | getInputTypeRestriction(): Get the input type restriction used for negotiating the data query. |
| protected Logging | getLogger(): Get the (STATIC) logger for this class. |
| protected Cluster<SubspaceModel> | makeCluster(Relation<V> relation, DBIDs C, long[] D): Utility method to create a subspace cluster from a list of DBIDs and the relevant attributes. |
| Clustering<SubspaceModel> | run(Database database, Relation<V> relation): Performs the DOC or FastDOC (as configured) algorithm on the given Database. |
| protected Cluster<SubspaceModel> | runDOC(Database database, Relation<V> relation, ArrayModifiableDBIDs S, int d, int n, int m, int r, int minClusterSize): Performs a single run of DOC, finding a single cluster. |
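The helper methods listed above combine into a single DOC run roughly as sketched here. The sketch works on plain double[][] data instead of ELKI's Relation and DBIDs, marks relevant dimensions with a boolean[] rather than the long[] bit mask taken by findNeighbors and makeCluster, and every name in it (DocSketch, isRelevant, boxMembers, runOnce) is hypothetical. Following the paper, a dimension counts as relevant when every point of the random sample lies within w of the seed in that dimension (ELKI's own relevance test may differ in detail), the cluster is the set of points inside the box of half width w around the seed in the relevant dimensions, and the quality is μ(|C|, |D|) = |C| · (1/β)^|D|. The parameters n, m, r and minClusterSize mirror those of runDOC. It illustrates the technique and is not ELKI's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical, simplified sketch of one DOC run on plain arrays (not ELKI's code). */
final class DocSketch {
  /** Result of one run: member indices and relevant-dimension flags. */
  static final class Result {
    final List<Integer> members;
    final boolean[] dims;
    Result(List<Integer> members, boolean[] dims) { this.members = members; this.dims = dims; }
  }

  /** mu(a, b) = a * (1/beta)^b, the cluster quality from the DOC paper. */
  static double quality(int size, int numDims, double beta) {
    return size * Math.pow(1.0 / beta, numDims);
  }

  /** Relevant if each of the first r sampled points lies within w of the seed in dim. */
  static boolean isRelevant(int dim, double[] seed, double[][] data, int[] sample, int r, double w) {
    for (int k = 0; k < r; k++) {
      if (Math.abs(data[sample[k]][dim] - seed[dim]) > w) return false;
    }
    return true;
  }

  /** Indices of points inside the box of half width w around the seed (relevant dims only). */
  static List<Integer> boxMembers(double[] seed, boolean[] dims, double[][] data, double w) {
    List<Integer> out = new ArrayList<>();
    for (int i = 0; i < data.length; i++) {
      boolean inside = true;
      for (int j = 0; j < dims.length && inside; j++) {
        inside = !dims[j] || Math.abs(data[i][j] - seed[j]) <= w;
      }
      if (inside) out.add(i);
    }
    return out;
  }

  /** One run: n seeds, m random samples of size r per seed; best cluster or null. */
  static Result runOnce(double[][] data, double beta, double w, int n, int m, int r,
                        int minClusterSize, Random rng) {
    int size = data.length;
    if (size == 0) return null;
    int d = data[0].length;
    r = Math.min(r, size);
    int[] idx = new int[size];
    for (int i = 0; i < size; i++) idx[i] = i;
    Result best = null;
    double bestQuality = Double.NEGATIVE_INFINITY;
    for (int outer = 0; outer < n; outer++) {
      double[] seed = data[rng.nextInt(size)];
      for (int inner = 0; inner < m; inner++) {
        // Partial Fisher-Yates: idx[0..r) becomes a uniform random r-subset.
        for (int k = 0; k < r; k++) {
          int j = k + rng.nextInt(size - k);
          int t = idx[k]; idx[k] = idx[j]; idx[j] = t;
        }
        boolean[] dims = new boolean[d];
        int numDims = 0;
        for (int j = 0; j < d; j++) {
          if (isRelevant(j, seed, data, idx, r, w)) { dims[j] = true; numDims++; }
        }
        List<Integer> members = boxMembers(seed, dims, data, w);
        if (members.size() < minClusterSize) continue; // too small: reject
        double q = quality(members.size(), numDims, beta);
        if (q > bestQuality) { bestQuality = q; best = new Result(members, dims); }
      }
    }
    return best;
  }
}
```

Because β < 1 makes 1/β > 1, each extra relevant dimension multiplies the quality by 1/β. With β = 0.25, a cluster with one more relevant dimension outscores any cluster with one fewer dimension unless that cluster is at least four times as large; this is the size vs. dimensionality trade-off that the beta field controls.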
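Methods inherited from class AbstractAlgorithm: run
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from implemented interfaces: run

private static final Logging LOG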
protected double alpha
protected double beta
protected double w
protected RandomFactory rnd
public DOC(double alpha,
           double beta,
           double w,
           RandomFactory random)
Parameters:
alpha - α relative density threshold.
beta - β balancing parameter for size vs. dimensionality.
w - half width parameter.
random - Random factory.

public Clustering<SubspaceModel> run(Database database, Relation<V> relation)
Parameters:
database - Database
relation - Data relation

protected Cluster<SubspaceModel> runDOC(Database database, Relation<V> relation, ArrayModifiableDBIDs S, int d, int n, int m, int r, int minClusterSize)
Parameters:
database - Database context
relation - used to get actual values for DBIDs.
S - The set of points we're working on.
d - Dimensionality of the data set we're currently working on.
n - Number of outer iterations (seed points).
m - Number of inner iterations (per seed point).
r - Size of random samples.
minClusterSize - Minimum size a cluster must have to be accepted.
Returns:
a cluster, if one is found, else null.

protected DBIDs findNeighbors(DBIDRef q, long[] nD, ArrayModifiableDBIDs S, Relation<V> relation)
Parameters:
q - Query point
nD - Subspace mask
S - Remaining data points
relation - Data relation

protected boolean dimensionIsRelevant(int dimension,
                                      Relation<V> relation,
                                      DBIDs points)
Parameters:
dimension - the dimension to test.
relation - used to get actual values for DBIDs.
points - the points to test.
Returns:
true if the dimension is relevant.

protected Cluster<SubspaceModel> makeCluster(Relation<V> relation, DBIDs C, long[] D)
Parameters:
relation - to compute a centroid.
C - the cluster points.
D - the relevant dimensions.

protected double computeClusterQuality(int clusterSize,
                                       int numRelevantDimensions)
Parameters:
clusterSize - the size of the cluster.
numRelevantDimensions - the number of dimensions relevant to the cluster.

public TypeInformation[] getInputTypeRestriction()
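Description copied from class: AbstractAlgorithm
Get the input type restriction used for negotiating the data query.
Specified by:
getInputTypeRestriction in interface Algorithm
Specified by:
getInputTypeRestriction in class AbstractAlgorithm<Clustering<SubspaceModel>>

protected Logging getLogger()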
Description copied from class: AbstractAlgorithm
Get the (STATIC) logger for this class.
Specified by:
getLogger in class AbstractAlgorithm<Clustering<SubspaceModel>>

Copyright © 2019 ELKI Development Team. License information.