Class FastDOC

  • All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<SubspaceModel>>, SubspaceClusteringAlgorithm<SubspaceModel>

    @Title("FastDOC: Density-based Optimal projective Clustering")
    @Reference(authors="C. M. Procopiuc, M. Jones, P. K. Agarwal, T. M. Murali",
               title="A Monte Carlo algorithm for fast projective clustering",
               booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD \'02)",
               url="https://doi.org/10.1145/564691.564739",
               bibkey="DBLP:conf/sigmod/ProcopiucJAM02")
    public class FastDOC
    extends DOC
    FastDOC, the heuristic variant of the DOC algorithm.

    Reference:

    C. M. Procopiuc, M. Jones, P. K. Agarwal, T. M. Murali
    A Monte Carlo algorithm for fast projective clustering
    In: Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '02).

    Since:
    0.7.5
    Author:
    Florian Nuecke
    • Constructor Detail

      • FastDOC

        public FastDOC(double alpha,
                       double beta,
                       double w,
                       int d_zero,
                       RandomFactory random)
        Constructor.
        Parameters:
        alpha - α relative density threshold.
        beta - β balancing parameter for size vs. dimensionality.
        w - half-width parameter.
        d_zero - number of relevant dimensions that, once found for a cluster, ends the run early.
        random - Random factory.
    • Method Detail

      • runDOC

        protected Cluster<SubspaceModel> runDOC(Relation<? extends NumberVector> relation,
                                                ArrayModifiableDBIDs S,
                                                int d,
                                                int n,
                                                int m,
                                                int r,
                                                int minClusterSize)
        Performs a single run of FastDOC, finding a single cluster.
        Overrides:
        runDOC in class DOC
        Parameters:
        relation - used to get actual values for DBIDs.
        S - The set of points we're working on.
        d - Dimensionality of the data set we're currently working on.
        n - Number of outer iterations (seed points).
        m - Number of inner iterations (per seed point).
        r - Size of random samples.
        minClusterSize - Minimum size a cluster must have to be accepted.
        Returns:
        a cluster, if one is found, else null.
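
        Example:
        The roles of n, m, r, w, d_zero and minClusterSize in a single run can be
        illustrated with a minimal, self-contained sketch. It is not the ELKI
        implementation: the class FastDocRunSketch, its Result type and the
        method runOnce are hypothetical names, plain double[][] rows stand in
        for the Relation and DBIDs, and the early stop on d_zero follows the
        general Monte Carlo scheme of the referenced paper as an assumption.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not part of ELKI.
class FastDocRunSketch {
  /** Relevant dimensions plus the row indices of the cluster members. */
  static final class Result {
    final BitSet dims;
    final List<Integer> members;

    Result(BitSet dims, List<Integer> members) {
      this.dims = dims;
      this.members = members;
    }
  }

  /** One run: n seed points, m samples of size r per seed, box half width w. */
  static Result runOnce(double[][] data, double w, int dZero, int n, int m,
      int r, int minClusterSize, Random rnd) {
    final int d = data[0].length;
    double[] bestSeed = null;
    BitSet bestDims = null;
    outer:
    for (int i = 0; i < n; i++) {
      // Outer iteration: draw a random seed point.
      double[] p = data[rnd.nextInt(data.length)];
      for (int j = 0; j < m; j++) {
        // Inner iteration: start with all dimensions marked relevant.
        BitSet dims = new BitSet(d);
        dims.set(0, d);
        for (int k = 0; k < r; k++) {
          double[] q = data[rnd.nextInt(data.length)];
          // Drop every dimension where the sample point is farther than w from the seed.
          for (int a = dims.nextSetBit(0); a >= 0; a = dims.nextSetBit(a + 1)) {
            if (Math.abs(q[a] - p[a]) > w) {
              dims.clear(a);
            }
          }
        }
        // Keep the candidate with the most relevant dimensions.
        if (bestDims == null || dims.cardinality() > bestDims.cardinality()) {
          bestDims = dims;
          bestSeed = p;
          // Assumed heuristic stop: enough relevant dimensions were found.
          if (bestDims.cardinality() >= dZero) {
            break outer;
          }
        }
      }
    }
    if (bestDims == null || bestDims.isEmpty()) {
      return null;
    }
    // Collect all points inside the box of half width w around the best seed.
    List<Integer> members = new ArrayList<>();
    for (int idx = 0; idx < data.length; idx++) {
      boolean inside = true;
      for (int a = bestDims.nextSetBit(0); a >= 0 && inside; a = bestDims.nextSetBit(a + 1)) {
        inside = Math.abs(data[idx][a] - bestSeed[a]) <= w;
      }
      if (inside) {
        members.add(idx);
      }
    }
    // Reject clusters that are too small, mirroring the "else null" contract.
    return members.size() >= minClusterSize ? new Result(bestDims, members) : null;
  }
}
```

        In this sketch a larger d_zero makes the run search longer for a
        higher-dimensional subspace, while a smaller value ends the run as soon
        as any sufficiently relevant set of dimensions appears.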