Class CLIQUE

  • All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<SubspaceModel>>, SubspaceClusteringAlgorithm<SubspaceModel>

    @Title("CLIQUE: Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications")
    @Description("Grid-based algorithm to identify dense clusters in subspaces of maximum dimensionality.")
    @Reference(authors="R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan",
               title="Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications",
               booktitle="Proc. SIGMOD Conference, Seattle, WA, 1998",
               url="https://doi.org/10.1145/276304.276314",
               bibkey="DBLP:conf/sigmod/AgrawalGGR98")
    public class CLIQUE
    extends java.lang.Object
    implements SubspaceClusteringAlgorithm<SubspaceModel>
    Implementation of the CLIQUE algorithm, a grid-based algorithm to identify dense clusters in subspaces of maximum dimensionality.

    The implementation consists of two steps:
    1. Identification of subspaces that contain clusters
    2. Identification of clusters

    The third step of the original algorithm (Generation of minimal description for the clusters) is not (yet) implemented.

    Note: this is fairly old code, and not well optimized. Do not use this for runtime benchmarking!

    Reference:

    R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan
    Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications
    In Proc. ACM SIGMOD Int. Conf. on Management of Data, Seattle, WA, 1998.

    Since:
    0.1
    Author:
    Elke Achtert
    • Field Detail

      • LOG

        private static final Logging LOG
        The logger for this class.
      • xsi

        protected int xsi
        Number of intervals in each dimension.
      • tau

        protected double tau
        Density threshold / selectivity.
      • prune

        protected boolean prune
        Pruning flag.
    • Constructor Detail

      • CLIQUE

        public CLIQUE​(int xsi,
                      double tau,
                      boolean prune)
        Constructor.
        Parameters:
        xsi - Xsi interval number
        tau - Tau density threshold
        prune - Prune flag
    • Method Detail

      • run

        public Clustering<SubspaceModel> run​(Relation<? extends NumberVector> relation)
        Performs the CLIQUE algorithm on the given database.
        Parameters:
        relation - Data relation to process
        Returns:
        Clustering result
      • determineClusters

        private java.util.List<Pair<Subspace,​ModifiableDBIDs>> determineClusters​(java.util.List<CLIQUESubspace> denseSubspaces)
        Determines the clusters in the specified dense subspaces.
        Parameters:
        denseSubspaces - the dense subspaces in reverse order by their coverage
        Returns:
        the clusters in the specified dense subspaces and the corresponding cluster models
      • findOneDimensionalDenseSubspaces

        private java.util.List<CLIQUESubspace> findOneDimensionalDenseSubspaces​(Relation<? extends NumberVector> database)
        Determines the one dimensional dense subspaces and performs a pruning if this option is chosen.
        Parameters:
        database - the database to run the algorithm on
        Returns:
        the one dimensional dense subspaces reverse ordered by their coverage
      • findDenseSubspaces

        private java.util.List<CLIQUESubspace> findDenseSubspaces​(Relation<? extends NumberVector> database,
                                                                  java.util.List<CLIQUESubspace> denseSubspaces)
        Determines the k-dimensional dense subspaces and performs a pruning if this option is chosen.
        Parameters:
        database - the database to run the algorithm on
        denseSubspaces - the (k-1)-dimensional dense subspaces
        Returns:
        a list of the k-dimensional dense subspaces sorted in reverse order by their coverage
      • initOneDimensionalUnits

        private java.util.Collection<CLIQUEUnit> initOneDimensionalUnits​(Relation<? extends NumberVector> database)
        Initializes and returns the one dimensional units.
        Parameters:
        database - the database to run the algorithm on
        Returns:
        the created one dimensional units
      • updateMinMax

        private void updateMinMax​(NumberVector featureVector,
                                  double[] minima,
                                  double[] maxima)
        Updates the minima and maxima array according to the specified feature vector.
        Parameters:
        featureVector - the feature vector
        minima - the array of minima
        maxima - the array of maxima
      • findOneDimensionalDenseSubspaceCandidates

        private java.util.List<CLIQUESubspace> findOneDimensionalDenseSubspaceCandidates​(Relation<? extends NumberVector> database)
        Determines the one-dimensional dense subspace candidates by making a pass over the database.
        Parameters:
        database - the database to run the algorithm on
        Returns:
        the one-dimensional dense subspace candidates reverse ordered by their coverage
      • findDenseSubspaceCandidates

        private java.util.List<CLIQUESubspace> findDenseSubspaceCandidates​(Relation<? extends NumberVector> database,
                                                                           java.util.List<CLIQUESubspace> denseSubspaces)
        Determines the k-dimensional dense subspace candidates from the specified (k-1)-dimensional dense subspaces.
        Parameters:
        database - the database to run the algorithm on
        denseSubspaces - the (k-1)-dimensional dense subspaces
        Returns:
        a list of the k-dimensional dense subspace candidates reverse ordered by their coverage
      • pruneDenseSubspaces

        private java.util.List<CLIQUESubspace> pruneDenseSubspaces​(java.util.List<CLIQUESubspace> denseSubspaces)
        Performs a MDL-based pruning of the specified dense subspaces as described in the CLIQUE algorithm.
        Parameters:
        denseSubspaces - the subspaces to be pruned sorted in reverse order by their coverage
        Returns:
        the subspaces which are not pruned reverse ordered by their coverage
      • computeMeans

        private int[][] computeMeans​(java.util.List<CLIQUESubspace> denseSubspaces)
        The specified sorted list of dense subspaces is divided into the selected set I and the pruned set P. For each set the mean of the cover fractions is computed.
        Parameters:
        denseSubspaces - the dense subspaces in reverse order by their coverage
        Returns:
        the mean of the cover fractions, the first value is the mean of the selected set I, the second value is the mean of the pruned set P.
      • computeDiffs

        private double[][] computeDiffs​(java.util.List<CLIQUESubspace> denseSubspaces,
                                        int[] mi,
                                        int[] mp)
        The specified sorted list of dense subspaces is divided into the selected set I and the pruned set P. For each set the difference from the specified mean values is computed.
        Parameters:
        denseSubspaces - denseSubspaces the dense subspaces in reverse order by their coverage
        mi - the mean of the selected sets I
        mp - the mean of the pruned sets P
        Returns:
        the difference from the specified mean values, the first value is the difference from the mean of the selected set I, the second value is the difference from the mean of the pruned set P.
      • log2OrZero

        private static double log2OrZero​(double x)
        Robust log 2, that ignores zero values.
        Parameters:
        x - Input value
        Returns:
        Log2(x), or zero.
      • getInputTypeRestriction

        public TypeInformation[] getInputTypeRestriction()
        Description copied from interface: Algorithm
        Get the input type restriction used for negotiating the data query.
        Specified by:
        getInputTypeRestriction in interface Algorithm
        Returns:
        Type restriction