Class ORCLUS

  • All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<Model>>

    @Title("ORCLUS: Arbitrarily ORiented projected CLUSter generation")
    @Description("Algorithm to find correlation clusters in high dimensional spaces.")
    @Reference(authors="C. C. Aggarwal, P. S. Yu",
               title="Finding Generalized Projected Clusters in High Dimensional Spaces",
               booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD \'00)",
               url="https://doi.org/10.1145/342009.335383",
               bibkey="DBLP:conf/sigmod/AggarwalY00")
    public class ORCLUS
    extends AbstractProjectedClustering<Clustering<Model>>
    ORCLUS: Arbitrarily ORiented projected CLUSter generation.

    Reference:

    C. C. Aggarwal, P. S. Yu
    Finding Generalized Projected Clusters in High Dimensional Spaces
    Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '00).

    Since:
    0.1
    Author:
    Elke Achtert
    • Constructor Detail

      • ORCLUS

        public ORCLUS​(int k,
                      int k_i,
                      int l,
                      double alpha,
                      RandomFactory rnd,
                      PCARunner pca)
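        Constructor.
        Parameters:
        k - the number of clusters to find
        k_i - the multiplier for the initial number of seeds
        l - the dimensionality of the clusters to find
        alpha - the factor for reducing the number of current clusters in each iteration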
        rnd - Random generator
        pca - PCA runner
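
        How k, k_i, l and alpha interact can be illustrated with the reduction schedule described in the ORCLUS paper: the algorithm starts with k_i * k seeds in the full-dimensional space, then shrinks the seed count by the factor alpha and the subspace dimensionality by a matching factor beta until k seeds of dimensionality l remain. The following is a minimal sketch under that reading, not the ELKI implementation; the class name, the method name and the `dim` argument (the dimensionality of the data) are assumptions made for illustration, and it expects k_i > 1 and 0 < alpha < 1.

```java
import java.util.ArrayList;
import java.util.List;

final class OrclusScheduleSketch {
  /**
   * Returns {seedCount, subspaceDim} before the first round and after each
   * reduction round. Hypothetical helper, not part of ELKI.
   */
  static List<int[]> schedule(int k, int k_i, int l, double alpha, int dim) {
    int kc = k_i * k; // current number of seeds
    int lc = dim;     // current subspace dimensionality
    // beta is chosen so that lc reaches l in about as many rounds as kc reaches k
    double beta = Math.exp(-Math.log(dim / (double) l) * Math.log(1 / alpha)
        / Math.log(kc / (double) k));
    List<int[]> steps = new ArrayList<>();
    steps.add(new int[] { kc, lc });
    while (kc > k) {
      kc = (int) Math.max(k, kc * alpha);
      lc = (int) Math.max(l, lc * beta);
      steps.add(new int[] { kc, lc });
    }
    return steps;
  }
}
```

        For example, `schedule(2, 8, 2, 0.5, 8)` starts at 16 seeds in 8 dimensions and ends at 2 seeds in 2 dimensions.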
    • Method Detail

      • getInputTypeRestriction

        public TypeInformation[] getInputTypeRestriction()
        Description copied from interface: Algorithm
        Get the input type restriction used for negotiating the data query.
        Returns:
        Type restriction
      • initialSeeds

        private java.util.List<ORCLUS.ORCLUSCluster> initialSeeds​(Relation<? extends NumberVector> database,
                                                                  int k)
        Initializes the list of seeds with a random sample of size k.
        Parameters:
        database - the database holding the objects
        k - the size of the random sample
        Returns:
        the initial seed list
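
        Drawing a random sample of size k without replacement can be sketched as follows. This is an illustration on plain Java lists, assuming points are stored as `double[]`; ELKI itself works on relations and DBIDs with its own RandomFactory, and the names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class SeedSamplingSketch {
  /** Picks k distinct points uniformly at random; a fixed seed makes runs repeatable. */
  static List<double[]> sampleSeeds(List<double[]> points, int k, long seed) {
    if (k > points.size()) {
      throw new IllegalArgumentException("k exceeds the number of points");
    }
    List<double[]> shuffled = new ArrayList<>(points); // leave the input untouched
    Collections.shuffle(shuffled, new Random(seed));
    return new ArrayList<>(shuffled.subList(0, k));
  }
}
```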
      • assign

        private void assign​(Relation<? extends NumberVector> database,
                            java.util.List<ORCLUS.ORCLUSCluster> clusters)
        Creates a partitioning of the database by assigning each object to its closest seed.
        Parameters:
        database - the database holding the objects
        clusters - the list of clusters to which the objects should be assigned
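
        In the ORCLUS paper, "closest" is measured in each seed's own subspace: a point and the seed centroid are both projected onto the seed's basis before the distance is taken. The sketch below illustrates that idea on plain arrays, assuming each basis is given as a matrix whose rows are orthonormal basis vectors; the class and method names are hypothetical and this is not the ELKI code.

```java
final class AssignSketch {
  /** Squared distance between p and centroid, measured only along the basis rows. */
  static double projectedDistSq(double[][] basis, double[] centroid, double[] p) {
    double sum = 0;
    for (double[] axis : basis) {
      double dot = 0;
      for (int j = 0; j < p.length; j++) {
        dot += axis[j] * (p[j] - centroid[j]);
      }
      sum += dot * dot;
    }
    return sum;
  }

  /** Returns, for every point, the index of the seed with the smallest projected distance. */
  static int[] assign(double[][] points, double[][] centroids, double[][][] bases) {
    int[] label = new int[points.length];
    for (int n = 0; n < points.length; n++) {
      double best = Double.POSITIVE_INFINITY;
      for (int c = 0; c < centroids.length; c++) {
        double d = projectedDistSq(bases[c], centroids[c], points[n]);
        if (d < best) {
          best = d;
          label[n] = c;
        }
      }
    }
    return label;
  }
}
```

        With the identity matrix as basis this reduces to ordinary nearest-centroid assignment; with a lower-dimensional basis, a point far away in the full space can still be close to a seed if it differs only along directions the seed's subspace ignores.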
      • findBasis

        private double[][] findBasis​(Relation<? extends NumberVector> database,
                                     ORCLUS.ORCLUSCluster cluster,
                                     int dim)
        Finds the basis of the subspace of dimensionality dim for the specified cluster.
        Parameters:
        database - the database to run the algorithm on
        cluster - the cluster
        dim - the dimensionality of the subspace
        Returns:
        matrix defining the basis of the subspace for the specified cluster
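
        In the ORCLUS paper, the subspace of a cluster is spanned by the eigenvectors of the cluster's covariance matrix that have the smallest eigenvalues, that is, the directions in which the cluster spreads least. The sketch below illustrates this on plain arrays. It is not the ELKI code, which delegates to the PCARunner passed to the constructor; here a basic cyclic Jacobi eigenvalue iteration is swapped in as a stand-in, and all names are hypothetical. The basis vectors are returned as the rows of the result.

```java
import java.util.Arrays;
import java.util.Comparator;

final class FindBasisSketch {
  /** Population covariance matrix of the given points. */
  static double[][] covariance(double[][] pts) {
    int n = pts.length, d = pts[0].length;
    double[] mean = new double[d];
    for (double[] p : pts) {
      for (int j = 0; j < d; j++) mean[j] += p[j] / n;
    }
    double[][] cov = new double[d][d];
    for (double[] p : pts) {
      for (int r = 0; r < d; r++) {
        for (int c = 0; c < d; c++) cov[r][c] += (p[r] - mean[r]) * (p[c] - mean[c]) / n;
      }
    }
    return cov;
  }

  /** Rows of the result are the dim eigenvectors of sym with the smallest eigenvalues. */
  static double[][] weakestEigenvectors(double[][] sym, int dim) {
    int n = sym.length;
    double[][] a = new double[n][];
    double[][] v = new double[n][n]; // accumulates eigenvectors as columns
    for (int i = 0; i < n; i++) {
      a[i] = sym[i].clone();
      v[i][i] = 1;
    }
    for (int sweep = 0; sweep < 50; sweep++) {
      for (int p = 0; p < n - 1; p++) {
        for (int q = p + 1; q < n; q++) {
          if (Math.abs(a[p][q]) < 1e-12) continue;
          double theta = (a[q][q] - a[p][p]) / (2 * a[p][q]);
          double t = (theta >= 0 ? 1 : -1) / (Math.abs(theta) + Math.sqrt(theta * theta + 1));
          double c = 1 / Math.sqrt(t * t + 1), s = t * c;
          for (int k = 0; k < n; k++) { // a <- a * J
            double akp = a[k][p], akq = a[k][q];
            a[k][p] = c * akp - s * akq;
            a[k][q] = s * akp + c * akq;
          }
          for (int k = 0; k < n; k++) { // a <- J^T * a
            double apk = a[p][k], aqk = a[q][k];
            a[p][k] = c * apk - s * aqk;
            a[q][k] = s * apk + c * aqk;
          }
          for (int k = 0; k < n; k++) { // v <- v * J
            double vkp = v[k][p], vkq = v[k][q];
            v[k][p] = c * vkp - s * vkq;
            v[k][q] = s * vkp + c * vkq;
          }
        }
      }
    }
    Integer[] order = new Integer[n];
    for (int i = 0; i < n; i++) order[i] = i;
    // the diagonal of a now holds the eigenvalues; sort ascending
    Arrays.sort(order, Comparator.comparingDouble((Integer i) -> a[i][i]));
    double[][] basis = new double[dim][n];
    for (int r = 0; r < dim; r++) {
      for (int k = 0; k < n; k++) basis[r][k] = v[k][order[r]];
    }
    return basis;
  }
}
```

        For points lying on the line y = x, the one-dimensional basis returned is perpendicular to that line, because the cluster has no spread in that direction.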
      • merge

        private void merge​(Relation<? extends NumberVector> relation,
                           java.util.List<ORCLUS.ORCLUSCluster> clusters,
                           int k_new,
                           int d_new,
                           IndefiniteProgress cprogress)
        Reduces the number of seeds to k_new.
        Parameters:
        relation - the database holding the objects
        clusters - the set of current seeds
        k_new - the new number of seeds
        d_new - the new dimensionality of the subspaces for each seed
        cprogress - the progress indicator for logging
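
        The ORCLUS paper reduces the seed count greedily: among all pairs of current clusters, the pair whose union has the lowest energy is merged, and this repeats until k_new clusters remain. The sketch below shows only that greedy loop on plain lists. The energy function is passed in as a parameter, and the example energy used here (mean squared distance to the centroid in the full space) is a simplification; recomputing each union's basis with dimensionality d_new is omitted. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

final class MergeSketch {
  /** Repeatedly merges the pair with the lowest union energy until kNew clusters remain. */
  static List<List<double[]>> merge(List<List<double[]>> clusters, int kNew,
      ToDoubleFunction<List<double[]>> energy) {
    List<List<double[]>> current = new ArrayList<>(clusters);
    while (current.size() > kNew && current.size() > 1) {
      int bestI = -1, bestJ = -1;
      double best = Double.POSITIVE_INFINITY;
      for (int i = 0; i < current.size(); i++) {
        for (int j = i + 1; j < current.size(); j++) {
          List<double[]> union = new ArrayList<>(current.get(i));
          union.addAll(current.get(j));
          double e = energy.applyAsDouble(union);
          if (bestI < 0 || e < best) {
            best = e;
            bestI = i;
            bestJ = j;
          }
        }
      }
      List<double[]> merged = new ArrayList<>(current.get(bestI));
      merged.addAll(current.get(bestJ));
      current.remove(bestJ); // bestJ > bestI, so bestI stays valid
      current.set(bestI, merged);
    }
    return current;
  }

  /** Stand-in energy: mean squared full-space distance to the centroid. */
  static double meanSquaredDistance(List<double[]> pts) {
    int d = pts.get(0).length;
    double[] mean = new double[d];
    for (double[] p : pts) {
      for (int j = 0; j < d; j++) mean[j] += p[j] / pts.size();
    }
    double sum = 0;
    for (double[] p : pts) {
      for (int j = 0; j < d; j++) sum += (p[j] - mean[j]) * (p[j] - mean[j]);
    }
    return sum / pts.size();
  }
}
```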
      • projectedEnergy

        private ORCLUS.ProjectedEnergy projectedEnergy​(Relation<? extends NumberVector> relation,
                                                       ORCLUS.ORCLUSCluster c_i,
                                                       ORCLUS.ORCLUSCluster c_j,
                                                       int i,
                                                       int j,
                                                       int dim)
        Computes the projected energy of the specified clusters. The projected energy is the mean squared distance of the points to the centroid of the union cluster c, when all points in c are projected onto the subspace of c.
        Parameters:
        relation - the relation holding the objects
        c_i - the first cluster
        c_j - the second cluster
        i - the index of cluster c_i in the cluster list
        j - the index of cluster c_j in the cluster list
        dim - the dimensionality of the clusters
        Returns:
        the projected energy of the union of the specified clusters
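
        The definition above can be written out directly. This is a minimal sketch on plain arrays, assuming the points of the union cluster and its subspace basis (rows are orthonormal basis vectors) are already available; the names are hypothetical and this is not the ELKI code.

```java
final class ProjectedEnergySketch {
  /** Mean squared distance to the centroid, measured only along the basis rows. */
  static double projectedEnergy(double[][] unionPoints, double[][] basis) {
    int n = unionPoints.length, d = unionPoints[0].length;
    double[] centroid = new double[d];
    for (double[] p : unionPoints) {
      for (int j = 0; j < d; j++) centroid[j] += p[j] / n;
    }
    double sum = 0;
    for (double[] p : unionPoints) {
      for (double[] axis : basis) {
        double dot = 0;
        for (int j = 0; j < d; j++) dot += axis[j] * (p[j] - centroid[j]);
        sum += dot * dot; // squared length of the projected difference, per axis
      }
    }
    return sum / n;
  }
}
```

        A low value means the union is tight within its own subspace, which makes the two clusters good candidates for merging.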
      • union

        private ORCLUS.ORCLUSCluster union​(Relation<? extends NumberVector> relation,
                                           ORCLUS.ORCLUSCluster c1,
                                           ORCLUS.ORCLUSCluster c2,
                                           int dim)
        Returns the union of the two specified clusters.
        Parameters:
        relation - the database holding the objects
        c1 - the first cluster
        c2 - the second cluster
        dim - the dimensionality of the union cluster
        Returns:
        the union of the two specified clusters
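
        A union cluster needs its members combined and its centroid recomputed. The sketch below shows one way to do this on plain lists; the `Cluster` class is a hypothetical stand-in for ORCLUS.ORCLUSCluster, recomputing the subspace basis with dimensionality dim is omitted, and falling back to the midpoint of the two centroids when both clusters are empty is an assumption made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class UnionSketch {
  /** Hypothetical minimal cluster: member points plus a centroid. */
  static final class Cluster {
    final List<double[]> members;
    final double[] centroid;

    Cluster(List<double[]> members, double[] centroid) {
      this.members = members;
      this.centroid = centroid;
    }
  }

  /** Combines the members of c1 and c2 and recomputes the centroid. */
  static Cluster union(Cluster c1, Cluster c2) {
    List<double[]> all = new ArrayList<>(c1.members);
    all.addAll(c2.members);
    int d = c1.centroid.length;
    double[] centroid = new double[d];
    if (all.isEmpty()) {
      // assumption: with no members, use the midpoint of the two centroids
      for (int j = 0; j < d; j++) centroid[j] = (c1.centroid[j] + c2.centroid[j]) / 2;
    } else {
      for (double[] p : all) {
        for (int j = 0; j < d; j++) centroid[j] += p[j] / all.size();
      }
    }
    return new Cluster(all, centroid);
  }
}
```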