Package elki.clustering.correlation
Class ORCLUS
- java.lang.Object
  - elki.clustering.AbstractProjectedClustering<Clustering<Model>>
    - elki.clustering.correlation.ORCLUS
-
- All Implemented Interfaces:
Algorithm, ClusteringAlgorithm<Clustering<Model>>
@Title("ORCLUS: Arbitrarily ORiented projected CLUSter generation")
@Description("Algorithm to find correlation clusters in high dimensional spaces.")
@Reference(authors="C. C. Aggarwal, P. S. Yu", title="Finding Generalized Projected Clusters in High Dimensional Spaces", booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD \'00)", url="https://doi.org/10.1145/342009.335383", bibkey="DBLP:conf/sigmod/AggarwalY00")
public class ORCLUS
extends AbstractProjectedClustering<Clustering<Model>>
ORCLUS: Arbitrarily ORiented projected CLUSter generation.
Reference:
C. C. Aggarwal, P. S. Yu
Finding Generalized Projected Clusters in High Dimensional Spaces
Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '00).
- Since:
- 0.1
- Author:
- Elke Achtert
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
private static class | ORCLUS.ORCLUSCluster | Encapsulates the attributes of a cluster.
static class | ORCLUS.Par | Parameterization class.
private static class | ORCLUS.ProjectedEnergy | Encapsulates the projected energy for a cluster.
-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Field Summary
Fields
Modifier and Type | Field | Description
private double | alpha | Holds the value of ORCLUS.Par.ALPHA_ID.
private static Logging | LOG | The logger for this class.
private PCARunner | pca | The PCA utility object.
private RandomFactory | rnd | Random generator
-
Fields inherited from class elki.clustering.AbstractProjectedClustering
k, k_i, l
-
-
Constructor Summary
Constructors
Constructor | Description
ORCLUS(int k, int k_i, int l, double alpha, RandomFactory rnd, PCARunner pca) | Java constructor.
-
Method Summary
All Methods Instance Methods Concrete Methods
Modifier and Type | Method | Description
private void | assign(Relation<? extends NumberVector> database, java.util.List<ORCLUS.ORCLUSCluster> clusters) | Creates a partitioning of the database by assigning each object to its closest seed.
private double[][] | findBasis(Relation<? extends NumberVector> database, ORCLUS.ORCLUSCluster cluster, int dim) | Finds the basis of the subspace of dimensionality dim for the specified cluster.
TypeInformation[] | getInputTypeRestriction() | Get the input type restriction used for negotiating the data query.
private java.util.List<ORCLUS.ORCLUSCluster> | initialSeeds(Relation<? extends NumberVector> database, int k) | Initializes the list of seeds with a random sample of size k.
private void | merge(Relation<? extends NumberVector> relation, java.util.List<ORCLUS.ORCLUSCluster> clusters, int k_new, int d_new, IndefiniteProgress cprogress) | Reduces the number of seeds to k_new.
private ORCLUS.ProjectedEnergy | projectedEnergy(Relation<? extends NumberVector> relation, ORCLUS.ORCLUSCluster c_i, ORCLUS.ORCLUSCluster c_j, int i, int j, int dim) | Computes the projected energy of the specified clusters.
Clustering<Model> | run(Relation<? extends NumberVector> relation) | Performs the ORCLUS algorithm on the given database.
private ORCLUS.ORCLUSCluster | union(Relation<? extends NumberVector> relation, ORCLUS.ORCLUSCluster c1, ORCLUS.ORCLUSCluster c2, int dim) | Returns the union of the two specified clusters.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.clustering.ClusteringAlgorithm
autorun
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
The logger for this class.
-
alpha
private double alpha
Holds the value of ORCLUS.Par.ALPHA_ID.
-
rnd
private RandomFactory rnd
Random generator
-
pca
private PCARunner pca
The PCA utility object.
-
-
Constructor Detail
-
ORCLUS
public ORCLUS(int k, int k_i, int l, double alpha, RandomFactory rnd, PCARunner pca)
Java constructor.
- Parameters:
k - k Parameter
k_i - k_i Parameter
l - l Parameter
alpha - Alpha Parameter
rnd - Random generator
pca - PCA runner
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface: Algorithm
Get the input type restriction used for negotiating the data query.
- Returns:
- Type restriction
-
run
public Clustering<Model> run(Relation<? extends NumberVector> relation)
Performs the ORCLUS algorithm on the given database.
- Parameters:
relation - Relation
-
initialSeeds
private java.util.List<ORCLUS.ORCLUSCluster> initialSeeds(Relation<? extends NumberVector> database, int k)
Initializes the list of seeds with a random sample of size k.
- Parameters:
database - the database holding the objects
k - the size of the random sample
- Returns:
- the initial seed list
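The sampling step described above can be illustrated with a minimal, self-contained sketch. It is not the ELKI implementation: the class `SeedSamplingSketch`, the helper `sampleSeedIndices`, the use of plain integer indices for objects, and the fixed `long` seed are all assumptions made for illustration, whereas ORCLUS itself draws its randomness from the `RandomFactory` passed to the constructor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not part of ELKI.
final class SeedSamplingSketch {
  /**
   * Draws k distinct object indices uniformly at random from 0..n-1
   * by shuffling all indices and keeping the first k.
   */
  static List<Integer> sampleSeedIndices(int n, int k, long seed) {
    if (k < 0 || k > n) {
      throw new IllegalArgumentException("k must be between 0 and n");
    }
    List<Integer> indices = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      indices.add(i);
    }
    // A seeded Random makes the sample reproducible.
    Collections.shuffle(indices, new Random(seed));
    return new ArrayList<>(indices.subList(0, k));
  }

  public static void main(String[] args) {
    // Prints three distinct indices in the range 0..9.
    System.out.println(sampleSeedIndices(10, 3, 42L));
  }
}
```

Each sampled index would then become the centre of one initial seed cluster.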
-
assign
private void assign(Relation<? extends NumberVector> database, java.util.List<ORCLUS.ORCLUSCluster> clusters)
Creates a partitioning of the database by assigning each object to its closest seed.
- Parameters:
database - the database holding the objects
clusters - the list of clusters to which the objects should be assigned
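A minimal sketch of the assignment step, under stated assumptions: following the ORCLUS paper, the distance from an object to a seed is taken here as the squared Euclidean distance after projecting onto that seed's subspace basis. The class `AssignSketch`, the plain `double[]` arrays, and the convention that each basis is stored as rows of orthonormal vectors are hypothetical and do not mirror ELKI's internal types.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of ELKI.
final class AssignSketch {
  /** Squared length of (p - center) after projection onto the basis rows. */
  static double projectedSquaredDistance(double[] p, double[] center, double[][] basis) {
    double sum = 0.0;
    for (double[] e : basis) {
      double dot = 0.0;
      for (int d = 0; d < p.length; d++) {
        dot += (p[d] - center[d]) * e[d];
      }
      sum += dot * dot;
    }
    return sum;
  }

  /** Returns, for every point, the index of its closest seed. */
  static int[] assign(double[][] points, double[][] seeds, double[][][] bases) {
    int[] labels = new int[points.length];
    for (int i = 0; i < points.length; i++) {
      int best = -1;
      double bestDist = Double.POSITIVE_INFINITY;
      for (int c = 0; c < seeds.length; c++) {
        double dist = projectedSquaredDistance(points[i], seeds[c], bases[c]);
        if (dist < bestDist) {
          bestDist = dist;
          best = c;
        }
      }
      labels[i] = best;
    }
    return labels;
  }

  public static void main(String[] args) {
    double[][] points = { { 1, 5 }, { 9, -3 } };
    double[][] seeds = { { 0, 0 }, { 10, 0 } };
    // Both seeds use the x-axis as their one-dimensional subspace.
    double[][][] bases = { { { 1, 0 } }, { { 1, 0 } } };
    System.out.println(Arrays.toString(assign(points, seeds, bases)));
  }
}
```

Because distances are measured per seed subspace, the second coordinate of each point is ignored in this example: only the difference along the x-axis decides the assignment.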
-
findBasis
private double[][] findBasis(Relation<? extends NumberVector> database, ORCLUS.ORCLUSCluster cluster, int dim)
Finds the basis of the subspace of dimensionality dim for the specified cluster.
- Parameters:
database - the database to run the algorithm on
cluster - the cluster
dim - the dimensionality of the subspace
- Returns:
- matrix defining the basis of the subspace for the specified cluster
-
merge
private void merge(Relation<? extends NumberVector> relation, java.util.List<ORCLUS.ORCLUSCluster> clusters, int k_new, int d_new, IndefiniteProgress cprogress)
Reduces the number of seeds to k_new.
- Parameters:
relation - the database holding the objects
clusters - the set of current seeds
k_new - the new number of seeds
d_new - the new dimensionality of the subspaces for each seed
-
projectedEnergy
private ORCLUS.ProjectedEnergy projectedEnergy(Relation<? extends NumberVector> relation, ORCLUS.ORCLUSCluster c_i, ORCLUS.ORCLUSCluster c_j, int i, int j, int dim)
Computes the projected energy of the specified clusters. The projected energy is the mean squared distance of the points to the centroid of the union cluster c of c_i and c_j, when all points in c are projected to the subspace of c.
- Parameters:
relation - the relation holding the objects
c_i - the first cluster
c_j - the second cluster
i - the index of cluster c_i in the cluster list
j - the index of cluster c_j in the cluster list
dim - the dimensionality of the clusters
- Returns:
- the projected energy of the specified clusters
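The definition above (mean squared distance to the centroid, measured in the subspace of the union cluster) can be sketched in plain Java. This is an illustration under assumptions rather than the ELKI code: the class `ProjectedEnergySketch`, the list-of-arrays representation of the union cluster's points, and the convention that the subspace basis is given as rows of orthonormal vectors are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of ELKI.
final class ProjectedEnergySketch {
  /**
   * Mean squared distance of the points to their centroid, measured
   * after projecting onto the given basis (rows = orthonormal vectors).
   */
  static double projectedEnergy(List<double[]> points, double[][] basis) {
    if (points.isEmpty()) {
      throw new IllegalArgumentException("cluster must not be empty");
    }
    int dim = points.get(0).length;
    int n = points.size();

    // Centroid of the (union) cluster.
    double[] centroid = new double[dim];
    for (double[] p : points) {
      for (int d = 0; d < dim; d++) {
        centroid[d] += p[d];
      }
    }
    for (int d = 0; d < dim; d++) {
      centroid[d] /= n;
    }

    // Sum of squared projected distances to the centroid.
    double total = 0.0;
    for (double[] p : points) {
      for (double[] e : basis) {
        double dot = 0.0;
        for (int d = 0; d < dim; d++) {
          dot += (p[d] - centroid[d]) * e[d];
        }
        total += dot * dot;
      }
    }
    return total / n;
  }

  public static void main(String[] args) {
    List<double[]> union = Arrays.asList(
        new double[] { 0, 0 }, new double[] { 2, 0 },
        new double[] { 0, 4 }, new double[] { 2, 4 });
    // Energy along the x-axis only, then along the y-axis only.
    System.out.println(projectedEnergy(union, new double[][] { { 1, 0 } }));
    System.out.println(projectedEnergy(union, new double[][] { { 0, 1 } }));
  }
}
```

In this example the points spread less along the x-axis than along the y-axis, so the energy is lower for the x-axis subspace. A low projected energy for a candidate union indicates that the two clusters fit well into a common subspace.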
-
union
private ORCLUS.ORCLUSCluster union(Relation<? extends NumberVector> relation, ORCLUS.ORCLUSCluster c1, ORCLUS.ORCLUSCluster c2, int dim)
Returns the union of the two specified clusters.
- Parameters:
relation - the database holding the objects
c1 - the first cluster
c2 - the second cluster
dim - the dimensionality of the union cluster
- Returns:
- the union of the two specified clusters
-
-