Package elki.clustering.subspace
Class PROCLUS
- java.lang.Object
-
- elki.clustering.AbstractProjectedClustering<Clustering<SubspaceModel>>
-
- elki.clustering.subspace.PROCLUS
-
- All Implemented Interfaces:
Algorithm,ClusteringAlgorithm<Clustering<SubspaceModel>>,SubspaceClusteringAlgorithm<SubspaceModel>
@Title("PROCLUS: PROjected CLUStering") @Description("Algorithm to find subspace clusters in high dimensional spaces.") @Reference(authors="C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park", title="Fast Algorithms for Projected Clustering", booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD \'99)", url="https://doi.org/10.1145/304181.304188", bibkey="doi:10.1145/304181.304188") public class PROCLUS extends AbstractProjectedClustering<Clustering<SubspaceModel>> implements SubspaceClusteringAlgorithm<SubspaceModel>
The PROCLUS algorithm, an algorithm to find subspace clusters in high dimensional spaces.Reference:
C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park
Fast Algorithms for Projected Clustering
Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '99).- Since:
- 0.1
- Author:
- Elke Achtert
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description private static classPROCLUS.DoubleIntIntSimple triple.static classPROCLUS.ParParameterization class.private static classPROCLUS.PROCLUSClusterEncapsulates the attributes of a cluster.-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Field Summary
Fields Modifier and Type Field Description private static LoggingLOGThe logger for this class.private intm_iMultiplier for the initial number of medoids.private RandomFactoryrndRandom generator-
Fields inherited from class elki.clustering.AbstractProjectedClustering
k, k_i, l
-
-
Constructor Summary
Constructors Constructor Description PROCLUS(int k, int k_i, int l, int m_i, RandomFactory rnd)Java constructor.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description private java.util.ArrayList<PROCLUS.PROCLUSCluster>assignPoints(ArrayDBIDs m_current, long[][] dimensions, Relation<? extends NumberVector> database)Assigns the objects to the clusters.private doubleavgDistance(double[] centroid, DBIDs objectIDs, Relation<? extends NumberVector> database, int dimension)Computes the average distance of the objects to the centroid along the specified dimension.private DBIDscomputeBadMedoids(ArrayDBIDs m_current, java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, int threshold)Computes the bad medoids, where the medoid of a cluster with less than the specified threshold of objects is bad.private long[][]computeDimensionMap(java.util.List<PROCLUS.DoubleIntInt> z_ijs, int dim, int numc)Compute the dimension map.private ArrayDBIDscomputeM_current(DBIDs m, DBIDs m_best, DBIDs m_bad, java.util.Random random)Computes the set of medoids in current iteration.private java.util.List<PROCLUS.DoubleIntInt>computeZijs(double[][] averageDistances, int dim)Compute the z_ij values.private doubleevaluateClusters(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, long[][] dimensions, Relation<? extends NumberVector> database)Evaluates the quality of the clusters.private java.util.List<PROCLUS.PROCLUSCluster>finalAssignment(java.util.List<Pair<double[],long[]>> dimensions, Relation<? extends NumberVector> database)Refinement step to assign the objects to the final clusters.private long[][]findDimensions(ArrayDBIDs medoids, Relation<? extends NumberVector> relation, DistanceQuery<? extends NumberVector> distance, RangeSearcher<DBIDRef> rangeQuery)Determines the set of correlated dimensions for each medoid in the specified medoid set.private java.util.List<Pair<double[],long[]>>findDimensions(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, Relation<? extends NumberVector> database)Refinement step that determines the set of correlated dimensions for each cluster centroid.TypeInformation[]getInputTypeRestriction()Get the input type restriction used for negotiating the data query.private DataStore<DBIDs>getLocalities(DBIDs medoids, DistanceQuery<? extends NumberVector> distance, RangeSearcher<DBIDRef> rangeQuery)Computes the localities of the specified medoids: for each medoid m the objects in the sphere centered at m with radius minDist are determined, where minDist is the minimum distance between medoid m and any other medoid m_i.private ArrayDBIDsgreedy(DistanceQuery<? extends NumberVector> distance, DBIDs sampleSet, int m, java.util.Random random)Returns a piercing set of k medoids from the specified sample set.private ArrayDBIDsinitialSet(DBIDs sampleSet, int k, java.util.Random random)Returns a set of k elements from the specified sample set.private doublemanhattanSegmentalDistance(NumberVector o1, double[] o2, long[] dimensions)Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.private doublemanhattanSegmentalDistance(NumberVector o1, NumberVector o2, long[] dimensions)Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.<V extends NumberVector>
Clustering<SubspaceModel>run(Relation<V> relation)Performs the PROCLUS algorithm on the given database.-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.clustering.ClusteringAlgorithm
autorun
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
The logger for this class.
-
m_i
private int m_i
Multiplier for the initial number of medoids.
-
rnd
private RandomFactory rnd
Random generator
-
-
Constructor Detail
-
PROCLUS
public PROCLUS(int k, int k_i, int l, int m_i, RandomFactory rnd)Java constructor.- Parameters:
k- k Parameterk_i- k_i Parameterl- l Parameterm_i- m_i Parameterrnd- Random generator
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface:AlgorithmGet the input type restriction used for negotiating the data query.- Specified by:
getInputTypeRestrictionin interfaceAlgorithm- Returns:
- Type restriction
-
run
public <V extends NumberVector> Clustering<SubspaceModel> run(Relation<V> relation)
Performs the PROCLUS algorithm on the given database.- Parameters:
relation- Relation to process
-
greedy
private ArrayDBIDs greedy(DistanceQuery<? extends NumberVector> distance, DBIDs sampleSet, int m, java.util.Random random)
Returns a piercing set of k medoids from the specified sample set.- Parameters:
distance- the distance functionsampleSet- the sample setm- the number of medoids to be returnedrandom- random number generator- Returns:
- a piercing set of m medoids from the specified sample set
-
initialSet
private ArrayDBIDs initialSet(DBIDs sampleSet, int k, java.util.Random random)
Returns a set of k elements from the specified sample set.- Parameters:
sampleSet- the sample setk- the number of samples to be returnedrandom- random number generator- Returns:
- a set of k elements from the specified sample set
-
computeM_current
private ArrayDBIDs computeM_current(DBIDs m, DBIDs m_best, DBIDs m_bad, java.util.Random random)
Computes the set of medoids in current iteration.- Parameters:
m- the medoidsm_best- the best set of medoids found so farm_bad- the bad medoidsrandom- random number generator- Returns:
- m_current, the set of medoids in current iteration
-
getLocalities
private DataStore<DBIDs> getLocalities(DBIDs medoids, DistanceQuery<? extends NumberVector> distance, RangeSearcher<DBIDRef> rangeQuery)
Computes the localities of the specified medoids: for each medoid m the objects in the sphere centered at m with radius minDist are determined, where minDist is the minimum distance between medoid m and any other medoid m_i.- Parameters:
medoids- the ids of the medoidsdistance- the distance function- Returns:
- a mapping of the medoid's id to its locality
-
findDimensions
private long[][] findDimensions(ArrayDBIDs medoids, Relation<? extends NumberVector> relation, DistanceQuery<? extends NumberVector> distance, RangeSearcher<DBIDRef> rangeQuery)
Determines the set of correlated dimensions for each medoid in the specified medoid set.- Parameters:
medoids- the set of medoidsrelation- the relation containing the objectsdistance- the distance function- Returns:
- the set of correlated dimensions for each medoid in the specified medoid set
-
findDimensions
private java.util.List<Pair<double[],long[]>> findDimensions(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, Relation<? extends NumberVector> database)
Refinement step that determines the set of correlated dimensions for each cluster centroid.- Parameters:
clusters- the list of clustersdatabase- the database containing the objects- Returns:
- the set of correlated dimensions for each specified cluster centroid
-
computeZijs
private java.util.List<PROCLUS.DoubleIntInt> computeZijs(double[][] averageDistances, int dim)
Compute the z_ij values.- Parameters:
averageDistances- Average distancesdim- Dimensions- Returns:
- z_ij values
-
computeDimensionMap
private long[][] computeDimensionMap(java.util.List<PROCLUS.DoubleIntInt> z_ijs, int dim, int numc)
Compute the dimension map.- Parameters:
z_ijs- z_ij valuesdim- Number of dimensionsnumc- Number of clusters- Returns:
- Bitmap of dimensions used
-
assignPoints
private java.util.ArrayList<PROCLUS.PROCLUSCluster> assignPoints(ArrayDBIDs m_current, long[][] dimensions, Relation<? extends NumberVector> database)
Assigns the objects to the clusters.- Parameters:
m_current- Current centersdimensions- set of correlated dimensions for each medoid of the clusterdatabase- the database containing the objects- Returns:
- the assignments of the object to the clusters
-
finalAssignment
private java.util.List<PROCLUS.PROCLUSCluster> finalAssignment(java.util.List<Pair<double[],long[]>> dimensions, Relation<? extends NumberVector> database)
Refinement step to assign the objects to the final clusters.- Parameters:
dimensions- pair containing the centroid and the set of correlated dimensions for the centroiddatabase- the database containing the objects- Returns:
- the assignments of the object to the clusters
-
manhattanSegmentalDistance
private double manhattanSegmentalDistance(NumberVector o1, NumberVector o2, long[] dimensions)
Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.- Parameters:
o1- the first objecto2- the second objectdimensions- the dimensions to be considered- Returns:
- the Manhattan segmental distance between o1 and o2 relative to the specified dimensions
-
manhattanSegmentalDistance
private double manhattanSegmentalDistance(NumberVector o1, double[] o2, long[] dimensions)
Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.- Parameters:
o1- the first objecto2- the second objectdimensions- the dimensions to be considered- Returns:
- the Manhattan segmental distance between o1 and o2 relative to the specified dimensions
-
evaluateClusters
private double evaluateClusters(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, long[][] dimensions, Relation<? extends NumberVector> database)
Evaluates the quality of the clusters.- Parameters:
clusters- the clusters to be evaluateddimensions- the dimensions associated with each clusterdatabase- the database holding the objects- Returns:
- a measure for the cluster quality
-
avgDistance
private double avgDistance(double[] centroid, DBIDs objectIDs, Relation<? extends NumberVector> database, int dimension)Computes the average distance of the objects to the centroid along the specified dimension.- Parameters:
centroid- the centroidobjectIDs- the set of objects idsdatabase- the database holding the objectsdimension- the dimension for which the average distance is computed- Returns:
- the average distance of the objects to the centroid along the specified dimension
-
computeBadMedoids
private DBIDs computeBadMedoids(ArrayDBIDs m_current, java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, int threshold)
Computes the bad medoids, where the medoid of a cluster with less than the specified threshold of objects is bad.- Parameters:
m_current- Current medoidsclusters- the clustersthreshold- the threshold- Returns:
- the bad medoids
-
-