Package elki.clustering.subspace
Class PROCLUS
- java.lang.Object
-
- elki.clustering.AbstractProjectedClustering<Clustering<SubspaceModel>>
-
- elki.clustering.subspace.PROCLUS
-
- All Implemented Interfaces:
Algorithm
,ClusteringAlgorithm<Clustering<SubspaceModel>>
,SubspaceClusteringAlgorithm<SubspaceModel>
@Title("PROCLUS: PROjected CLUStering") @Description("Algorithm to find subspace clusters in high dimensional spaces.") @Reference(authors="C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park", title="Fast Algorithms for Projected Clustering", booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD \'99)", url="https://doi.org/10.1145/304181.304188", bibkey="doi:10.1145/304181.304188") public class PROCLUS extends AbstractProjectedClustering<Clustering<SubspaceModel>> implements SubspaceClusteringAlgorithm<SubspaceModel>
The PROCLUS algorithm, an algorithm to find subspace clusters in high dimensional spaces.Reference:
C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park
Fast Algorithms for Projected Clustering
Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '99).- Since:
- 0.1
- Author:
- Elke Achtert
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description private static class
PROCLUS.DoubleIntInt
Simple triple.static class
PROCLUS.Par
Parameterization class.private static class
PROCLUS.PROCLUSCluster
Encapsulates the attributes of a cluster.-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Field Summary
Fields Modifier and Type Field Description private static Logging
LOG
The logger for this class.private int
m_i
Multiplier for the initial number of medoids.private RandomFactory
rnd
Random generator-
Fields inherited from class elki.clustering.AbstractProjectedClustering
k, k_i, l
-
-
Constructor Summary
Constructors Constructor Description PROCLUS(int k, int k_i, int l, int m_i, RandomFactory rnd)
Java constructor.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description private java.util.ArrayList<PROCLUS.PROCLUSCluster>
assignPoints(ArrayDBIDs m_current, long[][] dimensions, Relation<? extends NumberVector> database)
Assigns the objects to the clusters.private double
avgDistance(double[] centroid, DBIDs objectIDs, Relation<? extends NumberVector> database, int dimension)
Computes the average distance of the objects to the centroid along the specified dimension.private DBIDs
computeBadMedoids(ArrayDBIDs m_current, java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, int threshold)
Computes the bad medoids, where the medoid of a cluster with less than the specified threshold of objects is bad.private long[][]
computeDimensionMap(java.util.List<PROCLUS.DoubleIntInt> z_ijs, int dim, int numc)
Compute the dimension map.private ArrayDBIDs
computeM_current(DBIDs m, DBIDs m_best, DBIDs m_bad, java.util.Random random)
Computes the set of medoids in current iteration.private java.util.List<PROCLUS.DoubleIntInt>
computeZijs(double[][] averageDistances, int dim)
Compute the z_ij values.private double
evaluateClusters(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, long[][] dimensions, Relation<? extends NumberVector> database)
Evaluates the quality of the clusters.private java.util.List<PROCLUS.PROCLUSCluster>
finalAssignment(java.util.List<Pair<double[],long[]>> dimensions, Relation<? extends NumberVector> database)
Refinement step to assign the objects to the final clusters.private long[][]
findDimensions(ArrayDBIDs medoids, Relation<? extends NumberVector> relation, DistanceQuery<? extends NumberVector> distance, RangeSearcher<DBIDRef> rangeQuery)
Determines the set of correlated dimensions for each medoid in the specified medoid set.private java.util.List<Pair<double[],long[]>>
findDimensions(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, Relation<? extends NumberVector> database)
Refinement step that determines the set of correlated dimensions for each cluster centroid.TypeInformation[]
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.private DataStore<DBIDs>
getLocalities(DBIDs medoids, DistanceQuery<? extends NumberVector> distance, RangeSearcher<DBIDRef> rangeQuery)
Computes the localities of the specified medoids: for each medoid m the objects in the sphere centered at m with radius minDist are determined, where minDist is the minimum distance between medoid m and any other medoid m_i.private ArrayDBIDs
greedy(DistanceQuery<? extends NumberVector> distance, DBIDs sampleSet, int m, java.util.Random random)
Returns a piercing set of k medoids from the specified sample set.private ArrayDBIDs
initialSet(DBIDs sampleSet, int k, java.util.Random random)
Returns a set of k elements from the specified sample set.private double
manhattanSegmentalDistance(NumberVector o1, double[] o2, long[] dimensions)
Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.private double
manhattanSegmentalDistance(NumberVector o1, NumberVector o2, long[] dimensions)
Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.<V extends NumberVector>
Clustering<SubspaceModel>run(Relation<V> relation)
Performs the PROCLUS algorithm on the given database.-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.clustering.ClusteringAlgorithm
autorun
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
The logger for this class.
-
m_i
private int m_i
Multiplier for the initial number of medoids.
-
rnd
private RandomFactory rnd
Random generator
-
-
Constructor Detail
-
PROCLUS
public PROCLUS(int k, int k_i, int l, int m_i, RandomFactory rnd)
Java constructor.- Parameters:
k
- k Parameterk_i
- k_i Parameterl
- l Parameterm_i
- m_i Parameterrnd
- Random generator
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface:Algorithm
Get the input type restriction used for negotiating the data query.- Specified by:
getInputTypeRestriction
in interfaceAlgorithm
- Returns:
- Type restriction
-
run
public <V extends NumberVector> Clustering<SubspaceModel> run(Relation<V> relation)
Performs the PROCLUS algorithm on the given database.- Parameters:
relation
- Relation to process
-
greedy
private ArrayDBIDs greedy(DistanceQuery<? extends NumberVector> distance, DBIDs sampleSet, int m, java.util.Random random)
Returns a piercing set of k medoids from the specified sample set.- Parameters:
distance
- the distance functionsampleSet
- the sample setm
- the number of medoids to be returnedrandom
- random number generator- Returns:
- a piercing set of m medoids from the specified sample set
-
initialSet
private ArrayDBIDs initialSet(DBIDs sampleSet, int k, java.util.Random random)
Returns a set of k elements from the specified sample set.- Parameters:
sampleSet
- the sample setk
- the number of samples to be returnedrandom
- random number generator- Returns:
- a set of k elements from the specified sample set
-
computeM_current
private ArrayDBIDs computeM_current(DBIDs m, DBIDs m_best, DBIDs m_bad, java.util.Random random)
Computes the set of medoids in current iteration.- Parameters:
m
- the medoidsm_best
- the best set of medoids found so farm_bad
- the bad medoidsrandom
- random number generator- Returns:
- m_current, the set of medoids in current iteration
-
getLocalities
private DataStore<DBIDs> getLocalities(DBIDs medoids, DistanceQuery<? extends NumberVector> distance, RangeSearcher<DBIDRef> rangeQuery)
Computes the localities of the specified medoids: for each medoid m the objects in the sphere centered at m with radius minDist are determined, where minDist is the minimum distance between medoid m and any other medoid m_i.- Parameters:
medoids
- the ids of the medoidsdistance
- the distance function- Returns:
- a mapping of the medoid's id to its locality
-
findDimensions
private long[][] findDimensions(ArrayDBIDs medoids, Relation<? extends NumberVector> relation, DistanceQuery<? extends NumberVector> distance, RangeSearcher<DBIDRef> rangeQuery)
Determines the set of correlated dimensions for each medoid in the specified medoid set.- Parameters:
medoids
- the set of medoidsrelation
- the relation containing the objectsdistance
- the distance function- Returns:
- the set of correlated dimensions for each medoid in the specified medoid set
-
findDimensions
private java.util.List<Pair<double[],long[]>> findDimensions(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, Relation<? extends NumberVector> database)
Refinement step that determines the set of correlated dimensions for each cluster centroid.- Parameters:
clusters
- the list of clustersdatabase
- the database containing the objects- Returns:
- the set of correlated dimensions for each specified cluster centroid
-
computeZijs
private java.util.List<PROCLUS.DoubleIntInt> computeZijs(double[][] averageDistances, int dim)
Compute the z_ij values.- Parameters:
averageDistances
- Average distancesdim
- Dimensions- Returns:
- z_ij values
-
computeDimensionMap
private long[][] computeDimensionMap(java.util.List<PROCLUS.DoubleIntInt> z_ijs, int dim, int numc)
Compute the dimension map.- Parameters:
z_ijs
- z_ij valuesdim
- Number of dimensionsnumc
- Number of clusters- Returns:
- Bitmap of dimensions used
-
assignPoints
private java.util.ArrayList<PROCLUS.PROCLUSCluster> assignPoints(ArrayDBIDs m_current, long[][] dimensions, Relation<? extends NumberVector> database)
Assigns the objects to the clusters.- Parameters:
m_current
- Current centersdimensions
- set of correlated dimensions for each medoid of the clusterdatabase
- the database containing the objects- Returns:
- the assignments of the object to the clusters
-
finalAssignment
private java.util.List<PROCLUS.PROCLUSCluster> finalAssignment(java.util.List<Pair<double[],long[]>> dimensions, Relation<? extends NumberVector> database)
Refinement step to assign the objects to the final clusters.- Parameters:
dimensions
- pair containing the centroid and the set of correlated dimensions for the centroiddatabase
- the database containing the objects- Returns:
- the assignments of the object to the clusters
-
manhattanSegmentalDistance
private double manhattanSegmentalDistance(NumberVector o1, NumberVector o2, long[] dimensions)
Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.- Parameters:
o1
- the first objecto2
- the second objectdimensions
- the dimensions to be considered- Returns:
- the Manhattan segmental distance between o1 and o2 relative to the specified dimensions
-
manhattanSegmentalDistance
private double manhattanSegmentalDistance(NumberVector o1, double[] o2, long[] dimensions)
Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.- Parameters:
o1
- the first objecto2
- the second objectdimensions
- the dimensions to be considered- Returns:
- the Manhattan segmental distance between o1 and o2 relative to the specified dimensions
-
evaluateClusters
private double evaluateClusters(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, long[][] dimensions, Relation<? extends NumberVector> database)
Evaluates the quality of the clusters.- Parameters:
clusters
- the clusters to be evaluateddimensions
- the dimensions associated with each clusterdatabase
- the database holding the objects- Returns:
- a measure for the cluster quality
-
avgDistance
private double avgDistance(double[] centroid, DBIDs objectIDs, Relation<? extends NumberVector> database, int dimension)
Computes the average distance of the objects to the centroid along the specified dimension.- Parameters:
centroid
- the centroidobjectIDs
- the set of objects idsdatabase
- the database holding the objectsdimension
- the dimension for which the average distance is computed- Returns:
- the average distance of the objects to the centroid along the specified dimension
-
computeBadMedoids
private DBIDs computeBadMedoids(ArrayDBIDs m_current, java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, int threshold)
Computes the bad medoids, where the medoid of a cluster with less than the specified threshold of objects is bad.- Parameters:
m_current
- Current medoidsclusters
- the clustersthreshold
- the threshold- Returns:
- the bad medoids
-
-