V - the type of NumberVector handled by this Algorithm

@Title(value="PROCLUS: PROjected CLUStering")
@Description(value="Algorithm to find subspace clusters in high dimensional spaces.")
@Reference(authors="C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park", title="Fast Algorithms for Projected Clustering", booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD \'99)", url="https://doi.org/10.1145/304181.304188", bibkey="doi:10.1145/304181.304188")
public class PROCLUS<V extends NumberVector>
extends AbstractProjectedClustering<Clustering<SubspaceModel>,V>
implements SubspaceClusteringAlgorithm<SubspaceModel>

Reference:
 C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park
 Fast Algorithms for Projected Clustering
 Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '99).
| Modifier and Type | Class and Description | 
|---|---|
| private static class | PROCLUS.DoubleIntInt: Simple triple. |
| static class | PROCLUS.Parameterizer<V extends NumberVector>: Parameterization class. |
| private static class | PROCLUS.PROCLUSCluster: Encapsulates the attributes of a cluster. |

| Modifier and Type | Field and Description | 
|---|---|
| private static Logging | LOG: The logger for this class. |
| private int | m_i: Multiplier for the initial number of medoids. |
| private RandomFactory | rnd: Random generator. |

Inherited fields: k, k_i, l, ALGORITHM_ID

| Constructor and Description |
|---|
| PROCLUS(int k, int k_i, int l, int m_i, RandomFactory rnd): Java constructor. |

| Modifier and Type | Method and Description | 
|---|---|
| private java.util.ArrayList<PROCLUS.PROCLUSCluster> | assignPoints(ArrayDBIDs m_current, long[][] dimensions, Relation<V> database): Assigns the objects to the clusters. |
| private double | avgDistance(double[] centroid, DBIDs objectIDs, Relation<V> database, int dimension): Computes the average distance of the objects to the centroid along the specified dimension. |
| private DBIDs | computeBadMedoids(ArrayDBIDs m_current, java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, int threshold): Computes the bad medoids; a medoid is bad if its cluster contains fewer objects than the specified threshold. |
| private long[][] | computeDimensionMap(java.util.List<PROCLUS.DoubleIntInt> z_ijs, int dim, int numc): Computes the dimension map. |
| private ArrayDBIDs | computeM_current(DBIDs m, DBIDs m_best, DBIDs m_bad, java.util.Random random): Computes the set of medoids in the current iteration. |
| private java.util.List<PROCLUS.DoubleIntInt> | computeZijs(double[][] averageDistances, int dim): Computes the z_ij values. |
| private double | evaluateClusters(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, long[][] dimensions, Relation<V> database): Evaluates the quality of the clusters. |
| private java.util.List<PROCLUS.PROCLUSCluster> | finalAssignment(java.util.List<Pair<double[],long[]>> dimensions, Relation<V> database): Refinement step to assign the objects to the final clusters. |
| private long[][] | findDimensions(ArrayDBIDs medoids, Relation<V> database, DistanceQuery<V> distFunc, RangeQuery<V> rangeQuery): Determines the set of correlated dimensions for each medoid in the specified medoid set. |
| private java.util.List<Pair<double[],long[]>> | findDimensions(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, Relation<V> database): Refinement step that determines the set of correlated dimensions for each cluster centroid. |
| TypeInformation[] | getInputTypeRestriction(): Get the input type restriction used for negotiating the data query. |
| private DataStore<DBIDs> | getLocalities(DBIDs medoids, DistanceQuery<V> distFunc, RangeQuery<V> rangeQuery): Computes the localities of the specified medoids: for each medoid m, determines the objects in the sphere centered at m with radius minDist, where minDist is the minimum distance between m and any other medoid m_i. |
| protected Logging | getLogger(): Get the (static) logger for this class. |
| private ArrayDBIDs | greedy(DistanceQuery<V> distFunc, DBIDs sampleSet, int m, java.util.Random random): Returns a piercing set of m medoids from the specified sample set. |
| private ArrayDBIDs | initialSet(DBIDs sampleSet, int k, java.util.Random random): Returns a set of k elements from the specified sample set. |
| private double | manhattanSegmentalDistance(NumberVector o1, double[] o2, long[] dimensions): Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions. |
| private double | manhattanSegmentalDistance(NumberVector o1, NumberVector o2, long[] dimensions): Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions. |
| Clustering<SubspaceModel> | run(Database database, Relation<V> relation): Performs the PROCLUS algorithm on the given database. |

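The Manhattan segmental distance named in the method summary can be illustrated with a small standalone sketch. In the cited paper it is the Manhattan distance restricted to a set of dimensions, divided by the number of those dimensions. The code below is not the ELKI implementation: the class name `ManhattanSegmentalSketch` is hypothetical, plain `double[]` arrays stand in for `NumberVector`, and it assumes that the `long[] dimensions` argument is a bit mask in which bit d selects dimension d.

```java
import java.util.BitSet;

// Hypothetical standalone sketch; not part of ELKI.
class ManhattanSegmentalSketch {
    /**
     * Average absolute coordinate difference over the selected dimensions.
     * Assumption: dims is a bit mask packed into long words (bit d set = dimension d selected).
     */
    static double manhattanSegmentalDistance(double[] o1, double[] o2, long[] dims) {
        BitSet selected = BitSet.valueOf(dims);
        int card = selected.cardinality();
        if (card == 0) {
            throw new IllegalArgumentException("no dimensions selected");
        }
        double sum = 0.0;
        // Iterate only over the set bits, i.e. the selected dimensions.
        for (int d = selected.nextSetBit(0); d >= 0; d = selected.nextSetBit(d + 1)) {
            sum += Math.abs(o1[d] - o2[d]);
        }
        return sum / card;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 5.0, 2.0, 9.0};
        double[] b = {4.0, 5.0, 0.0, 1.0};
        long[] dims = {0b0101L}; // selects dimensions 0 and 2
        // (|1 - 4| + |2 - 0|) / 2
        System.out.println(manhattanSegmentalDistance(a, b, dims)); // prints 2.5
    }
}
```

Dividing by the number of selected dimensions keeps distances comparable between clusters whose subspaces have different dimensionality.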
Inherited methods: run, clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait

private static final Logging LOG

private int m_i

private RandomFactory rnd

public PROCLUS(int k,
               int k_i,
               int l,
               int m_i,
               RandomFactory rnd)
Parameters:
k - k Parameter
k_i - k_i Parameter
l - l Parameter
m_i - multiplier for the initial number of medoids
rnd - Random generator

public Clustering<SubspaceModel> run(Database database, Relation<V> relation)
Parameters:
database - Database to process
relation - Relation to process

private ArrayDBIDs greedy(DistanceQuery<V> distFunc, DBIDs sampleSet, int m, java.util.Random random)
Parameters:
distFunc - the distance function
sampleSet - the sample set
m - the number of medoids to be returned
random - random number generator

private ArrayDBIDs initialSet(DBIDs sampleSet, int k, java.util.Random random)
Parameters:
sampleSet - the sample set
k - the number of samples to be returned
random - random number generator

private ArrayDBIDs computeM_current(DBIDs m, DBIDs m_best, DBIDs m_bad, java.util.Random random)
Parameters:
m - the medoids
m_best - the best set of medoids found so far
m_bad - the bad medoids
random - random number generator

private DataStore<DBIDs> getLocalities(DBIDs medoids, DistanceQuery<V> distFunc, RangeQuery<V> rangeQuery)
Parameters:
medoids - the ids of the medoids
distFunc - the distance function

private long[][] findDimensions(ArrayDBIDs medoids, Relation<V> database, DistanceQuery<V> distFunc, RangeQuery<V> rangeQuery)
Parameters:
medoids - the set of medoids
database - the database containing the objects
distFunc - the distance function

private java.util.List<Pair<double[],long[]>> findDimensions(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, Relation<V> database)
Parameters:
clusters - the list of clusters
database - the database containing the objects

private java.util.List<PROCLUS.DoubleIntInt> computeZijs(double[][] averageDistances, int dim)
Parameters:
averageDistances - Average distances
dim - Dimensions

private long[][] computeDimensionMap(java.util.List<PROCLUS.DoubleIntInt> z_ijs, int dim, int numc)
Parameters:
z_ijs - z_ij values
dim - Number of dimensions
numc - Number of clusters

private java.util.ArrayList<PROCLUS.PROCLUSCluster> assignPoints(ArrayDBIDs m_current, long[][] dimensions, Relation<V> database)
Parameters:
m_current - Current centers
dimensions - set of correlated dimensions for each medoid of the cluster
database - the database containing the objects

private java.util.List<PROCLUS.PROCLUSCluster> finalAssignment(java.util.List<Pair<double[],long[]>> dimensions, Relation<V> database)
Parameters:
dimensions - pair containing the centroid and the set of correlated dimensions for the centroid
database - the database containing the objects

private double manhattanSegmentalDistance(NumberVector o1, NumberVector o2, long[] dimensions)
Parameters:
o1 - the first object
o2 - the second object
dimensions - the dimensions to be considered

private double manhattanSegmentalDistance(NumberVector o1, double[] o2, long[] dimensions)
Parameters:
o1 - the first object
o2 - the second object
dimensions - the dimensions to be considered

private double evaluateClusters(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, long[][] dimensions, Relation<V> database)
Parameters:
clusters - the clusters to be evaluated
dimensions - the dimensions associated with each cluster
database - the database holding the objects

private double avgDistance(double[] centroid, DBIDs objectIDs, Relation<V> database, int dimension)
Parameters:
centroid - the centroid
objectIDs - the set of object IDs
database - the database holding the objects
dimension - the dimension for which the average distance is computed

private DBIDs computeBadMedoids(ArrayDBIDs m_current, java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, int threshold)
Parameters:
m_current - Current medoids
clusters - the clusters
threshold - the threshold

public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
Specified by: getInputTypeRestriction in interface Algorithm
Overrides: getInputTypeRestriction in class AbstractAlgorithm<Clustering<SubspaceModel>>

protected Logging getLogger()
Description copied from class: AbstractAlgorithm
Overrides: getLogger in class AbstractAlgorithm<Clustering<SubspaceModel>>

Copyright © 2019 ELKI Development Team.
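
The z_ij values taken by `computeDimensionMap` and produced by `computeZijs` are only named on this page, not defined. As a rough illustration, the sketch below follows the standardization described in the cited paper as I understand it: for each cluster i, the per-dimension average distances are centered on their mean and divided by their sample standard deviation, so a strongly negative z_ij marks a dimension along which the cluster is unusually tight. The class name `ZScoreSketch`, the method `zScores`, and the plain `double[][]` return type are assumptions for illustration; the ELKI method returns a list of `PROCLUS.DoubleIntInt` triples instead.

```java
import java.util.Arrays;

// Hypothetical standalone sketch; not part of ELKI.
class ZScoreSketch {
    /**
     * Standardizes each row of average distances:
     * z[i][j] = (x[i][j] - mean_i) / sigma_i,
     * where sigma_i is the sample standard deviation of row i (divisor d - 1).
     * Assumption: every row has at least two entries and is not constant,
     * otherwise sigma_i is zero or undefined.
     */
    static double[][] zScores(double[][] x) {
        double[][] z = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            int d = x[i].length;
            double mean = 0.0;
            for (double v : x[i]) {
                mean += v;
            }
            mean /= d;
            double squares = 0.0;
            for (double v : x[i]) {
                squares += (v - mean) * (v - mean);
            }
            double sigma = Math.sqrt(squares / (d - 1));
            z[i] = new double[d];
            for (int j = 0; j < d; j++) {
                z[i][j] = (x[i][j] - mean) / sigma;
            }
        }
        return z;
    }

    public static void main(String[] args) {
        // One row per cluster, one column per dimension.
        double[][] averageDistances = {{1.0, 2.0, 3.0}, {4.0, 4.0, 10.0}};
        for (double[] row : zScores(averageDistances)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

In the paper, the dimensions with the smallest z_ij values are then selected across all clusters, subject to a minimum number of dimensions per cluster; that selection step is not shown here.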