Class PROCLUS

    • Field Detail

      • LOG

        private static final Logging LOG
        The logger for this class.
      • m_i

        private int m_i
        Multiplier for the initial number of medoids.
    • Constructor Detail

      • PROCLUS

        public PROCLUS(int k,
                       int k_i,
                       int l,
                       int m_i,
                       RandomFactory rnd)
        Java constructor.
        Parameters:
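        k - number of clusters to find
        k_i - multiplier for the initial number of seeds
        l - average dimensionality of the clusters
        m_i - multiplier for the initial number of medoids
        rnd - random generator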
    • Method Detail

      • getInputTypeRestriction

        public TypeInformation[] getInputTypeRestriction()
        Description copied from interface: Algorithm
        Get the input type restriction used for negotiating the data query.
        Specified by:
        getInputTypeRestriction in interface Algorithm
        Returns:
        Type restriction
      • greedy

        private ArrayDBIDs greedy(DistanceQuery<? extends NumberVector> distance,
                                  DBIDs sampleSet,
                                  int m,
                                  java.util.Random random)
        Returns a piercing set of m medoids from the specified sample set.
        Parameters:
        distance - the distance function
        sampleSet - the sample set
        m - the number of medoids to be returned
        random - random number generator
        Returns:
        a piercing set of m medoids from the specified sample set
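The PROCLUS paper's greedy technique for picking well-separated medoids is commonly implemented as a farthest-first traversal: start from a random sample point, then repeatedly add the point whose distance to its nearest already-chosen medoid is largest. The following is a minimal sketch of that idea, not the class's actual code. It assumes points are plain `double[]` arrays compared by Euclidean distance, whereas the documented method works on `DBIDs` through a `DistanceQuery`; `GreedySketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of greedy (farthest-first) medoid selection. */
final class GreedySketch {
  /** Euclidean distance; an assumption, the documented method takes any DistanceQuery. */
  static double dist(double[] a, double[] b) {
    double s = 0;
    for (int j = 0; j < a.length; j++) {
      double d = a[j] - b[j];
      s += d * d;
    }
    return Math.sqrt(s);
  }

  /** Returns the indices of m sample points (requires m <= sample.length). */
  static List<Integer> greedy(double[][] sample, int m, Random random) {
    int n = sample.length;
    double[] minDist = new double[n]; // distance to the nearest chosen medoid
    Arrays.fill(minDist, Double.POSITIVE_INFINITY);
    boolean[] chosen = new boolean[n];
    List<Integer> medoids = new ArrayList<>(m);
    int next = random.nextInt(n); // the first medoid is a random sample point
    for (int i = 0; i < m; i++) {
      medoids.add(next);
      chosen[next] = true;
      int farthest = -1;
      double best = -1;
      for (int p = 0; p < n; p++) {
        minDist[p] = Math.min(minDist[p], dist(sample[p], sample[next]));
        if (!chosen[p] && minDist[p] > best) {
          best = minDist[p];
          farthest = p;
        }
      }
      next = farthest; // the point farthest from all medoids chosen so far
    }
    return medoids;
  }
}
```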
      • initialSet

        private ArrayDBIDs initialSet(DBIDs sampleSet,
                                      int k,
                                      java.util.Random random)
        Returns a set of k elements from the specified sample set.
        Parameters:
        sampleSet - the sample set
        k - the number of samples to be returned
        random - random number generator
        Returns:
        a set of k elements from the specified sample set
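Drawing k distinct elements uniformly at random from a sample can be done with a partial Fisher-Yates shuffle. A minimal sketch over a `List`, assuming `k` does not exceed the sample size; `InitialSetSketch` is a hypothetical name, and this is not the class's actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: k distinct random elements via a partial Fisher-Yates shuffle. */
final class InitialSetSketch {
  static <T> List<T> initialSet(List<T> sampleSet, int k, Random random) {
    List<T> pool = new ArrayList<>(sampleSet); // work on a copy; the input is unchanged
    int n = pool.size();
    if (k > n) {
      throw new IllegalArgumentException("k exceeds the sample size");
    }
    for (int i = 0; i < k; i++) {
      // Swap a random element from the unselected tail [i, n) into position i.
      Collections.swap(pool, i, i + random.nextInt(n - i));
    }
    return new ArrayList<>(pool.subList(0, k));
  }
}
```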
      • computeM_current

        private ArrayDBIDs computeM_current(DBIDs m,
                                            DBIDs m_best,
                                            DBIDs m_bad,
                                            java.util.Random random)
        Computes the set of medoids in the current iteration.
        Parameters:
        m - the medoids
        m_best - the best set of medoids found so far
        m_bad - the bad medoids
        random - random number generator
        Returns:
        m_current, the set of medoids in the current iteration
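In the PROCLUS paper's iterative phase, the current medoid set keeps the good medoids of the best set found so far and replaces each bad medoid with a randomly chosen medoid from `m` that is not yet in the best set. A minimal sketch, assuming medoids are represented as `Integer` ids instead of `DBIDs` and that there are enough spare medoids to replace every bad one; `MCurrentSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch of forming the current medoid set. */
final class MCurrentSketch {
  static List<Integer> computeMCurrent(List<Integer> m, List<Integer> mBest,
                                       Set<Integer> mBad, Random random) {
    // Replacement candidates: medoids from m that are not in the best set.
    List<Integer> pool = new ArrayList<>(m);
    pool.removeAll(mBest);
    List<Integer> current = new ArrayList<>(mBest.size());
    for (Integer medoid : mBest) {
      if (mBad.contains(medoid)) {
        // Draw a random replacement without putting it back (assumes pool is non-empty).
        current.add(pool.remove(random.nextInt(pool.size())));
      } else {
        current.add(medoid); // good medoids are kept as they are
      }
    }
    return current;
  }
}
```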
      • getLocalities

        private DataStore<DBIDs> getLocalities(DBIDs medoids,
                                               DistanceQuery<? extends NumberVector> distance,
                                               RangeSearcher<DBIDRef> rangeQuery)
        Computes the localities of the specified medoids: for each medoid m, determines the objects within the sphere centered at m with radius minDist, where minDist is the minimum distance between m and any other medoid m_i.
        Parameters:
        medoids - the ids of the medoids
        distance - the distance function
        rangeQuery - the range query used to retrieve the objects within each sphere
        Returns:
        a mapping from each medoid's id to its locality
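A brute-force sketch of the locality computation, assuming points stored as `double[][]`, medoids given as row indices, Euclidean distance, and an inclusive radius (all assumptions; the documented method delegates to a `DistanceQuery` and a `RangeSearcher`). `LocalitySketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical brute-force sketch of PROCLUS localities. */
final class LocalitySketch {
  /** Euclidean distance (assumed metric for this sketch). */
  static double dist(double[] a, double[] b) {
    double s = 0;
    for (int j = 0; j < a.length; j++) {
      double d = a[j] - b[j];
      s += d * d;
    }
    return Math.sqrt(s);
  }

  /** For each medoid, returns the indices of all points within minDist of it. */
  static List<List<Integer>> localities(double[][] points, int[] medoids) {
    List<List<Integer>> result = new ArrayList<>();
    for (int i = 0; i < medoids.length; i++) {
      double[] center = points[medoids[i]];
      // Radius: distance from this medoid to the closest other medoid.
      double minDist = Double.POSITIVE_INFINITY;
      for (int j = 0; j < medoids.length; j++) {
        if (j != i) {
          minDist = Math.min(minDist, dist(center, points[medoids[j]]));
        }
      }
      // Linear scan standing in for the RangeSearcher (inclusive radius).
      List<Integer> locality = new ArrayList<>();
      for (int p = 0; p < points.length; p++) {
        if (dist(center, points[p]) <= minDist) {
          locality.add(p);
        }
      }
      result.add(locality);
    }
    return result;
  }
}
```

For example, with the 1-d points 0, 1, 3, 10, 11 and medoids at 0, 3 and 10, the localities are {0, 1, 3}, {0, 1, 3} and {3, 10, 11}; localities may overlap.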
      • findDimensions

        private long[][] findDimensions(ArrayDBIDs medoids,
                                        Relation<? extends NumberVector> relation,
                                        DistanceQuery<? extends NumberVector> distance,
                                        RangeSearcher<DBIDRef> rangeQuery)
        Determines the set of correlated dimensions for each medoid in the specified medoid set.
        Parameters:
        medoids - the set of medoids
        relation - the relation containing the objects
        distance - the distance function
        rangeQuery - the range query used to determine the medoids' localities
        Returns:
        the set of correlated dimensions for each medoid in the specified medoid set
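In the PROCLUS paper, this step starts from X_ij, the average distance along dimension j between medoid i and the points of its locality; the z_ij values and the dimension map (see `computeZijs` and `computeDimensionMap`) are then derived from these averages. A minimal sketch of the X_ij computation, assuming `double[][]` points and index arrays for the localities as hypothetical stand-ins for the `DBIDs`-based types; `AverageDistanceSketch` is a hypothetical name.

```java
/** Hypothetical sketch: X_ij = mean over locality i of |x_j - medoid_i[j]|. */
final class AverageDistanceSketch {
  static double[][] averageDistances(double[][] points, int[] medoids, int[][] localities) {
    int dim = points[0].length;
    double[][] x = new double[medoids.length][dim];
    for (int i = 0; i < medoids.length; i++) {
      double[] medoid = points[medoids[i]];
      // Sum the per-dimension absolute differences over the locality.
      for (int p : localities[i]) {
        for (int j = 0; j < dim; j++) {
          x[i][j] += Math.abs(points[p][j] - medoid[j]);
        }
      }
      for (int j = 0; j < dim; j++) {
        x[i][j] /= localities[i].length; // assumes a non-empty locality
      }
    }
    return x;
  }
}
```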
      • findDimensions

        private java.util.List<Pair<double[],long[]>> findDimensions(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters,
                                                                           Relation<? extends NumberVector> database)
        Refinement step that determines the set of correlated dimensions for each cluster centroid.
        Parameters:
        clusters - the list of clusters
        database - the database containing the objects
        Returns:
        the set of correlated dimensions for each specified cluster centroid
      • computeZijs

        private java.util.List<PROCLUS.DoubleIntInt> computeZijs(double[][] averageDistances,
                                                                 int dim)
        Compute the z_ij values.
        Parameters:
        averageDistances - Average distances
        dim - Number of dimensions
        Returns:
        z_ij values
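In the PROCLUS paper, z_ij standardizes the average distance X_ij of medoid i along dimension j against that medoid's other dimensions: with Y_i the mean of X_ij over all d dimensions and sigma_i = sqrt(sum_j (X_ij - Y_i)^2 / (d - 1)), the value is z_ij = (X_ij - Y_i) / sigma_i. Strongly negative values mark dimensions along which the locality is unusually tight. A minimal sketch; the `Zij` holder class and the ascending sort are assumptions standing in for `PROCLUS.DoubleIntInt`, and `ZijSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the z_ij computation. */
final class ZijSketch {
  /** Hypothetical holder for (z value, medoid index i, dimension j). */
  static final class Zij {
    final double z;
    final int i;
    final int j;

    Zij(double z, int i, int j) {
      this.z = z;
      this.i = i;
      this.j = j;
    }
  }

  static List<Zij> computeZijs(double[][] averageDistances, int dim) {
    List<Zij> zijs = new ArrayList<>();
    for (int i = 0; i < averageDistances.length; i++) {
      double[] x = averageDistances[i];
      double y = 0; // Y_i: mean of X_ij over all dimensions
      for (int j = 0; j < dim; j++) {
        y += x[j];
      }
      y /= dim;
      double sq = 0; // sum of squared deviations from Y_i
      for (int j = 0; j < dim; j++) {
        sq += (x[j] - y) * (x[j] - y);
      }
      double sigma = Math.sqrt(sq / (dim - 1)); // zero if all X_ij are equal
      for (int j = 0; j < dim; j++) {
        zijs.add(new Zij((x[j] - y) / sigma, i, j));
      }
    }
    // Ascending order: the most tightly clustered dimensions come first.
    zijs.sort(Comparator.comparingDouble(e -> e.z));
    return zijs;
  }
}
```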
      • computeDimensionMap

        private long[][] computeDimensionMap(java.util.List<PROCLUS.DoubleIntInt> z_ijs,
                                             int dim,
                                             int numc)
        Compute the dimension map.
        Parameters:
        z_ijs - z_ij values
        dim - Number of dimensions
        numc - Number of clusters
        Returns:
        Bitmap of dimensions used
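The PROCLUS paper selects numc * l (cluster, dimension) pairs in total, where l is the average cluster dimensionality, by taking the smallest z_ij values under the constraint that every cluster receives at least two dimensions. A minimal sketch of that rule, not necessarily matching the class's exact code: it assumes the z_ij pairs arrive already sorted ascending as `{i, j}` index pairs, passes `l` explicitly, and stores dimension j in bit `j & 63` of word `j >>> 6` (an assumed bitmap layout). `DimensionMapSketch` is a hypothetical name.

```java
/** Hypothetical sketch of the PROCLUS dimension selection rule. */
final class DimensionMapSketch {
  /**
   * ranked[r] = {i, j}: cluster i and dimension j of the r-th smallest z_ij.
   * Selects numc * l (cluster, dimension) pairs, at least two per cluster (assumes l >= 2).
   */
  static long[][] computeDimensionMap(int[][] ranked, int dim, int numc, int l) {
    long[][] map = new long[numc][((dim - 1) >>> 6) + 1]; // 64 dimensions per long
    int[] perCluster = new int[numc];
    boolean[] used = new boolean[ranked.length];
    int total = 0;
    // Pass 1: the two smallest z_ij of every cluster.
    for (int r = 0; r < ranked.length; r++) {
      int i = ranked[r][0];
      if (perCluster[i] < 2) {
        set(map[i], ranked[r][1]);
        perCluster[i]++;
        used[r] = true;
        total++;
      }
    }
    // Pass 2: fill up with the smallest remaining z_ij overall.
    for (int r = 0; r < ranked.length && total < numc * l; r++) {
      if (!used[r]) {
        set(map[ranked[r][0]], ranked[r][1]);
        total++;
      }
    }
    return map;
  }

  static void set(long[] bits, int j) {
    bits[j >>> 6] |= 1L << (j & 63);
  }

  static boolean get(long[] bits, int j) {
    return (bits[j >>> 6] & (1L << (j & 63))) != 0;
  }
}
```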
      • assignPoints

        private java.util.ArrayList<PROCLUS.PROCLUSCluster> assignPoints(ArrayDBIDs m_current,
                                                                         long[][] dimensions,
                                                                         Relation<? extends NumberVector> database)
        Assigns the objects to the clusters.
        Parameters:
        m_current - Current centers
        dimensions - set of correlated dimensions for each medoid of the cluster
        database - the database containing the objects
        Returns:
        the assignment of the objects to the clusters
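Following the PROCLUS paper, each object goes to the medoid with the smallest Manhattan segmental distance, measured only over that medoid's correlated dimensions. A minimal sketch, assuming `double[][]` points, medoids as row indices, and `boolean[][]` dimension flags instead of `long[][]` bitmaps; `AssignSketch` is a hypothetical name, and ties go to the lower medoid index.

```java
/** Hypothetical sketch of assigning points to medoids in their subspaces. */
final class AssignSketch {
  /** Returns, for each point, the index of its closest medoid (ties go to the lower index). */
  static int[] assignPoints(double[][] points, int[] medoids, boolean[][] dims) {
    int[] assignment = new int[points.length];
    for (int p = 0; p < points.length; p++) {
      double best = Double.POSITIVE_INFINITY;
      for (int i = 0; i < medoids.length; i++) {
        // Manhattan segmental distance over medoid i's dimensions only.
        double sum = 0;
        int card = 0;
        for (int j = 0; j < dims[i].length; j++) {
          if (dims[i][j]) {
            sum += Math.abs(points[p][j] - points[medoids[i]][j]);
            card++;
          }
        }
        double d = sum / card; // assumes at least one dimension per medoid
        if (d < best) {
          best = d;
          assignment[p] = i;
        }
      }
    }
    return assignment;
  }
}
```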
      • finalAssignment

        private java.util.List<PROCLUS.PROCLUSCluster> finalAssignment(java.util.List<Pair<double[],long[]>> dimensions,
                                                                       Relation<? extends NumberVector> database)
        Refinement step to assign the objects to the final clusters.
        Parameters:
        dimensions - list of pairs, each containing a cluster centroid and the set of correlated dimensions for that centroid
        database - the database containing the objects
        Returns:
        the assignment of the objects to the clusters
      • manhattanSegmentalDistance

        private double manhattanSegmentalDistance(NumberVector o1,
                                                  NumberVector o2,
                                                  long[] dimensions)
        Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.
        Parameters:
        o1 - the first object
        o2 - the second object
        dimensions - the dimensions to be considered
        Returns:
        the Manhattan segmental distance between o1 and o2 relative to the specified dimensions
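The PROCLUS paper defines the Manhattan segmental distance relative to a dimension set D as the Manhattan distance restricted to D, divided by |D|, so that distances over different numbers of dimensions are comparable. For example, over dimensions {0, 2} the points (1, 2, 3) and (4, 6, 3) are at distance (3 + 0) / 2 = 1.5. A minimal sketch on `double[]` vectors, assuming dimension j is stored in bit `j & 63` of word `j >>> 6`; `SegmentalDistanceSketch` is a hypothetical name.

```java
/** Hypothetical sketch of the Manhattan segmental distance. */
final class SegmentalDistanceSketch {
  /** Mean absolute difference of o1 and o2 over the dimensions whose bits are set. */
  static double manhattanSegmentalDistance(double[] o1, double[] o2, long[] dimensions) {
    double sum = 0;
    int card = 0; // |D|, the number of selected dimensions
    for (int j = 0; j < o1.length; j++) {
      if ((dimensions[j >>> 6] & (1L << (j & 63))) != 0) { // assumed bit layout
        sum += Math.abs(o1[j] - o2[j]);
        card++;
      }
    }
    return sum / card; // NaN if no dimension is selected
  }
}
```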
      • manhattanSegmentalDistance

        private double manhattanSegmentalDistance(NumberVector o1,
                                                  double[] o2,
                                                  long[] dimensions)
        Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.
        Parameters:
        o1 - the first object
        o2 - the second object
        dimensions - the dimensions to be considered
        Returns:
        the Manhattan segmental distance between o1 and o2 relative to the specified dimensions
      • evaluateClusters

        private double evaluateClusters(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters,
                                        long[][] dimensions,
                                        Relation<? extends NumberVector> database)
        Evaluates the quality of the clusters.
        Parameters:
        clusters - the clusters to be evaluated
        dimensions - the dimensions associated with each cluster
        database - the database holding the objects
        Returns:
        a measure for the cluster quality
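The PROCLUS paper scores a clustering as E = (1/N) * sum_i |C_i| * w_i, where w_i averages, over the dimensions D_i of cluster i, the mean distance of the cluster's points to its centroid along each dimension; smaller values indicate tighter projected clusters. A minimal sketch of that formula, assuming clusters given as arrays of point indices and `boolean[][]` dimension flags; `EvaluateSketch` is a hypothetical name, and the documented method may differ in details.

```java
/** Hypothetical sketch of the PROCLUS objective (lower is better). */
final class EvaluateSketch {
  static double evaluateClusters(double[][] points, int[][] clusters, boolean[][] dims) {
    int dim = points[0].length;
    double total = 0;
    int n = 0;
    for (int i = 0; i < clusters.length; i++) {
      int[] members = clusters[i];
      // Centroid of cluster i.
      double[] centroid = new double[dim];
      for (int p : members) {
        for (int j = 0; j < dim; j++) {
          centroid[j] += points[p][j];
        }
      }
      for (int j = 0; j < dim; j++) {
        centroid[j] /= members.length;
      }
      // w_i: mean over D_i of the average distance to the centroid along dimension j.
      double w = 0;
      int card = 0;
      for (int j = 0; j < dim; j++) {
        if (!dims[i][j]) {
          continue;
        }
        double y = 0;
        for (int p : members) {
          y += Math.abs(points[p][j] - centroid[j]);
        }
        w += y / members.length;
        card++;
      }
      total += members.length * (w / card); // weight w_i by the cluster size
      n += members.length;
    }
    return total / n;
  }
}
```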
      • avgDistance

        private double avgDistance(double[] centroid,
                                   DBIDs objectIDs,
                                   Relation<? extends NumberVector> database,
                                   int dimension)
        Computes the average distance of the objects to the centroid along the specified dimension.
        Parameters:
        centroid - the centroid
        objectIDs - the set of objects ids
        database - the database holding the objects
        dimension - the dimension for which the average distance is computed
        Returns:
        the average distance of the objects to the centroid along the specified dimension
      • computeBadMedoids

        private DBIDs computeBadMedoids(ArrayDBIDs m_current,
                                        java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters,
                                        int threshold)
        Computes the bad medoids: the medoid of any cluster containing fewer objects than the specified threshold is considered bad.
        Parameters:
        m_current - Current medoids
        clusters - the clusters
        threshold - the minimum cluster size below which a cluster's medoid is considered bad
        Returns:
        the bad medoids
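The rule above reduces to a size check per cluster. A minimal sketch, assuming the i-th entry of `clusterSizes` is the size of the cluster of the i-th medoid (a hypothetical stand-in for the `PROCLUSCluster` list); `BadMedoidSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the bad-medoid rule. */
final class BadMedoidSketch {
  /** Returns the medoids whose clusters have fewer than threshold objects. */
  static List<Integer> computeBadMedoids(int[] mCurrent, int[] clusterSizes, int threshold) {
    List<Integer> bad = new ArrayList<>();
    for (int i = 0; i < mCurrent.length; i++) {
      if (clusterSizes[i] < threshold) {
        bad.add(mCurrent[i]); // cluster too small: its medoid is bad
      }
    }
    return bad;
  }
}
```

A cluster with exactly `threshold` objects keeps its medoid, since the test is strictly "fewer than".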