Class DiSH

  • All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<SubspaceModel>>, SubspaceClusteringAlgorithm<SubspaceModel>

    @Title("DiSH: Detecting Subspace cluster Hierarchies")
    @Description("Algorithm to find hierarchical correlation clusters in subspaces.")
    @Reference(authors="E. Achtert, C. B\u00f6hm, H.-P. Kriegel, P. Kr\u00f6ger, I. M\u00fcller-Gorman, A. Zimek",
               title="Detection and Visualization of Subspace Cluster Hierarchies",
               booktitle="Proc. 12th Int. Conf. on Database Systems for Advanced Applications (DASFAA)",
               url="https://doi.org/10.1007/978-3-540-71703-4_15",
               bibkey="DBLP:conf/dasfaa/AchtertBKKMZ07")
    public class DiSH
    extends java.lang.Object
    implements SubspaceClusteringAlgorithm<SubspaceModel>
    Algorithm for detecting subspace hierarchies.

    Reference:

    E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, I. Müller-Gorman, A. Zimek
    Detection and Visualization of Subspace Cluster Hierarchies
    Proc. 12th Int. Conf. on Database Systems for Advanced Applications (DASFAA).

    Since:
    0.1
    Author:
    Elke Achtert
    • Field Detail

      • LOG

        private static final Logging LOG
        The logger for this class.
      • minpts

        private int minpts
        OPTICS minPts parameter.
    • Constructor Detail

      • DiSH

        public DiSH​(double epsilon,
                    int minpts,
                    DiSH.Strategy strategy)
        Constructor.
        Parameters:
        epsilon - Epsilon value
        minpts - Mu parameter (minPts)
        strategy - DiSH strategy
    • Method Detail

      • getInputTypeRestriction

        public TypeInformation[] getInputTypeRestriction()
        Description copied from interface: Algorithm
        Get the input type restriction used for negotiating the data query.
        Specified by:
        getInputTypeRestriction in interface Algorithm
        Returns:
        Type restriction
      • logClusterSizes

        private void logClusterSizes​(java.lang.String m,
                                     int dimensionality,
                                     it.unimi.dsi.fastutil.objects.Object2ObjectOpenCustomHashMap<long[],​java.util.List<ArrayModifiableDBIDs>> clustersMap)
        Log cluster sizes in verbose mode.
        Parameters:
        m - Log message
        dimensionality - Dimensionality
        clustersMap - Cluster map
      • extractClusters

        private it.unimi.dsi.fastutil.objects.Object2ObjectOpenCustomHashMap<long[],​java.util.List<ArrayModifiableDBIDs>> extractClusters​(Relation<? extends NumberVector> relation,
                                                                                                                                                DiSH.DiSHClusterOrder clusterOrder)
        Extracts the clusters from the cluster order.
        Parameters:
        relation - the database storing the objects
        clusterOrder - the cluster order to extract the clusters from
        Returns:
        the extracted clusters
      • sortClusters

        private java.util.List<Cluster<SubspaceModel>> sortClusters​(Relation<? extends NumberVector> relation,
                                                                    it.unimi.dsi.fastutil.objects.Object2ObjectMap<long[],​java.util.List<ArrayModifiableDBIDs>> clustersMap)
        Returns a sorted list of the clusters w.r.t. the subspace dimensionality in descending order.
        Parameters:
        relation - the database storing the objects
        clustersMap - the mapping of bits sets to clusters
        Returns:
        a sorted list of the clusters
      • checkClusters

        private void checkClusters​(Relation<? extends NumberVector> relation,
                                   it.unimi.dsi.fastutil.objects.Object2ObjectMap<long[],​java.util.List<ArrayModifiableDBIDs>> clustersMap)
        Removes the clusters with size < minpts from the cluster map and adds them to their parents.
        Parameters:
        relation - the relation storing the objects
        clustersMap - the map containing the clusters
      • findParent

        private Pair<long[],​ArrayModifiableDBIDs> findParent​(Relation<? extends NumberVector> relation,
                                                                   Pair<long[],​ArrayModifiableDBIDs> child,
                                                                   it.unimi.dsi.fastutil.objects.Object2ObjectMap<long[],​java.util.List<ArrayModifiableDBIDs>> clustersMap)
        Returns the parent of the specified cluster
        Parameters:
        relation - the relation storing the objects
        child - the child to search the parent for
        clustersMap - the map containing the clusters
        Returns:
        the parent of the specified cluster
      • buildHierarchy

        private void buildHierarchy​(Relation<? extends NumberVector> database,
                                    Clustering<SubspaceModel> clustering,
                                    java.util.List<Cluster<SubspaceModel>> clusters,
                                    int dimensionality)
        Builds the cluster hierarchy.
        Parameters:
        clustering - Clustering we process
        clusters - the sorted list of clusters
        dimensionality - the dimensionality of the data
        database - the database containing the data objects
      • isParent

        private boolean isParent​(Relation<? extends NumberVector> relation,
                                 Cluster<SubspaceModel> parent,
                                 It<Cluster<SubspaceModel>> iter,
                                 int db_dim)
        Returns true, if the specified parent cluster is a parent of one child of the children clusters.
        Parameters:
        relation - the database containing the objects
        parent - the parent to be tested
        iter - the list of children to be tested
        db_dim - Database dimensionality
        Returns:
        true, if the specified parent cluster is a parent of one child of the children clusters, false otherwise
      • subspaceDimensionality

        private int subspaceDimensionality​(NumberVector v1,
                                           NumberVector v2,
                                           long[] pv1,
                                           long[] pv2,
                                           long[] commonPreferenceVector)
        Compute the common subspace dimensionality of two vectors.
        Parameters:
        v1 - First vector
        v2 - Second vector
        pv1 - First preference
        pv2 - Second preference
        commonPreferenceVector - Common preference
        Returns:
        Usually, v1.dim - commonPreference.cardinality, unless either pv1 and pv2 are a subset of the other.
      • weightedDistance

        protected static double weightedDistance​(NumberVector v1,
                                                 NumberVector v2,
                                                 long[] weightVector)
        Computes the weighted distance between the two specified vectors according to the given preference vector.
        Parameters:
        v1 - the first vector
        v2 - the second vector
        weightVector - the preference vector
        Returns:
        the weighted distance between the two specified vectors according to the given preference vector