Class SUBCLU<V extends NumberVector>

  • Type Parameters:
    V - the type of NumberVector handled by this algorithm
    All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<SubspaceModel>>, SubspaceClusteringAlgorithm<SubspaceModel>

    @Title("SUBCLU: Density connected Subspace Clustering")
    @Description("Algorithm to detect arbitrarily shaped and positioned clusters in subspaces. SUBCLU delivers for each subspace the same clusters DBSCAN would have found, when applied to this subspace seperately.")
    @Reference(authors="Karin Kailing, Hans-Peter Kriegel, Peer Kr\u00f6ger",
               title="Density Connected Subspace Clustering for High Dimensional Data",
               booktitle="Proc. SIAM Int. Conf. on Data Mining (SDM\'04)",
               url="https://doi.org/10.1137/1.9781611972740.23",
               bibkey="DBLP:conf/sdm/KroegerKK04")
    public class SUBCLU<V extends NumberVector>
    extends java.lang.Object
    implements SubspaceClusteringAlgorithm<SubspaceModel>
    Implementation of the SUBCLU algorithm, an algorithm to detect arbitrarily shaped and positioned clusters in subspaces. SUBCLU delivers for each subspace the same clusters DBSCAN would have found, when applied to this subspace separately.

    Note: this implementation does not yet implemented the suggested indexing based on inverted files, and that we also only query subsets of them, this makes the implementation as a reusable index challenging.

    The original paper is not very clear on which clusters to return, as any subspace cluster must be part of a lower-dimensional projected cluster, so these results would be highly redundant. In this implementation, we only include points in clusters that are not already part of sub-clusters (note that this does not remove overlap of independent subspaces).

    Reference:

    Karin Kailing, Hans-Peter Kriegel, Peer Kröger
    Density Connected Subspace Clustering for High Dimensional Data
    Proc. SIAM Int. Conf. on Data Mining (SDM'04)

    Since:
    0.3
    Author:
    Elke Achtert
    • Field Detail

      • LOG

        private static final Logging LOG
        The logger for this class.
      • epsilon

        protected double epsilon
        Maximum radius of the neighborhood to be considered.
      • minpts

        protected int minpts
        Minimum number of points.
      • mindim

        protected int mindim
        Minimum dimensionality.
    • Constructor Detail

      • SUBCLU

        public SUBCLU​(DimensionSelectingSubspaceDistance<V> distance,
                      double epsilon,
                      int minpts,
                      int mindim)
        Constructor.
        Parameters:
        distance - Distance function
        epsilon - Epsilon value
        minpts - Minpts value
        mindim - Minimum dimensionality
    • Method Detail

      • getInputTypeRestriction

        public TypeInformation[] getInputTypeRestriction()
        Description copied from interface: Algorithm
        Get the input type restriction used for negotiating the data query.
        Specified by:
        getInputTypeRestriction in interface Algorithm
        Returns:
        Type restriction
      • run

        public Clustering<SubspaceModel> run​(Relation<V> relation)
        Performs the SUBCLU algorithm on the given database.
        Parameters:
        relation - Relation to process
        Returns:
        Clustering result
      • runDBSCAN

        private java.util.List<Cluster<Model>> runDBSCAN​(Relation<V> relation,
                                                         DBIDs ids,
                                                         Subspace subspace)
        Runs the DBSCAN algorithm on the specified partition of the database in the given subspace. If parameter ids is null DBSCAN will be applied to the whole database.
        Parameters:
        relation - the database holding the objects to run DBSCAN on
        ids - the IDs of the database defining the partition to run DBSCAN on - if this parameter is null DBSCAN will be applied to the whole database
        subspace - the subspace to run DBSCAN on
        Returns:
        the clustering result of the DBSCAN run
      • generateSubspaceCandidates

        private java.util.List<Subspace> generateSubspaceCandidates​(java.util.List<Subspace> subspaces)
        Generates d+1-dimensional subspace candidates from the specified d-dimensional subspaces.
        Parameters:
        subspaces - the d-dimensional subspaces
        Returns:
        the d+1-dimensional subspace candidates
      • checkLower

        private boolean checkLower​(Subspace candidate,
                                   java.util.List<Subspace> subspaces)
        Perform Apriori-style pruning.
        Parameters:
        candidate - Current candidate
        subspaces - Subspaces
        Returns:
        true if all lower-dimensional subspaces exist
      • bestSubspace

        private Subspace bestSubspace​(java.util.List<Subspace> subspaces,
                                      Subspace candidate,
                                      java.util.TreeMap<Subspace,​java.util.List<Cluster<Model>>> clusterMap)
        Determines the d-dimensional subspace of the (d+1) -dimensional candidate with minimal number of objects in the cluster.
        Parameters:
        subspaces - the list of d-dimensional subspaces containing clusters
        candidate - the (d+1)-dimensional candidate subspace
        clusterMap - the mapping of subspaces to clusters
        Returns:
        the d-dimensional subspace of the (d+1) -dimensional candidate with minimal number of objects in the cluster