V - the type of NumberVector handled by this algorithm@Title(value="SUBCLU: Density connected Subspace Clustering") @Description(value="Algorithm to detect arbitrarily shaped and positioned clusters in subspaces. SUBCLU delivers for each subspace the same clusters DBSCAN would have found, when applied to this subspace seperately.") @Reference(authors="Karin Kailing, Hans-Peter Kriegel, Peer Kr\u00f6ger", title="Density Connected Subspace Clustering for High Dimensional Data", booktitle="Proc. SIAM Int. Conf. on Data Mining (SDM\'04)", url="https://doi.org/10.1137/1.9781611972740.23", bibkey="DBLP:conf/sdm/KroegerKK04") public class SUBCLU<V extends NumberVector> extends AbstractAlgorithm<Clustering<SubspaceModel>> implements SubspaceClusteringAlgorithm<SubspaceModel>
Note: this implementation does not yet implemented the suggested indexing based on inverted files, and that we also only query subsets of them, this makes the implementation as a reusable index challenging.
The original paper is not very clear on which clusters to return, as any subspace cluster must be part of a lower-dimensional projected cluster, so these results would be highly redundant. In this implementation, we only include points in clusters that are not already part of sub-clusters (note that this does not remove overlap of independent subspaces).
Reference:
 Karin Kailing, Hans-Peter Kriegel, Peer Kröger
 Density Connected Subspace Clustering for High Dimensional Data
 Proc. SIAM Int. Conf. on Data Mining (SDM'04)
| Modifier and Type | Class and Description | 
|---|---|
static class  | 
SUBCLU.Parameterizer<V extends NumberVector>
Parameterization class. 
 | 
| Modifier and Type | Field and Description | 
|---|---|
protected DimensionSelectingSubspaceDistanceFunction<V> | 
distanceFunction
The distance function to determine the distance between objects. 
 | 
protected double | 
epsilon
Maximum radius of the neighborhood to be considered. 
 | 
private static Logging | 
LOG
The logger for this class. 
 | 
protected int | 
mindim
Minimum dimensionality. 
 | 
protected int | 
minpts
Minimum number of points. 
 | 
ALGORITHM_ID| Constructor and Description | 
|---|
SUBCLU(DimensionSelectingSubspaceDistanceFunction<V> distanceFunction,
      double epsilon,
      int minpts,
      int mindim)
Constructor. 
 | 
| Modifier and Type | Method and Description | 
|---|---|
private Subspace | 
bestSubspace(java.util.List<Subspace> subspaces,
            Subspace candidate,
            java.util.TreeMap<Subspace,java.util.List<Cluster<Model>>> clusterMap)
Determines the  
d-dimensional subspace of the (d+1)
 -dimensional candidate with minimal number of objects in the cluster. | 
private boolean | 
checkLower(Subspace candidate,
          java.util.List<Subspace> subspaces)
Perform Apriori-style pruning. 
 | 
private java.util.List<Subspace> | 
generateSubspaceCandidates(java.util.List<Subspace> subspaces)
Generates  
d+1-dimensional subspace candidates from the specified
 d-dimensional subspaces. | 
TypeInformation[] | 
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query. 
 | 
protected Logging | 
getLogger()
Get the (STATIC) logger for this class. 
 | 
Clustering<SubspaceModel> | 
run(Relation<V> relation)
Performs the SUBCLU algorithm on the given database. 
 | 
private java.util.List<Cluster<Model>> | 
runDBSCAN(Relation<V> relation,
         DBIDs ids,
         Subspace subspace)
Runs the DBSCAN algorithm on the specified partition of the database in the
 given subspace. 
 | 
runclone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitrunprivate static final Logging LOG
protected DimensionSelectingSubspaceDistanceFunction<V extends NumberVector> distanceFunction
protected double epsilon
protected int minpts
protected int mindim
public SUBCLU(DimensionSelectingSubspaceDistanceFunction<V> distanceFunction, double epsilon, int minpts, int mindim)
distanceFunction - Distance functionepsilon - Epsilon valueminpts - Minpts valuemindim - Minimum dimensionalitypublic Clustering<SubspaceModel> run(Relation<V> relation)
relation - Relation to processprivate java.util.List<Cluster<Model>> runDBSCAN(Relation<V> relation, DBIDs ids, Subspace subspace)
ids is null DBSCAN will be applied to
 the whole database.relation - the database holding the objects to run DBSCAN onids - the IDs of the database defining the partition to run DBSCAN on
        - if this parameter is null DBSCAN will be applied to the whole
        databasesubspace - the subspace to run DBSCAN onprivate java.util.List<Subspace> generateSubspaceCandidates(java.util.List<Subspace> subspaces)
d+1-dimensional subspace candidates from the specified
 d-dimensional subspaces.subspaces - the d-dimensional subspacesd+1-dimensional subspace candidatesprivate boolean checkLower(Subspace candidate, java.util.List<Subspace> subspaces)
candidate - Current candidatesubspaces - Subspacestrue if all lower-dimensional subspaces existprivate Subspace bestSubspace(java.util.List<Subspace> subspaces, Subspace candidate, java.util.TreeMap<Subspace,java.util.List<Cluster<Model>>> clusterMap)
d-dimensional subspace of the (d+1)
 -dimensional candidate with minimal number of objects in the cluster.subspaces - the list of d-dimensional subspaces containing
        clusterscandidate - the (d+1)-dimensional candidate subspaceclusterMap - the mapping of subspaces to clustersd-dimensional subspace of the (d+1)
         -dimensional candidate with minimal number of objects in the
         clusterpublic TypeInformation[] getInputTypeRestriction()
AbstractAlgorithmgetInputTypeRestriction in interface AlgorithmgetInputTypeRestriction in class AbstractAlgorithm<Clustering<SubspaceModel>>protected Logging getLogger()
AbstractAlgorithmgetLogger in class AbstractAlgorithm<Clustering<SubspaceModel>>Copyright © 2019 ELKI Development Team. License information.