Class SUBCLU<V extends NumberVector>
- java.lang.Object
-
- elki.clustering.subspace.SUBCLU<V>
-
- Type Parameters:
V
- the type of NumberVector handled by this algorithm
- All Implemented Interfaces:
Algorithm
,ClusteringAlgorithm<Clustering<SubspaceModel>>
,SubspaceClusteringAlgorithm<SubspaceModel>
@Title("SUBCLU: Density connected Subspace Clustering") @Description("Algorithm to detect arbitrarily shaped and positioned clusters in subspaces. SUBCLU delivers for each subspace the same clusters DBSCAN would have found, when applied to this subspace seperately.") @Reference(authors="Karin Kailing, Hans-Peter Kriegel, Peer Kr\u00f6ger", title="Density Connected Subspace Clustering for High Dimensional Data", booktitle="Proc. SIAM Int. Conf. on Data Mining (SDM\'04)", url="https://doi.org/10.1137/1.9781611972740.23", bibkey="DBLP:conf/sdm/KroegerKK04") public class SUBCLU<V extends NumberVector> extends java.lang.Object implements SubspaceClusteringAlgorithm<SubspaceModel>
Implementation of the SUBCLU algorithm, an algorithm to detect arbitrarily shaped and positioned clusters in subspaces. SUBCLU delivers for each subspace the same clusters DBSCAN would have found, when applied to this subspace separately.Note: this implementation does not yet implemented the suggested indexing based on inverted files, and that we also only query subsets of them, this makes the implementation as a reusable index challenging.
The original paper is not very clear on which clusters to return, as any subspace cluster must be part of a lower-dimensional projected cluster, so these results would be highly redundant. In this implementation, we only include points in clusters that are not already part of sub-clusters (note that this does not remove overlap of independent subspaces).
Reference:
Karin Kailing, Hans-Peter Kriegel, Peer Kröger
Density Connected Subspace Clustering for High Dimensional Data
Proc. SIAM Int. Conf. on Data Mining (SDM'04)- Since:
- 0.3
- Author:
- Elke Achtert
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
SUBCLU.Par<V extends NumberVector>
Parameterization class.-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Field Summary
Fields Modifier and Type Field Description protected DimensionSelectingSubspaceDistance<V>
distance
The distance function to determine the distance between objects.protected double
epsilon
Maximum radius of the neighborhood to be considered.private static Logging
LOG
The logger for this class.protected int
mindim
Minimum dimensionality.protected int
minpts
Minimum number of points.
-
Constructor Summary
Constructors Constructor Description SUBCLU(DimensionSelectingSubspaceDistance<V> distance, double epsilon, int minpts, int mindim)
Constructor.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description private Subspace
bestSubspace(java.util.List<Subspace> subspaces, Subspace candidate, java.util.TreeMap<Subspace,java.util.List<Cluster<Model>>> clusterMap)
Determines thed
-dimensional subspace of the(d+1)
-dimensional candidate with minimal number of objects in the cluster.private boolean
checkLower(Subspace candidate, java.util.List<Subspace> subspaces)
Perform Apriori-style pruning.private java.util.List<Subspace>
generateSubspaceCandidates(java.util.List<Subspace> subspaces)
Generatesd+1
-dimensional subspace candidates from the specifiedd
-dimensional subspaces.TypeInformation[]
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.Clustering<SubspaceModel>
run(Relation<V> relation)
Performs the SUBCLU algorithm on the given database.private java.util.List<Cluster<Model>>
runDBSCAN(Relation<V> relation, DBIDs ids, Subspace subspace)
Runs the DBSCAN algorithm on the specified partition of the database in the given subspace.-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.clustering.ClusteringAlgorithm
autorun
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
The logger for this class.
-
distance
protected DimensionSelectingSubspaceDistance<V extends NumberVector> distance
The distance function to determine the distance between objects.
-
epsilon
protected double epsilon
Maximum radius of the neighborhood to be considered.
-
minpts
protected int minpts
Minimum number of points.
-
mindim
protected int mindim
Minimum dimensionality.
-
-
Constructor Detail
-
SUBCLU
public SUBCLU(DimensionSelectingSubspaceDistance<V> distance, double epsilon, int minpts, int mindim)
Constructor.- Parameters:
distance
- Distance functionepsilon
- Epsilon valueminpts
- Minpts valuemindim
- Minimum dimensionality
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface:Algorithm
Get the input type restriction used for negotiating the data query.- Specified by:
getInputTypeRestriction
in interfaceAlgorithm
- Returns:
- Type restriction
-
run
public Clustering<SubspaceModel> run(Relation<V> relation)
Performs the SUBCLU algorithm on the given database.- Parameters:
relation
- Relation to process- Returns:
- Clustering result
-
runDBSCAN
private java.util.List<Cluster<Model>> runDBSCAN(Relation<V> relation, DBIDs ids, Subspace subspace)
Runs the DBSCAN algorithm on the specified partition of the database in the given subspace. If parameterids
is null DBSCAN will be applied to the whole database.- Parameters:
relation
- the database holding the objects to run DBSCAN onids
- the IDs of the database defining the partition to run DBSCAN on - if this parameter is null DBSCAN will be applied to the whole databasesubspace
- the subspace to run DBSCAN on- Returns:
- the clustering result of the DBSCAN run
-
generateSubspaceCandidates
private java.util.List<Subspace> generateSubspaceCandidates(java.util.List<Subspace> subspaces)
Generatesd+1
-dimensional subspace candidates from the specifiedd
-dimensional subspaces.- Parameters:
subspaces
- thed
-dimensional subspaces- Returns:
- the
d+1
-dimensional subspace candidates
-
checkLower
private boolean checkLower(Subspace candidate, java.util.List<Subspace> subspaces)
Perform Apriori-style pruning.- Parameters:
candidate
- Current candidatesubspaces
- Subspaces- Returns:
true
if all lower-dimensional subspaces exist
-
bestSubspace
private Subspace bestSubspace(java.util.List<Subspace> subspaces, Subspace candidate, java.util.TreeMap<Subspace,java.util.List<Cluster<Model>>> clusterMap)
Determines thed
-dimensional subspace of the(d+1)
-dimensional candidate with minimal number of objects in the cluster.- Parameters:
subspaces
- the list ofd
-dimensional subspaces containing clusterscandidate
- the(d+1)
-dimensional candidate subspaceclusterMap
- the mapping of subspaces to clusters- Returns:
- the
d
-dimensional subspace of the(d+1)
-dimensional candidate with minimal number of objects in the cluster
-
-