Package elki.clustering.subspace
Class DiSH
- java.lang.Object
-
- elki.clustering.subspace.DiSH
-
- All Implemented Interfaces:
Algorithm
,ClusteringAlgorithm<Clustering<SubspaceModel>>
,SubspaceClusteringAlgorithm<SubspaceModel>
@Title("DiSH: Detecting Subspace cluster Hierarchies") @Description("Algorithm to find hierarchical correlation clusters in subspaces.") @Reference(authors="E. Achtert, C. B\u00f6hm, H.-P. Kriegel, P. Kr\u00f6ger, I. M\u00fcller-Gorman, A. Zimek", title="Detection and Visualization of Subspace Cluster Hierarchies", booktitle="Proc. 12th Int. Conf. on Database Systems for Advanced Applications (DASFAA)", url="https://doi.org/10.1007/978-3-540-71703-4_15", bibkey="DBLP:conf/dasfaa/AchtertBKKMZ07") public class DiSH extends java.lang.Object implements SubspaceClusteringAlgorithm<SubspaceModel>
Algorithm for detecting subspace hierarchies.Reference:
E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, I. Müller-Gorman, A. Zimek
Detection and Visualization of Subspace Cluster Hierarchies
Proc. 12th Int. Conf. on Database Systems for Advanced Applications (DASFAA).- Since:
- 0.1
- Author:
- Elke Achtert
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
DiSH.DiSHClusterOrder
DiSH cluster order.private class
DiSH.Instance
OPTICS variant used by DiSH internally.static class
DiSH.Par
Parameterization class.static class
DiSH.Strategy
Available strategies for determination of the preference vector.-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Field Summary
Fields Modifier and Type Field Description private double
epsilon
Holds the value ofDiSH.Par.EPSILON_ID
.private static Logging
LOG
The logger for this class.private int
minpts
OPTICS minPts parameter.private DiSH.Strategy
strategy
DiSH strategy.
-
Constructor Summary
Constructors Constructor Description DiSH(double epsilon, int minpts, DiSH.Strategy strategy)
Constructor.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description private void
buildHierarchy(Relation<? extends NumberVector> database, Clustering<SubspaceModel> clustering, java.util.List<Cluster<SubspaceModel>> clusters, int dimensionality)
Builds the cluster hierarchy.private void
checkClusters(Relation<? extends NumberVector> relation, it.unimi.dsi.fastutil.objects.Object2ObjectMap<long[],java.util.List<ArrayModifiableDBIDs>> clustersMap)
Removes the clusters with size < minpts from the cluster map and adds them to their parents.private Clustering<SubspaceModel>
computeClusters(Relation<? extends NumberVector> database, DiSH.DiSHClusterOrder clusterOrder)
Computes the hierarchical clusters according to the cluster order.private it.unimi.dsi.fastutil.objects.Object2ObjectOpenCustomHashMap<long[],java.util.List<ArrayModifiableDBIDs>>
extractClusters(Relation<? extends NumberVector> relation, DiSH.DiSHClusterOrder clusterOrder)
Extracts the clusters from the cluster order.private Pair<long[],ArrayModifiableDBIDs>
findParent(Relation<? extends NumberVector> relation, Pair<long[],ArrayModifiableDBIDs> child, it.unimi.dsi.fastutil.objects.Object2ObjectMap<long[],java.util.List<ArrayModifiableDBIDs>> clustersMap)
Returns the parent of the specified clusterTypeInformation[]
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.private boolean
isParent(Relation<? extends NumberVector> relation, Cluster<SubspaceModel> parent, It<Cluster<SubspaceModel>> iter, int db_dim)
Returns true, if the specified parent cluster is a parent of one child of the children clusters.private void
logClusterSizes(java.lang.String m, int dimensionality, it.unimi.dsi.fastutil.objects.Object2ObjectOpenCustomHashMap<long[],java.util.List<ArrayModifiableDBIDs>> clustersMap)
Log cluster sizes in verbose mode.Clustering<SubspaceModel>
run(Relation<? extends NumberVector> relation)
Performs the DiSH algorithm on the given database.private java.util.List<Cluster<SubspaceModel>>
sortClusters(Relation<? extends NumberVector> relation, it.unimi.dsi.fastutil.objects.Object2ObjectMap<long[],java.util.List<ArrayModifiableDBIDs>> clustersMap)
Returns a sorted list of the clusters w.r.t. the subspace dimensionality in descending order.private int
subspaceDimensionality(NumberVector v1, NumberVector v2, long[] pv1, long[] pv2, long[] commonPreferenceVector)
Compute the common subspace dimensionality of two vectors.protected static double
weightedDistance(NumberVector v1, NumberVector v2, long[] weightVector)
Computes the weighted distance between the two specified vectors according to the given preference vector.-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.clustering.ClusteringAlgorithm
autorun
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
The logger for this class.
-
epsilon
private double epsilon
Holds the value ofDiSH.Par.EPSILON_ID
.
-
minpts
private int minpts
OPTICS minPts parameter.
-
strategy
private DiSH.Strategy strategy
DiSH strategy.
-
-
Constructor Detail
-
DiSH
public DiSH(double epsilon, int minpts, DiSH.Strategy strategy)
Constructor.- Parameters:
epsilon
- Epsilon valueminpts
- Mu parameter (minPts)strategy
- DiSH strategy
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface:Algorithm
Get the input type restriction used for negotiating the data query.- Specified by:
getInputTypeRestriction
in interfaceAlgorithm
- Returns:
- Type restriction
-
run
public Clustering<SubspaceModel> run(Relation<? extends NumberVector> relation)
Performs the DiSH algorithm on the given database.- Parameters:
relation
- Relation to process- Returns:
- Clustering
-
computeClusters
private Clustering<SubspaceModel> computeClusters(Relation<? extends NumberVector> database, DiSH.DiSHClusterOrder clusterOrder)
Computes the hierarchical clusters according to the cluster order.- Parameters:
database
- the database holding the objectsclusterOrder
- the cluster order
-
logClusterSizes
private void logClusterSizes(java.lang.String m, int dimensionality, it.unimi.dsi.fastutil.objects.Object2ObjectOpenCustomHashMap<long[],java.util.List<ArrayModifiableDBIDs>> clustersMap)
Log cluster sizes in verbose mode.- Parameters:
m
- Log messagedimensionality
- DimensionalityclustersMap
- Cluster map
-
extractClusters
private it.unimi.dsi.fastutil.objects.Object2ObjectOpenCustomHashMap<long[],java.util.List<ArrayModifiableDBIDs>> extractClusters(Relation<? extends NumberVector> relation, DiSH.DiSHClusterOrder clusterOrder)
Extracts the clusters from the cluster order.- Parameters:
relation
- the database storing the objectsclusterOrder
- the cluster order to extract the clusters from- Returns:
- the extracted clusters
-
sortClusters
private java.util.List<Cluster<SubspaceModel>> sortClusters(Relation<? extends NumberVector> relation, it.unimi.dsi.fastutil.objects.Object2ObjectMap<long[],java.util.List<ArrayModifiableDBIDs>> clustersMap)
Returns a sorted list of the clusters w.r.t. the subspace dimensionality in descending order.- Parameters:
relation
- the database storing the objectsclustersMap
- the mapping of bits sets to clusters- Returns:
- a sorted list of the clusters
-
checkClusters
private void checkClusters(Relation<? extends NumberVector> relation, it.unimi.dsi.fastutil.objects.Object2ObjectMap<long[],java.util.List<ArrayModifiableDBIDs>> clustersMap)
Removes the clusters with size < minpts from the cluster map and adds them to their parents.- Parameters:
relation
- the relation storing the objectsclustersMap
- the map containing the clusters
-
findParent
private Pair<long[],ArrayModifiableDBIDs> findParent(Relation<? extends NumberVector> relation, Pair<long[],ArrayModifiableDBIDs> child, it.unimi.dsi.fastutil.objects.Object2ObjectMap<long[],java.util.List<ArrayModifiableDBIDs>> clustersMap)
Returns the parent of the specified cluster- Parameters:
relation
- the relation storing the objectschild
- the child to search the parent forclustersMap
- the map containing the clusters- Returns:
- the parent of the specified cluster
-
buildHierarchy
private void buildHierarchy(Relation<? extends NumberVector> database, Clustering<SubspaceModel> clustering, java.util.List<Cluster<SubspaceModel>> clusters, int dimensionality)
Builds the cluster hierarchy.- Parameters:
clustering
- Clustering we processclusters
- the sorted list of clustersdimensionality
- the dimensionality of the datadatabase
- the database containing the data objects
-
isParent
private boolean isParent(Relation<? extends NumberVector> relation, Cluster<SubspaceModel> parent, It<Cluster<SubspaceModel>> iter, int db_dim)
Returns true, if the specified parent cluster is a parent of one child of the children clusters.- Parameters:
relation
- the database containing the objectsparent
- the parent to be testediter
- the list of children to be testeddb_dim
- Database dimensionality- Returns:
- true, if the specified parent cluster is a parent of one child of the children clusters, false otherwise
-
subspaceDimensionality
private int subspaceDimensionality(NumberVector v1, NumberVector v2, long[] pv1, long[] pv2, long[] commonPreferenceVector)
Compute the common subspace dimensionality of two vectors.- Parameters:
v1
- First vectorv2
- Second vectorpv1
- First preferencepv2
- Second preferencecommonPreferenceVector
- Common preference- Returns:
- Usually, v1.dim - commonPreference.cardinality, unless either pv1 and pv2 are a subset of the other.
-
weightedDistance
protected static double weightedDistance(NumberVector v1, NumberVector v2, long[] weightVector)
Computes the weighted distance between the two specified vectors according to the given preference vector.- Parameters:
v1
- the first vectorv2
- the second vectorweightVector
- the preference vector- Returns:
- the weighted distance between the two specified vectors according to the given preference vector
-
-