Package elki.clustering.subspace
Class DiSH
- java.lang.Object
-
- elki.clustering.subspace.DiSH
-
- All Implemented Interfaces:
Algorithm,ClusteringAlgorithm<Clustering<SubspaceModel>>,SubspaceClusteringAlgorithm<SubspaceModel>
@Title("DiSH: Detecting Subspace cluster Hierarchies") @Description("Algorithm to find hierarchical correlation clusters in subspaces.") @Reference(authors="E. Achtert, C. B\u00f6hm, H.-P. Kriegel, P. Kr\u00f6ger, I. M\u00fcller-Gorman, A. Zimek", title="Detection and Visualization of Subspace Cluster Hierarchies", booktitle="Proc. 12th Int. Conf. on Database Systems for Advanced Applications (DASFAA)", url="https://doi.org/10.1007/978-3-540-71703-4_15", bibkey="DBLP:conf/dasfaa/AchtertBKKMZ07") public class DiSH extends java.lang.Object implements SubspaceClusteringAlgorithm<SubspaceModel>
Algorithm for detecting subspace hierarchies.Reference:
E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, I. Müller-Gorman, A. Zimek
Detection and Visualization of Subspace Cluster Hierarchies
Proc. 12th Int. Conf. on Database Systems for Advanced Applications (DASFAA).- Since:
- 0.1
- Author:
- Elke Achtert
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static classDiSH.DiSHClusterOrderDiSH cluster order.private classDiSH.InstanceOPTICS variant used by DiSH internally.static classDiSH.ParParameterization class.static classDiSH.StrategyAvailable strategies for determination of the preference vector.-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Field Summary
Fields Modifier and Type Field Description private doubleepsilonHolds the value ofDiSH.Par.EPSILON_ID.private static LoggingLOGThe logger for this class.private intminptsOPTICS minPts parameter.private DiSH.StrategystrategyDiSH strategy.
-
Constructor Summary
Constructors Constructor Description DiSH(double epsilon, int minpts, DiSH.Strategy strategy)Constructor.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description private voidbuildHierarchy(Relation<? extends NumberVector> database, Clustering<SubspaceModel> clustering, java.util.List<Cluster<SubspaceModel>> clusters, int dimensionality)Builds the cluster hierarchy.private voidcheckClusters(Relation<? extends NumberVector> relation, it.unimi.dsi.fastutil.objects.Object2ObjectMap<long[],java.util.List<ArrayModifiableDBIDs>> clustersMap)Removes the clusters with size < minpts from the cluster map and adds them to their parents.private Clustering<SubspaceModel>computeClusters(Relation<? extends NumberVector> database, DiSH.DiSHClusterOrder clusterOrder)Computes the hierarchical clusters according to the cluster order.private it.unimi.dsi.fastutil.objects.Object2ObjectOpenCustomHashMap<long[],java.util.List<ArrayModifiableDBIDs>>extractClusters(Relation<? extends NumberVector> relation, DiSH.DiSHClusterOrder clusterOrder)Extracts the clusters from the cluster order.private Pair<long[],ArrayModifiableDBIDs>findParent(Relation<? extends NumberVector> relation, Pair<long[],ArrayModifiableDBIDs> child, it.unimi.dsi.fastutil.objects.Object2ObjectMap<long[],java.util.List<ArrayModifiableDBIDs>> clustersMap)Returns the parent of the specified clusterTypeInformation[]getInputTypeRestriction()Get the input type restriction used for negotiating the data query.private booleanisParent(Relation<? extends NumberVector> relation, Cluster<SubspaceModel> parent, It<Cluster<SubspaceModel>> iter, int db_dim)Returns true, if the specified parent cluster is a parent of one child of the children clusters.private voidlogClusterSizes(java.lang.String m, int dimensionality, it.unimi.dsi.fastutil.objects.Object2ObjectOpenCustomHashMap<long[],java.util.List<ArrayModifiableDBIDs>> clustersMap)Log cluster sizes in verbose mode.Clustering<SubspaceModel>run(Relation<? extends NumberVector> relation)Performs the DiSH algorithm on the given database.private java.util.List<Cluster<SubspaceModel>>sortClusters(Relation<? extends NumberVector> relation, it.unimi.dsi.fastutil.objects.Object2ObjectMap<long[],java.util.List<ArrayModifiableDBIDs>> clustersMap)Returns a sorted list of the clusters w.r.t. the subspace dimensionality in descending order.private intsubspaceDimensionality(NumberVector v1, NumberVector v2, long[] pv1, long[] pv2, long[] commonPreferenceVector)Compute the common subspace dimensionality of two vectors.protected static doubleweightedDistance(NumberVector v1, NumberVector v2, long[] weightVector)Computes the weighted distance between the two specified vectors according to the given preference vector.-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.clustering.ClusteringAlgorithm
autorun
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
The logger for this class.
-
epsilon
private double epsilon
Holds the value ofDiSH.Par.EPSILON_ID.
-
minpts
private int minpts
OPTICS minPts parameter.
-
strategy
private DiSH.Strategy strategy
DiSH strategy.
-
-
Constructor Detail
-
DiSH
public DiSH(double epsilon, int minpts, DiSH.Strategy strategy)Constructor.- Parameters:
epsilon- Epsilon valueminpts- Mu parameter (minPts)strategy- DiSH strategy
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface:AlgorithmGet the input type restriction used for negotiating the data query.- Specified by:
getInputTypeRestrictionin interfaceAlgorithm- Returns:
- Type restriction
-
run
public Clustering<SubspaceModel> run(Relation<? extends NumberVector> relation)
Performs the DiSH algorithm on the given database.- Parameters:
relation- Relation to process- Returns:
- Clustering
-
computeClusters
private Clustering<SubspaceModel> computeClusters(Relation<? extends NumberVector> database, DiSH.DiSHClusterOrder clusterOrder)
Computes the hierarchical clusters according to the cluster order.- Parameters:
database- the database holding the objectsclusterOrder- the cluster order
-
logClusterSizes
private void logClusterSizes(java.lang.String m, int dimensionality, it.unimi.dsi.fastutil.objects.Object2ObjectOpenCustomHashMap<long[],java.util.List<ArrayModifiableDBIDs>> clustersMap)Log cluster sizes in verbose mode.- Parameters:
m- Log messagedimensionality- DimensionalityclustersMap- Cluster map
-
extractClusters
private it.unimi.dsi.fastutil.objects.Object2ObjectOpenCustomHashMap<long[],java.util.List<ArrayModifiableDBIDs>> extractClusters(Relation<? extends NumberVector> relation, DiSH.DiSHClusterOrder clusterOrder)
Extracts the clusters from the cluster order.- Parameters:
relation- the database storing the objectsclusterOrder- the cluster order to extract the clusters from- Returns:
- the extracted clusters
-
sortClusters
private java.util.List<Cluster<SubspaceModel>> sortClusters(Relation<? extends NumberVector> relation, it.unimi.dsi.fastutil.objects.Object2ObjectMap<long[],java.util.List<ArrayModifiableDBIDs>> clustersMap)
Returns a sorted list of the clusters w.r.t. the subspace dimensionality in descending order.- Parameters:
relation- the database storing the objectsclustersMap- the mapping of bits sets to clusters- Returns:
- a sorted list of the clusters
-
checkClusters
private void checkClusters(Relation<? extends NumberVector> relation, it.unimi.dsi.fastutil.objects.Object2ObjectMap<long[],java.util.List<ArrayModifiableDBIDs>> clustersMap)
Removes the clusters with size < minpts from the cluster map and adds them to their parents.- Parameters:
relation- the relation storing the objectsclustersMap- the map containing the clusters
-
findParent
private Pair<long[],ArrayModifiableDBIDs> findParent(Relation<? extends NumberVector> relation, Pair<long[],ArrayModifiableDBIDs> child, it.unimi.dsi.fastutil.objects.Object2ObjectMap<long[],java.util.List<ArrayModifiableDBIDs>> clustersMap)
Returns the parent of the specified cluster- Parameters:
relation- the relation storing the objectschild- the child to search the parent forclustersMap- the map containing the clusters- Returns:
- the parent of the specified cluster
-
buildHierarchy
private void buildHierarchy(Relation<? extends NumberVector> database, Clustering<SubspaceModel> clustering, java.util.List<Cluster<SubspaceModel>> clusters, int dimensionality)
Builds the cluster hierarchy.- Parameters:
clustering- Clustering we processclusters- the sorted list of clustersdimensionality- the dimensionality of the datadatabase- the database containing the data objects
-
isParent
private boolean isParent(Relation<? extends NumberVector> relation, Cluster<SubspaceModel> parent, It<Cluster<SubspaceModel>> iter, int db_dim)
Returns true, if the specified parent cluster is a parent of one child of the children clusters.- Parameters:
relation- the database containing the objectsparent- the parent to be testediter- the list of children to be testeddb_dim- Database dimensionality- Returns:
- true, if the specified parent cluster is a parent of one child of the children clusters, false otherwise
-
subspaceDimensionality
private int subspaceDimensionality(NumberVector v1, NumberVector v2, long[] pv1, long[] pv2, long[] commonPreferenceVector)
Compute the common subspace dimensionality of two vectors.- Parameters:
v1- First vectorv2- Second vectorpv1- First preferencepv2- Second preferencecommonPreferenceVector- Common preference- Returns:
- Usually, v1.dim - commonPreference.cardinality, unless either pv1 and pv2 are a subset of the other.
-
weightedDistance
protected static double weightedDistance(NumberVector v1, NumberVector v2, long[] weightVector)
Computes the weighted distance between the two specified vectors according to the given preference vector.- Parameters:
v1- the first vectorv2- the second vectorweightVector- the preference vector- Returns:
- the weighted distance between the two specified vectors according to the given preference vector
-
-