
V - Vector type

@Title(value="CASH: Robust clustering in arbitrarily oriented subspaces")
@Description(value="Subspace clustering algorithm based on the Hough transform.")
@Reference(authors="E. Achtert, C. B\u00f6hm, J. David, P. Kr\u00f6ger, A. Zimek", title="Robust clustering in arbitrarily oriented subspaces", booktitle="Proc. 8th SIAM Int. Conf. on Data Mining (SDM\'08), Atlanta, GA, 2008", url="http://www.siam.org/proceedings/datamining/2008/dm08_69_AchtertBoehmDavidKroegerZimek.pdf")
public class CASH<V extends NumberVector> extends AbstractAlgorithm<Clustering<Model>> implements ClusteringAlgorithm<Clustering<Model>>

E. Achtert, C. Böhm, J. David, P. Kröger, A. Zimek:
Robust clustering in arbitrarily oriented subspaces.
In Proc. 8th SIAM Int. Conf. on Data Mining (SDM'08), Atlanta, GA, 2008
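
To make the Hough transform idea behind the class description concrete, here is a minimal two-dimensional sketch. It is not ELKI code: the class `HoughSketch2D` and the method `f` are hypothetical names, and the sketch assumes the usual parameterization in which a point p is mapped to the function f_p(alpha) = p_1 cos(alpha) + p_2 sin(alpha), the signed distance from the origin of the line through p whose unit normal is (cos(alpha), sin(alpha)). Points that lie on a common line yield functions that meet at the same (alpha, distance) pair.

```java
import java.util.Locale;

/** Minimal 2D sketch of a Hough-style parameterization function (hypothetical, not ELKI code). */
final class HoughSketch2D {
  /** Signed distance from the origin of the line through p with unit normal (cos alpha, sin alpha). */
  static double f(double[] p, double alpha) {
    return p[0] * Math.cos(alpha) + p[1] * Math.sin(alpha);
  }

  public static void main(String[] args) {
    // Three points on the line x + y = 1.
    double[][] points = { { 1.0, 0.0 }, { 0.0, 1.0 }, { 0.5, 0.5 } };
    // Angle of that line's normal vector, in radians.
    double alpha = Math.PI / 4;
    for (double[] p : points) {
      System.out.printf(Locale.ROOT, "f_p(pi/4) for p=(%.1f, %.1f): %.6f%n", p[0], p[1], f(p, alpha));
    }
  }
}
```

All three functions take the same value, 1/sqrt(2), at alpha = pi/4, which is the distance of the line x + y = 1 from the origin. A dense region in the (alpha, distance) parameter space therefore indicates many points sharing a common line.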
| Modifier and Type | Class and Description |
|---|---|
| static class | CASH.Parameterizer: Parameterization class. |
| Modifier and Type | Field and Description |
|---|---|
| protected boolean | adjust: Apply adjustment heuristic for interval choosing. |
| private Relation<ParameterizationFunction> | fulldatabase: The entire relation. |
| protected double | jitter: Maximum jitter for distance values. |
| private static Logging | LOG: The logger for this class. |
| protected int | maxLevel: Maximum level for splitting the hypercube. |
| protected int | minDim: Minimum dimensionality of the subspaces to be found. |
| protected int | minPts: Threshold for the minimum number of points in a cluster. |
| private int | noiseDim: Holds the dimensionality for noise. |
| private ModifiableDBIDs | processedIDs: Holds a set of processed ids. |
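
The interplay of `minPts` and `maxLevel` can be illustrated with a simplified, one-dimensional analog. The sketch below is not ELKI code and makes several assumptions: intervals are searched best-first using a heap ordered by the number of contained points, an interval is split at its midpoint until it reaches `maxLevel`, and children with fewer than `minPts` points are discarded. The names `IntervalSearchSketch`, `Interval`, and `findDensest` are hypothetical, and all input values are assumed to lie in [min, max).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** One-dimensional analog of best-first interval splitting (hypothetical, not ELKI code). */
final class IntervalSearchSketch {
  /** A half-open interval [min, max) with its split level and the points it contains. */
  static final class Interval {
    final double min;
    final double max;
    final int level;
    final List<Double> points;

    Interval(double min, double max, int level, List<Double> points) {
      this.min = min;
      this.max = max;
      this.level = level;
      this.points = points;
    }
  }

  /** Returns the interval at maxLevel holding the most points (at least minPts), or null if none exists. */
  static Interval findDensest(List<Double> data, double min, double max, int minPts, int maxLevel) {
    // Max-heap: the interval with the most points is polled first.
    PriorityQueue<Interval> heap = new PriorityQueue<>(
        Comparator.comparingInt((Interval i) -> i.points.size()).reversed());
    heap.add(new Interval(min, max, 0, data));
    while (!heap.isEmpty()) {
      Interval best = heap.poll();
      if (best.points.size() < minPts) {
        return null; // even the fullest remaining interval is too sparse
      }
      if (best.level >= maxLevel) {
        return best; // children never hold more points than their parent, so this is the densest
      }
      // Split at the midpoint and keep only children that can still form a cluster.
      double mid = 0.5 * (best.min + best.max);
      List<Double> left = new ArrayList<>();
      List<Double> right = new ArrayList<>();
      for (double x : best.points) {
        (x < mid ? left : right).add(x);
      }
      if (left.size() >= minPts) {
        heap.add(new Interval(best.min, mid, best.level + 1, left));
      }
      if (right.size() >= minPts) {
        heap.add(new Interval(mid, best.max, best.level + 1, right));
      }
    }
    return null;
  }
}
```

A larger `maxLevel` yields narrower intervals and hence a finer resolution, while a larger `minPts` prunes sparse regions earlier.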
| Constructor and Description |
|---|
| CASH(int minPts, int maxLevel, int minDim, double jitter, boolean adjust): Constructor. |
| Modifier and Type | Method and Description |
|---|---|
| private MaterializedRelation<ParameterizationFunction> | buildDB(int dim, Matrix basis, DBIDs ids, Relation<ParameterizationFunction> relation): Builds a (dim-1)-dimensional database where the objects are projected into the specified subspace. |
| private Database | buildDerivatorDB(Relation<ParameterizationFunction> relation, CASHInterval interval): Builds a database for the derivator consisting of the ids in the specified interval. |
| private Database | buildDerivatorDB(Relation<ParameterizationFunction> relation, DBIDs ids): Builds a database for the derivator consisting of the specified ids. |
| private Matrix | determineBasis(double[] alpha): Determines a basis defining a subspace described by the specified alpha values. |
| private double[] | determineMinMaxDistance(Relation<ParameterizationFunction> relation, int dimensionality): Determines the minimum and maximum function value of all parameterization functions stored in the specified database. |
| private CASHInterval | determineNextIntervalAtMaxLevel(ObjectHeap<IntegerPriorityObject<CASHInterval>> heap): Determines the next "best" interval at maximum level, i.e. the next interval containing the most unprocessed objects. |
| private static int | dimensionality(Relation<ParameterizationFunction> relation): Get the dimensionality of a vector field. |
| private CASHInterval | doDetermineNextIntervalAtMaxLevel(ObjectHeap<IntegerPriorityObject<CASHInterval>> heap): Recursive helper method to determine the next "best" interval at maximum level, i.e. the next interval containing the most unprocessed objects. |
| private Clustering<Model> | doRun(Relation<ParameterizationFunction> relation, FiniteProgress progress): Runs the CASH algorithm on the specified database. This method is called recursively until only noise is left. |
| TypeInformation[] | getInputTypeRestriction(): Get the input type restriction used for negotiating the data query. |
| protected Logging | getLogger(): Get the (static) logger for this class. |
| private void | initHeap(ObjectHeap<IntegerPriorityObject<CASHInterval>> heap, Relation<ParameterizationFunction> relation, int dim, DBIDs ids): Initializes the heap with the root intervals. |
| private Relation<ParameterizationFunction> | preprocess(Database db, Relation<V> vrel): Preprocess the dataset, precomputing the parameterization functions. |
| private ParameterizationFunction | project(Matrix basis, ParameterizationFunction f): Projects the specified parameterization function into the subspace described by the given basis. |
| Clustering<Model> | run(Database database, Relation<V> vrel): Run CASH on the relation. |
| private Matrix | runDerivator(Relation<ParameterizationFunction> relation, int dim, CASHInterval interval, ModifiableDBIDs ids): Runs the derivator on the specified interval and assigns to the model all points whose distance is less than the standard deviation of the derivator model. |
| private LinearEquationSystem | runDerivator(Relation<ParameterizationFunction> relation, int dimensionality, DBIDs ids): Runs the derivator on the specified ids and assigns to the model all points whose distance is less than the standard deviation of the derivator model. |
| private double | sinusProduct(int start, int end, double[] alpha): Computes the product of the sine values of the specified angles from start to end index. |
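
The descriptions of `sinusProduct` and `determineBasis` suggest spherical coordinates. The following sketch is not the ELKI implementation. It assumes that angles are given in radians, that the `end` index is exclusive, and that a direction in d dimensions is described by d-1 angles, with coordinate i equal to the product of the sines of the preceding angles times the cosine of angle i. The names `SphericalSketch`, `sineProduct`, and `unitVector` are hypothetical, and ELKI's coordinate ordering may differ.

```java
/** Sketch of spherical-coordinate helpers (hypothetical, not ELKI code). */
final class SphericalSketch {
  /** Product of sin(alpha[j]) for start <= j < end; returns 1.0 for an empty range. */
  static double sineProduct(int start, int end, double[] alpha) {
    double result = 1.0;
    for (int j = start; j < end; j++) {
      result *= Math.sin(alpha[j]);
    }
    return result;
  }

  /** Unit vector in alpha.length + 1 dimensions described by the angles alpha. */
  static double[] unitVector(double[] alpha) {
    int d = alpha.length + 1;
    double[] n = new double[d];
    for (int i = 0; i < d; i++) {
      // The last coordinate has no cosine factor, which is equivalent to an angle of 0.
      double cos = (i < d - 1) ? Math.cos(alpha[i]) : 1.0;
      n[i] = sineProduct(0, i, alpha) * cos;
    }
    return n;
  }
}
```

For a single angle this reduces to (cos(alpha), sin(alpha)). A full basis for the subspace orthogonal to such a vector would additionally require completing it to an orthonormal basis, which this sketch does not attempt.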
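Methods inherited from class AbstractAlgorithm: makeParameterDistanceFunction, run
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface ClusteringAlgorithm: run

private static final Logging LOG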
protected int minPts
protected int maxLevel
protected int minDim
protected double jitter
protected boolean adjust
private int noiseDim
private ModifiableDBIDs processedIDs
private Relation<ParameterizationFunction> fulldatabase
public CASH(int minPts,
int maxLevel,
int minDim,
double jitter,
boolean adjust)
minPts - MinPts parameter
maxLevel - Maximum level
minDim - Minimum dimensionality
jitter - Jitter
adjust - Adjust

public Clustering<Model> run(Database database, Relation<V> vrel)
database - Database
vrel - Relation

private Relation<ParameterizationFunction> preprocess(Database db, Relation<V> vrel)
db - Database
vrel - Vector relation

private Clustering<Model> doRun(Relation<ParameterizationFunction> relation, FiniteProgress progress)
relation - the Relation to run the CASH algorithm on
progress - the progress object for verbose messages

private static int dimensionality(Relation<ParameterizationFunction> relation)
relation - Relation

private void initHeap(ObjectHeap<IntegerPriorityObject<CASHInterval>> heap, Relation<ParameterizationFunction> relation, int dim, DBIDs ids)
heap - the heap to be initialized
relation - the database storing the parameterization functions
dim - the dimensionality of the database
ids - the ids of the database

private MaterializedRelation<ParameterizationFunction> buildDB(int dim, Matrix basis, DBIDs ids, Relation<ParameterizationFunction> relation)
dim - the dimensionality of the database
basis - the basis defining the subspace
ids - the ids for the new database
relation - the database storing the parameterization functions

private ParameterizationFunction project(Matrix basis, ParameterizationFunction f)
basis - the basis defining the subspace
f - the parameterization function to be projected

private Matrix determineBasis(double[] alpha)
alpha - the alpha values

private double sinusProduct(int start, int end, double[] alpha)
start - the index to start
end - the index to end
alpha - the array of angles

private CASHInterval determineNextIntervalAtMaxLevel(ObjectHeap<IntegerPriorityObject<CASHInterval>> heap)
heap - the heap storing the intervals

private CASHInterval doDetermineNextIntervalAtMaxLevel(ObjectHeap<IntegerPriorityObject<CASHInterval>> heap)
heap - the heap storing the intervals

private double[] determineMinMaxDistance(Relation<ParameterizationFunction> relation, int dimensionality)
relation - the database containing the parameterization functions
dimensionality - the dimensionality of the database

private Matrix runDerivator(Relation<ParameterizationFunction> relation, int dim, CASHInterval interval, ModifiableDBIDs ids)
relation - the database containing the parameterization functions
interval - the interval to build the model
dim - the dimensionality of the database
ids - an empty set to assign the ids

private Database buildDerivatorDB(Relation<ParameterizationFunction> relation, CASHInterval interval)
relation - the database storing the parameterization functions
interval - the interval to build the database from

private LinearEquationSystem runDerivator(Relation<ParameterizationFunction> relation, int dimensionality, DBIDs ids)
relation - the database containing the parameterization functions
ids - the ids to build the model
dimensionality - the dimensionality of the subspace

private Database buildDerivatorDB(Relation<ParameterizationFunction> relation, DBIDs ids)
relation - the database storing the parameterization functions
ids - the ids to build the database from

public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
Specified by: getInputTypeRestriction in interface Algorithm
Specified by: getInputTypeRestriction in class AbstractAlgorithm<Clustering<Model>>

protected Logging getLogger()
Description copied from class: AbstractAlgorithm
Specified by: getLogger in class AbstractAlgorithm<Clustering<Model>>

Copyright © 2015 ELKI Development Team, Lehr- und Forschungseinheit für Datenbanksysteme, Ludwig-Maximilians-Universität München.