Class LMCLUS
- java.lang.Object
-
- elki.clustering.correlation.LMCLUS
-
- All Implemented Interfaces:
Algorithm
,ClusteringAlgorithm<Clustering<Model>>
@Reference(authors="R. Haralick, R. Harpaz", title="Linear manifold clustering in high dimensional spaces by stochastic search", booktitle="Pattern Recognition volume 40, Issue 10", url="https://doi.org/10.1016/j.patcog.2007.01.020", bibkey="DBLP:journals/pr/HaralickH07") public class LMCLUS extends java.lang.Object implements ClusteringAlgorithm<Clustering<Model>>
Linear manifold clustering in high dimensional spaces by stochastic search.Reference:
R. Haralick, R. Harpaz
Linear manifold clustering in high dimensional spaces by stochastic search
In: Pattern Recognition volume 40, Issue 10Implementation note: the LMCLUS algorithm seems to lack good stopping criterions. We can't entirely reproduce the good results from the original publication, in particular not on noisy data. But the questionable parts are as in the original publication, associated thesis and published source code. The minimum cluster size however can serve as a hidden stopping criterion.
- Since:
- 0.5.0
- Author:
- Ernst Waas, Erich Schubert
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
LMCLUS.Par
Parameterization classprivate static class
LMCLUS.Separation
Class to represent a linear manifold separation-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Field Summary
Fields Modifier and Type Field Description private static int
BINS
Histogram resolutionprivate static Logging
LOG
The logger for this class.private static double
LOG_NOT_FROM_ONE_CLUSTER_PROBABILITY
Epsilonprivate int
maxLMDim
Maximum cluster dimensionalityprivate int
minsize
Minimum cluster sizeprivate RandomFactory
rnd
Random factoryprivate int
samplingLevel
Number of sampling rounds to find a good splitprivate double
sensitivityThreshold
The current threshold value calculated by the findSeperation Method.
-
Constructor Summary
Constructors Constructor Description LMCLUS(int maxdim, int minsize, int samplingLevel, double sensitivityThreshold, RandomFactory rnd)
Constructor.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description private double
deviation(double[] delta, double[][] beta)
Deviation from a manifold described by beta.private double[]
findAndEvaluateThreshold(DoubleDynamicHistogram histogram)
Evaluate the histogram to find a suitable thresholdprivate LMCLUS.Separation
findSeparation(Relation<? extends NumberVector> relation, DBIDs currentids, int dimension, java.util.Random r)
This method samples a number of linear manifolds an tries to determine which the one with the best cluster is.private double[][]
generateOrthonormalBasis(java.util.List<double[]> vectors)
This Method generates an orthonormal basis from a set of Vectors.TypeInformation[]
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.Clustering<Model>
run(Relation<? extends NumberVector> relation)
The main LMCLUS (Linear manifold clustering algorithm) is processed in this method.-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.clustering.ClusteringAlgorithm
autorun
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
The logger for this class.
-
LOG_NOT_FROM_ONE_CLUSTER_PROBABILITY
private static final double LOG_NOT_FROM_ONE_CLUSTER_PROBABILITY
Epsilon
-
BINS
private static final int BINS
Histogram resolution- See Also:
- Constant Field Values
-
sensitivityThreshold
private final double sensitivityThreshold
The current threshold value calculated by the findSeperation Method.
-
maxLMDim
private final int maxLMDim
Maximum cluster dimensionality
-
minsize
private final int minsize
Minimum cluster size
-
samplingLevel
private final int samplingLevel
Number of sampling rounds to find a good split
-
rnd
private final RandomFactory rnd
Random factory
-
-
Constructor Detail
-
LMCLUS
public LMCLUS(int maxdim, int minsize, int samplingLevel, double sensitivityThreshold, RandomFactory rnd)
Constructor.- Parameters:
maxdim
- Maximum dimensionalityminsize
- Minimum cluster sizesamplingLevel
- Sampling levelsensitivityThreshold
- Thresholdrnd
- Random factory
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface:Algorithm
Get the input type restriction used for negotiating the data query.- Specified by:
getInputTypeRestriction
in interfaceAlgorithm
- Returns:
- Type restriction
-
run
public Clustering<Model> run(Relation<? extends NumberVector> relation)
The main LMCLUS (Linear manifold clustering algorithm) is processed in this method.The algorithm samples random linear manifolds and tries to find clusters in it. It calculates a distance histogram searches for a threshold and partitions the points in two groups the ones in the cluster and everything else. Then the best fitting linear manifold is searched and registered as a cluster. The process is started over until all points are clustered. The last cluster should contain all the outliers (or the whole data if no clusters have been found.)
- Parameters:
relation
- Relation- Returns:
- Clustering result
-
deviation
private double deviation(double[] delta, double[][] beta)
Deviation from a manifold described by beta.- Parameters:
delta
- Delta from origin vectorbeta
- Manifold- Returns:
- Deviation score
-
findSeparation
private LMCLUS.Separation findSeparation(Relation<? extends NumberVector> relation, DBIDs currentids, int dimension, java.util.Random r)
This method samples a number of linear manifolds an tries to determine which the one with the best cluster is.A number of sample points according to the dimension of the linear manifold are taken. The basis (B) and the origin(o) of the manifold are calculated. A distance histogram using the distance function ||x-o|| -||B^t*(x-o)|| is generated. The best threshold is searched using the elevate threshold function. The overall goodness of the threshold is determined. The process is redone until a specific number of samples is taken.
- Parameters:
relation
- The vector relationcurrentids
- Current DBIDsdimension
- the dimension of the linear manifold to sample.r
- Random generator- Returns:
- the overall goodness of the separation. The values origin basis and threshold are returned indirectly over class variables.
-
generateOrthonormalBasis
private double[][] generateOrthonormalBasis(java.util.List<double[]> vectors)
This Method generates an orthonormal basis from a set of Vectors. It uses the established Gram-Schmidt algorithm for orthonormalisation:u_1 = v_1 u_k = v_k -proj_u1(v_k)...proj_u(k-1)(v_k); Where proj_u(v) = <v,u>/<u,u> *u
- Parameters:
vectors
- The set of vectors to generate the orthonormal basis from- Returns:
- the orthonormal basis generated by this method.
- Throws:
java.lang.RuntimeException
- if the given vectors are not linear independent.
-
findAndEvaluateThreshold
private double[] findAndEvaluateThreshold(DoubleDynamicHistogram histogram)
Evaluate the histogram to find a suitable threshold- Parameters:
histogram
- Histogram to evaluate- Returns:
- Position and goodness
-
-