Package elki.clustering.em
Class KDTreeEM
- java.lang.Object
-
- elki.clustering.em.KDTreeEM
-
- All Implemented Interfaces:
Algorithm
,ClusteringAlgorithm<Clustering<EMModel>>
@Description("Gaussian mixture modeling accelerated using a kd-tree") @Reference(authors="Andrew W. Moore", booktitle="Advances in Neural Information Processing Systems 11 (NIPS 1998)", title="Very Fast EM-based Mixture Model Clustering using Multiresolution kd-trees", bibkey="DBLP:conf/nips/Moore98") public class KDTreeEM extends java.lang.Object implements ClusteringAlgorithm<Clustering<EMModel>>
Clustering by expectation maximization (EM-Algorithm), also known as Gaussian Mixture Modeling (GMM), calculated on a kd-tree. If supported, tries to prune during calculation.Reference:
A. W. Moore:
Very Fast EM-based Mixture Model Clustering using Multiresolution kd-trees.
Neural Information Processing Systems (NIPS 1998)- Since:
- 0.8.0
- Author:
- Robert Gehde
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description (package private) static class
KDTreeEM.KDTree
KDTree class with the statistics needed for EM clustering.static class
KDTreeEM.Par
Parameterization class.-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Field Summary
Fields Modifier and Type Field Description private double
delta
Delta parameterprotected boolean
exactAssign
Perform exact cluster assignmentsprivate double
ipiPow
Gaussian scaling factor for likelihood.private int
k
number of modelsprivate static Logging
LOG
Logging objectprivate int
maxiter
maximum amount of iterationsprivate double
mbw
minimum leaf sizeprivate TextbookMultivariateGaussianModelFactory
mfactory
Factory for producing the initial cluster model.private int
miniter
minimum amount of iterationsprivate java.util.List<TextbookMultivariateGaussianModel>
models
Current clusters.private java.util.List<TextbookMultivariateGaussianModel>
newmodels
Models for next iteration.private boolean
soft
Retain soft assignments.static SimpleTypeInformation<double[]>
SOFT_TYPE
Soft assignment result type.private ConstrainedQuadraticProblemSolver
solver
Solver for quadratic problemsprotected ArrayModifiableDBIDs
sorted
kd-tree object orderprivate double
tau
tau, low for precise, high for fast results.private double
tauClass
Drop one class if the maximum weight of a class in the bounding box is lower than tauClass * wmin_max, where wmin_max is the maximum minimum weight of all classesprivate double[]
wsum
Cluster weights
-
Constructor Summary
Constructors Constructor Description KDTreeEM(int k, double mbw, double tau, double tauclass, double delta, TextbookMultivariateGaussianModelFactory mfactory, int miniter, int maxiter, boolean soft, boolean exactAssign)
Constructor.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description private double[]
analyseDimWidth(Relation<? extends NumberVector> relation)
Helper method to retrieve the widths of all data in all dimensions.private void
calculateModelLimits(KDTreeEM.KDTree node, TextbookMultivariateGaussianModel model, double[] minpnt, double[] maxpnt, double[] ret)
Calculates the model limits inside this node by translating the Gaussian model into a squared function.private int[]
checkStoppingCondition(KDTreeEM.KDTree node, int[] indices)
This methods checks the different stopping conditions given in the paper, thus calculating the Dimensions, that will be considered for child-trees.TypeInformation[]
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.private double
makeStats(KDTreeEM.KDTree node, int[] indices, WritableDataStore<double[]> probs)
Calculates the statistics on the kd-tree needed for the calculation of the new modelsClustering<EMModel>
run(Relation<? extends NumberVector> relation)
Calculates the EM Clustering with the given values by calling makeStats and calculation the new models from the given results-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.clustering.ClusteringAlgorithm
autorun
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
Logging object
-
mfactory
private TextbookMultivariateGaussianModelFactory mfactory
Factory for producing the initial cluster model.
-
soft
private boolean soft
Retain soft assignments.
-
delta
private double delta
Delta parameter
-
SOFT_TYPE
public static final SimpleTypeInformation<double[]> SOFT_TYPE
Soft assignment result type.
-
k
private int k
number of models
-
mbw
private double mbw
minimum leaf size
-
tau
private double tau
tau, low for precise, high for fast results.
-
tauClass
private double tauClass
Drop one class if the maximum weight of a class in the bounding box is lower than tauClass * wmin_max, where wmin_max is the maximum minimum weight of all classes
-
miniter
private int miniter
minimum amount of iterations
-
maxiter
private int maxiter
maximum amount of iterations
-
sorted
protected ArrayModifiableDBIDs sorted
kd-tree object order
-
models
private java.util.List<TextbookMultivariateGaussianModel> models
Current clusters.
-
newmodels
private java.util.List<TextbookMultivariateGaussianModel> newmodels
Models for next iteration.
-
solver
private ConstrainedQuadraticProblemSolver solver
Solver for quadratic problems
-
ipiPow
private double ipiPow
Gaussian scaling factor for likelihood.
-
wsum
private double[] wsum
Cluster weights
-
exactAssign
protected boolean exactAssign
Perform exact cluster assignments
-
-
Constructor Detail
-
KDTreeEM
public KDTreeEM(int k, double mbw, double tau, double tauclass, double delta, TextbookMultivariateGaussianModelFactory mfactory, int miniter, int maxiter, boolean soft, boolean exactAssign)
Constructor.- Parameters:
k
- number of classesmbw
- minimum relative size of leaf nodestau
- pruning parametertauclass
- pruning parameter for single classesdelta
- delta parametermfactory
- EM cluster model factoryminiter
- Minimum number of iterationsmaxiter
- Maximum number of iterationssoft
- Include soft assignmentsexactAssign
- Perform exact assignments at the end
-
-
Method Detail
-
run
public Clustering<EMModel> run(Relation<? extends NumberVector> relation)
Calculates the EM Clustering with the given values by calling makeStats and calculation the new models from the given results- Parameters:
relation
- Data Relation- Returns:
- Clustering KDTreeEM Clustering
-
analyseDimWidth
private double[] analyseDimWidth(Relation<? extends NumberVector> relation)
Helper method to retrieve the widths of all data in all dimensions.- Parameters:
relation
- Relation to analyze- Returns:
- width of each dimension
-
checkStoppingCondition
private int[] checkStoppingCondition(KDTreeEM.KDTree node, int[] indices)
This methods checks the different stopping conditions given in the paper, thus calculating the Dimensions, that will be considered for child-trees. If this method returns a non-empty subset of the input dimension set, it means that missing dimensions are dropped because their weight was too small. If it returns a null array it means that the expected error of all remaining models is small enough to consider this node a leaf node.- Parameters:
node
- kd tree nodeindices
- list of indices to check- Returns:
- indices that are not pruned, null if everything was pruned
-
calculateModelLimits
private void calculateModelLimits(KDTreeEM.KDTree node, TextbookMultivariateGaussianModel model, double[] minpnt, double[] maxpnt, double[] ret)
Calculates the model limits inside this node by translating the Gaussian model into a squared function.- Parameters:
model
- model to calculate the limits forminpnt
- result array for argminmaxpnt
- result array for argmaxret
- Return array
-
makeStats
private double makeStats(KDTreeEM.KDTree node, int[] indices, WritableDataStore<double[]> probs)
Calculates the statistics on the kd-tree needed for the calculation of the new models- Parameters:
node
- next nodeindices
- list of indices to use in calculation, initially allprobs
- cluster assignment- Returns:
- log likelihood of the model
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface:Algorithm
Get the input type restriction used for negotiating the data query.- Specified by:
getInputTypeRestriction
in interfaceAlgorithm
- Returns:
- Type restriction
-
-