Package elki.clustering.em
Class BetulaGMM
- java.lang.Object
-
- elki.clustering.em.BetulaGMM
-
- All Implemented Interfaces:
Algorithm, ClusteringAlgorithm<Clustering<EMModel>>
- Direct Known Subclasses:
BetulaGMMWeighted
@Reference(authors="Andreas Lang and Erich Schubert", title="BETULA: Fast Clustering of Large Data with Improved BIRCH CF-Trees", booktitle="Information Systems", url="https://doi.org/10.1016/j.is.2021.101918", bibkey="DBLP:journals/is/LangS22")
public class BetulaGMM extends java.lang.Object implements ClusteringAlgorithm<Clustering<EMModel>>
Clustering by expectation maximization (EM algorithm), also known as Gaussian Mixture Modeling (GMM), with optional MAP regularization. This version uses the BIRCH cluster feature centers only for responsibility estimation; the CF variances are only used for computing the models.
Reference:
Andreas Lang and Erich Schubert
BETULA: Fast Clustering of Large Data with Improved BIRCH CF-Trees
Information Systems
- Since:
- 0.8.0
- Author:
- Andreas Lang
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | BetulaGMM.Par | Parameterizer
-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Field Summary
Fields
Modifier and Type | Field | Description
(package private) CFTree.Factory<?> | cffactory | CFTree factory.
private double | delta | Delta parameter.
(package private) BetulaClusterModelFactory<?> | initializer | Initialization method.
(package private) int | k | Number of cluster centers to initialize.
private static Logging | LOG | Class logger.
(package private) int | maxiter | Maximum number of iterations.
protected static double | MIN_LOGLIKELIHOOD | Minimum log-likelihood to avoid -infinity.
private double | prior | Prior to enable MAP estimation (use 0 for MLE).
private boolean | soft | Retain soft assignments.
static SimpleTypeInformation<double[]> | SOFT_TYPE | Soft assignment result type.
-
Constructor Summary
Constructors
Constructor | Description
BetulaGMM(CFTree.Factory<?> cffactory, double delta, int k, int maxiter, boolean soft, BetulaClusterModelFactory<?> initialization, double prior) | Constructor.
-
Method Summary
All Methods / Instance Methods / Concrete Methods
Modifier and Type | Method | Description
double | assignProbabilitiesToInstances(Relation<? extends NumberVector> relation, java.util.List<? extends BetulaClusterModel> models, WritableDataStore<double[]> probClusterIGivenX) | Assigns the current probability values to the instances in the database and computes the expectation value of the current mixture of distributions.
double | assignProbabilitiesToInstances(java.util.ArrayList<? extends ClusterFeature> cfs, java.util.List<? extends BetulaClusterModel> models, java.util.Map<ClusterFeature,double[]> probClusterIGivenX) | Assigns the current probability values to the cluster features and computes the expectation value of the current mixture of distributions.
TypeInformation[] | getInputTypeRestriction() | Get the input type restriction used for negotiating the data query.
private boolean | isSoft() |
void | recomputeCovarianceMatrices(java.util.ArrayList<? extends ClusterFeature> cfs, java.util.Map<ClusterFeature,double[]> probClusterIGivenX, java.util.List<? extends BetulaClusterModel> models, double prior, int n) | Recompute the covariance matrices.
Clustering<EMModel> | run(Relation<NumberVector> relation) | Run the clustering algorithm.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.clustering.ClusteringAlgorithm
autorun
-
-
-
-
Field Detail
-
LOG
private static final Logging LOG
Class logger.
-
cffactory
CFTree.Factory<?> cffactory
CFTree factory.
-
k
int k
Number of cluster centers to initialize.
-
delta
private double delta
Delta parameter. Used as the convergence threshold for terminating the EM iterations.
-
maxiter
int maxiter
Maximum number of iterations.
-
prior
private double prior
Prior to enable MAP estimation (use 0 for MLE)
-
soft
private boolean soft
Retain soft assignments.
-
MIN_LOGLIKELIHOOD
protected static final double MIN_LOGLIKELIHOOD
Minimum log-likelihood to avoid -infinity.
- See Also:
- Constant Field Values
-
SOFT_TYPE
public static final SimpleTypeInformation<double[]> SOFT_TYPE
Soft assignment result type.
-
initializer
BetulaClusterModelFactory<?> initializer
Initialization method.
-
-
Constructor Detail
-
BetulaGMM
public BetulaGMM(CFTree.Factory<?> cffactory, double delta, int k, int maxiter, boolean soft, BetulaClusterModelFactory<?> initialization, double prior)
Constructor.
- Parameters:
cffactory - CFTree factory
delta - Delta parameter
k - Number of clusters
maxiter - Maximum number of iterations
soft - Return soft clustering results
initialization - Initialization method
prior - MAP prior
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface: Algorithm
Get the input type restriction used for negotiating the data query.
- Specified by:
getInputTypeRestriction in interface Algorithm
- Returns:
- Type restriction
-
run
public Clustering<EMModel> run(Relation<NumberVector> relation)
Run the clustering algorithm.
- Parameters:
relation - Input data
- Returns:
- Clustering
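The interplay of the delta and maxiter fields during a run can be sketched as follows. This is a minimal, self-contained illustration and not the ELKI implementation: it assumes that delta is compared against the absolute change in log-likelihood and that a negative maxiter means "no limit". The class EmLoopSketch and the emStep callback are hypothetical names introduced only for this example.

```java
import java.util.function.DoubleSupplier;

final class EmLoopSketch {
  /**
   * Repeat EM steps until the log-likelihood changes by at most delta, or
   * until maxiter steps were performed (assumption: negative = unlimited).
   *
   * @param emStep hypothetical callback: performs one E-step and M-step and
   *        returns the resulting log-likelihood
   * @param delta convergence threshold (assumed semantics)
   * @param maxiter maximum number of iterations
   * @return number of iterations performed after the initial evaluation
   */
  static int runUntilConverged(DoubleSupplier emStep, double delta, int maxiter) {
    double previous = emStep.getAsDouble(); // log-likelihood of the initial model
    int it = 0;
    while(maxiter < 0 || it < maxiter) {
      double current = emStep.getAsDouble();
      it++;
      if(Math.abs(current - previous) <= delta) {
        break; // change is small enough: treat as converged
      }
      previous = current;
    }
    return it;
  }

  public static void main(String[] args) {
    // Made-up log-likelihood sequence standing in for real EM steps.
    double[] ll = { -10., -5., -4.9, -4.895, -4.8949 };
    int[] pos = { 0 };
    int iterations = runUntilConverged(() -> ll[pos[0]++], 0.01, 100);
    System.out.println("iterations: " + iterations);
  }
}
```

A smaller delta generally means more iterations before the loop stops; maxiter bounds the run time when the likelihood keeps improving slowly.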
-
isSoft
private boolean isSoft()
-
assignProbabilitiesToInstances
public double assignProbabilitiesToInstances(java.util.ArrayList<? extends ClusterFeature> cfs, java.util.List<? extends BetulaClusterModel> models, java.util.Map<ClusterFeature,double[]> probClusterIGivenX)
Assigns the current probability values to the cluster features and computes the expectation value of the current mixture of distributions. Computed as the sum of the logarithms of the prior probability of each instance.
- Parameters:
cfs - the cluster features to evaluate
models - Cluster models
probClusterIGivenX - Output storage for cluster probabilities
- Returns:
- the expectation value of the current mixture of distributions
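The responsibility estimation on cluster feature centers can be illustrated with a small one-dimensional sketch. It is not the ELKI code: the class EStepSketch, the array-based parameters, and the weighting of each center by the number of points it summarizes are assumptions made for this example. The normalization uses the log-sum-exp trick for numerical stability.

```java
final class EStepSketch {
  /** Log-density of a one-dimensional Gaussian distribution. */
  static double logGaussian(double x, double mean, double variance) {
    double d = x - mean;
    return -0.5 * (Math.log(2 * Math.PI * variance) + d * d / variance);
  }

  /**
   * E-step on cluster feature centers (one-dimensional sketch).
   *
   * @param centers CF centers
   * @param counts number of points summarized by each CF (assumed weighting)
   * @param weights mixture weights of the components
   * @param means component means
   * @param variances component variances
   * @param resp output: resp[j][c] is the responsibility of component c for CF j
   * @return sum of the log-likelihoods, weighted by the CF counts
   */
  static double assignResponsibilities(double[] centers, int[] counts, double[] weights, double[] means, double[] variances, double[][] resp) {
    final int k = weights.length;
    double logLikelihood = 0.;
    for(int j = 0; j < centers.length; j++) {
      double[] logp = resp[j];
      double max = Double.NEGATIVE_INFINITY;
      for(int c = 0; c < k; c++) {
        logp[c] = Math.log(weights[c]) + logGaussian(centers[j], means[c], variances[c]);
        max = Math.max(max, logp[c]);
      }
      // log-sum-exp: subtract the maximum before exponentiating
      double sum = 0.;
      for(int c = 0; c < k; c++) {
        sum += Math.exp(logp[c] - max);
      }
      double logSum = max + Math.log(sum);
      for(int c = 0; c < k; c++) {
        logp[c] = Math.exp(logp[c] - logSum); // normalized responsibility
      }
      logLikelihood += counts[j] * logSum;
    }
    return logLikelihood;
  }

  public static void main(String[] args) {
    double[][] resp = new double[2][2];
    double ll = assignResponsibilities(new double[] { 0., -1. }, new int[] { 2, 3 }, //
        new double[] { 0.5, 0.5 }, new double[] { -1., 1. }, new double[] { 1., 1. }, resp);
    System.out.println("log-likelihood: " + ll);
    System.out.println("responsibilities of CF 1: " + resp[1][0] + ", " + resp[1][1]);
  }
}
```

The sketch does not guard against all components having zero density at a center; a lower bound such as MIN_LOGLIKELIHOOD serves that purpose in the class documented here.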
-
assignProbabilitiesToInstances
public double assignProbabilitiesToInstances(Relation<? extends NumberVector> relation, java.util.List<? extends BetulaClusterModel> models, WritableDataStore<double[]> probClusterIGivenX)
Assigns the current probability values to the instances in the database and computes the expectation value of the current mixture of distributions. Computed as the sum of the logarithms of the prior probability of each instance.
- Parameters:
relation - the database used for assignment to instances
models - Cluster models
probClusterIGivenX - Output storage for cluster probabilities
- Returns:
- the expectation value of the current mixture of distributions
-
recomputeCovarianceMatrices
public void recomputeCovarianceMatrices(java.util.ArrayList<? extends ClusterFeature> cfs, java.util.Map<ClusterFeature,double[]> probClusterIGivenX, java.util.List<? extends BetulaClusterModel> models, double prior, int n)
Recompute the covariance matrices.
- Parameters:
cfs - Cluster features to evaluate
probClusterIGivenX - Object probabilities
models - Cluster models to update
prior - MAP prior (use 0 for MLE)
n - data set size
-
-