V - vector type to analyze
M - model type to produce

@Title(value="EM-Clustering: Clustering by Expectation Maximization")
@Description(value="Cluster data via Gaussian mixture modeling and the EM algorithm")
@Reference(authors="A. P. Dempster, N. M. Laird, D. B. Rubin", title="Maximum Likelihood from Incomplete Data via the EM algorithm", booktitle="Journal of the Royal Statistical Society, Series B, 39(1)", url="http://www.jstor.org/stable/2984875", bibkey="journals/jroyastatsocise2/DempsterLR77")
@Reference(title="Bayesian Regularization for Normal Mixture Estimation and Model-Based Clustering", authors="C. Fraley, A. E. Raftery", booktitle="J. Classification 24(2)", url="https://doi.org/10.1007/s00357-007-0004-5", bibkey="DBLP:journals/classification/FraleyR07")
@Alias(value="de.lmu.ifi.dbs.elki.algorithm.clustering.EM")
@Priority(value=200)
public class EM<V extends NumberVector,M extends MeanModel>
extends AbstractAlgorithm<Clustering<M>>
implements ClusteringAlgorithm<Clustering<M>>

Reference:
 A. P. Dempster, N. M. Laird, D. B. Rubin:
 Maximum Likelihood from Incomplete Data via the EM algorithm.
 Journal of the Royal Statistical Society, Series B, 39(1), 1977, pp. 1-31
 
The MAP estimation is derived from
 C. Fraley, A. E. Raftery:
 Bayesian Regularization for Normal Mixture Estimation and Model-Based Clustering.
 J. Classification 24(2)
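
To make the EM iteration described in the references above concrete, here is a minimal, self-contained sketch for a one-dimensional Gaussian mixture. It is not ELKI code: the class `Em1dSketch`, its `Result` holder, and the `fit` method are hypothetical names chosen for illustration. It also assumes that `delta` is a threshold on the change in log-likelihood between iterations and that `maxiter` caps the iteration count; the page itself only calls them the "delta parameter" and the "maximum number of iterations". The sketch uses plain maximum-likelihood updates (no MAP prior) and naive density arithmetic, so it can underflow on poorly scaled data.

```java
import java.util.Arrays;

/** Illustrative 1-D Gaussian mixture fitted by EM. Hypothetical sketch, not ELKI code. */
final class Em1dSketch {
  /** Fitted mixture parameters plus the number of iterations performed. */
  static final class Result {
    final double[] weight, mean, var;
    final int iterations;

    Result(double[] weight, double[] mean, double[] var, int iterations) {
      this.weight = weight;
      this.mean = mean;
      this.var = var;
      this.iterations = iterations;
    }
  }

  /**
   * Runs EM until the log-likelihood changes by less than delta (assumed
   * meaning of delta) or maxiter iterations have been performed.
   */
  static Result fit(double[] x, double[] initialMeans, double delta, int maxiter) {
    final int n = x.length, k = initialMeans.length;
    double[] w = new double[k], mu = initialMeans.clone(), var = new double[k];
    Arrays.fill(w, 1.0 / k); // start with equal mixture weights
    Arrays.fill(var, 1.0);   // assumed unit starting variance
    double[][] resp = new double[n][k];
    double prevLogLik = Double.NEGATIVE_INFINITY;
    int it = 0;
    while (it < maxiter) {
      it++;
      // E-step: responsibility of each component for each point.
      double logLik = 0.0;
      for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = 0; j < k; j++) {
          double d = x[i] - mu[j];
          resp[i][j] = w[j] * Math.exp(-0.5 * d * d / var[j]) / Math.sqrt(2 * Math.PI * var[j]);
          sum += resp[i][j];
        }
        for (int j = 0; j < k; j++) {
          resp[i][j] /= sum; // naive normalization; may underflow for extreme data
        }
        logLik += Math.log(sum);
      }
      // M-step: re-estimate weight, mean, and variance of each component.
      for (int j = 0; j < k; j++) {
        double nj = 0.0, sx = 0.0;
        for (int i = 0; i < n; i++) {
          nj += resp[i][j];
          sx += resp[i][j] * x[i];
        }
        if (nj <= 0.0) {
          continue; // empty component: keep its previous parameters
        }
        double m = sx / nj, sq = 0.0;
        for (int i = 0; i < n; i++) {
          double d = x[i] - m;
          sq += resp[i][j] * d * d;
        }
        w[j] = nj / n;
        mu[j] = m;
        var[j] = Math.max(sq / nj, 1e-9); // floor keeps the variance positive
      }
      if (Math.abs(logLik - prevLogLik) < delta) {
        break; // converged under the assumed delta criterion
      }
      prevLogLik = logLik;
    }
    return new Result(w, mu, var, it);
  }
}
```

Calling `Em1dSketch.fit(data, new double[] {1.0, 8.0}, 1e-6, 100)` on two well-separated groups of points should return means close to the group centers. A production implementation would typically work in log space to avoid the underflow noted in the comments.
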
| Modifier and Type | Class and Description |
|---|---|
| static class | EM.Parameterizer<V extends NumberVector,M extends MeanModel>: Parameterization class. |
| Modifier and Type | Field and Description |
|---|---|
| private double | delta: Delta parameter. |
| private int | k: Number of clusters. |
| private static java.lang.String | KEY: Key for statistics logging. |
| private static Logging | LOG: The logger for this class. |
| private int | maxiter: Maximum number of iterations to allow. |
| private EMClusterModelFactory<V,M> | mfactory: Factory for producing the initial cluster model. |
| private static double | MIN_LOGLIKELIHOOD: Minimum loglikelihood to avoid -infinity. |
| private double | prior: Prior to enable MAP estimation (use 0 for MLE). |
| private boolean | soft: Retain soft assignments. |
| static SimpleTypeInformation<double[]> | SOFT_TYPE: Soft assignment result type. |
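
The `soft` field and `SOFT_TYPE` above refer to per-object cluster probabilities stored as a `double[]`. As a hedged illustration of how such a soft assignment relates to a single cluster label, the sketch below picks the most probable cluster. The class `SoftToHardSketch` and the method `hardAssignment` are hypothetical names, and the argmax rule is an assumption about how one might derive a hard assignment, not a statement about what `EM` does internally.

```java
/** Hypothetical helper: converts a soft assignment into a hard cluster index. */
final class SoftToHardSketch {
  /**
   * Returns the index of the largest probability. Ties go to the lowest index.
   * Assumes probs is non-empty.
   */
  static int hardAssignment(double[] probs) {
    int best = 0;
    for (int j = 1; j < probs.length; j++) {
      if (probs[j] > probs[best]) {
        best = j;
      }
    }
    return best;
  }
}
```

Keeping the full probability vector is useful when an object lies between clusters, because the hard label discards how close the alternatives were.
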
Inherited fields: ALGORITHM_ID

| Constructor and Description |
|---|
| EM(int k, double delta, EMClusterModelFactory<V,M> mfactory): Constructor. |
| EM(int k, double delta, EMClusterModelFactory<V,M> mfactory, int maxiter, boolean soft): Constructor. |
| EM(int k, double delta, EMClusterModelFactory<V,M> mfactory, int maxiter, double prior, boolean soft): Constructor. |
| Modifier and Type | Method and Description |
|---|---|
| static double | assignProbabilitiesToInstances(Relation<? extends NumberVector> relation, java.util.List<? extends EMClusterModel<?>> models, WritableDataStore<double[]> probClusterIGivenX): Assigns the current probability values to the instances in the database and computes the expectation value of the current mixture of distributions. |
| TypeInformation[] | getInputTypeRestriction(): Get the input type restriction used for negotiating the data query. |
| protected Logging | getLogger(): Get the (STATIC) logger for this class. |
| boolean | isSoft() |
| private static double | logSumExp(double[] x): Compute log(sum(exp(x_i))), with attention to numerical issues. |
| static void | recomputeCovarianceMatrices(Relation<? extends NumberVector> relation, WritableDataStore<double[]> probClusterIGivenX, java.util.List<? extends EMClusterModel<?>> models, double prior): Recompute the covariance matrices. |
| Clustering<M> | run(Database database, Relation<V> relation): Performs the EM clustering algorithm on the given database. |
| void | setSoft(boolean soft) |
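
The summary of `logSumExp` above mentions numerical issues. The usual concern is that `exp(x_i)` underflows to zero for very negative log-probabilities, so a naive `log(sum(exp(x_i)))` returns negative infinity. The sketch below shows the common max-shift technique. It is a standalone illustration in a hypothetical class `LogSumExpSketch`, and it is not claimed to match the private ELKI implementation.

```java
/** Hypothetical standalone sketch of a numerically careful log-sum-exp. */
final class LogSumExpSketch {
  /** Computes log(sum(exp(x_i))) by factoring out the maximum element. */
  static double logSumExp(double[] x) {
    double max = Double.NEGATIVE_INFINITY;
    for (double v : x) {
      max = Math.max(max, v);
    }
    // Empty or all -infinity input (sum of zeros), or +infinity present:
    // the result equals the maximum, and shifting would produce NaN.
    if (Double.isInfinite(max)) {
      return max;
    }
    double sum = 0.0;
    for (double v : x) {
      sum += Math.exp(v - max); // every term lies in [0, 1], the largest is exactly 1
    }
    return max + Math.log(sum);
  }
}
```

For example, for the input `{-1000, -1000}` the naive formula evaluates `Math.log(0.0)`, whereas the shifted version returns `-1000 + ln 2`.
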
Inherited methods:
run
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait (from java.lang.Object)
run

private static final Logging LOG
private static final java.lang.String KEY
private int k
private double delta
private EMClusterModelFactory<V extends NumberVector,M extends MeanModel> mfactory
private int maxiter
private double prior
private boolean soft
private static final double MIN_LOGLIKELIHOOD
public static final SimpleTypeInformation<double[]> SOFT_TYPE

public EM(int k,
          double delta,
          EMClusterModelFactory<V,M> mfactory)
k - k parameter
delta - delta parameter
mfactory - EM cluster model factory

public EM(int k,
          double delta,
          EMClusterModelFactory<V,M> mfactory,
          int maxiter,
          boolean soft)
k - k parameter
delta - delta parameter
mfactory - EM cluster model factory
maxiter - Maximum number of iterations
soft - Include soft assignments

public EM(int k,
          double delta,
          EMClusterModelFactory<V,M> mfactory,
          int maxiter,
          double prior,
          boolean soft)
k - k parameter
delta - delta parameter
mfactory - EM cluster model factory
maxiter - Maximum number of iterations
prior - MAP prior
soft - Include soft assignments

public Clustering<M> run(Database database, Relation<V> relation)
database - Database
relation - Relation

public static void recomputeCovarianceMatrices(Relation<? extends NumberVector> relation, WritableDataStore<double[]> probClusterIGivenX, java.util.List<? extends EMClusterModel<?>> models, double prior)
relation - Vector data
probClusterIGivenX - Object probabilities
models - Cluster models to update
prior - MAP prior (use 0 for MLE)

public static double assignProbabilitiesToInstances(Relation<? extends NumberVector> relation, java.util.List<? extends EMClusterModel<?>> models, WritableDataStore<double[]> probClusterIGivenX)
relation - the database used for assignment to instances
models - Cluster models
probClusterIGivenX - Output storage for cluster probabilities

private static double logSumExp(double[] x)
x - Input

public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
getInputTypeRestriction in interface Algorithm
getInputTypeRestriction in class AbstractAlgorithm<Clustering<M extends MeanModel>>

protected Logging getLogger()
Description copied from class: AbstractAlgorithm
getLogger in class AbstractAlgorithm<Clustering<M extends MeanModel>>

public boolean isSoft()

public void setSoft(boolean soft)
soft - whether to retain soft assignments

Copyright © 2019 ELKI Development Team.