@Reference(authors="Erich Schubert, Michael Gertz",
           title="Semantic Word Clouds with Background Corpus Normalization and t-distributed Stochastic Neighbor Embedding",
           booktitle="ArXiV preprint, 1708.03569",
           url="http://arxiv.org/abs/1708.03569",
           bibkey="DBLP:journals/corr/abs-1708-03569")
@Priority(value=206)
public class ClustersWithNoiseExtraction
extends java.lang.Object
implements ClusteringAlgorithm<Clustering<Model>>
This performs the highest cut of the hierarchy that retains k clusters, each of at least the minimum size, plus noise (single points that would only be merged at a later step). If no such cut exists, it returns a result with a relaxed k.
You need to specify: A) the minimum size of a cluster (a value of 1 makes little sense, because every point then already counts as a cluster and the cut simply executes all but the last k-1 merges) and B) the desired number of clusters with at least minClSize elements each.
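The cut described above can be illustrated with a minimal, self-contained sketch. It is not the ELKI implementation: the class `NoiseCutSketch`, the method `extract`, and the merge encoding (merge i joins two cluster ids and creates id n + i, with merges sorted by increasing height) are assumptions made for illustration. The fallback used when no cut yields exactly k clusters (take the highest cut with the largest count below k) is likewise only one possible reading of "relaxed k".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names and merge encoding are hypothetical, not ELKI API.
class NoiseCutSketch {
  /**
   * @param n       number of points (ids 0..n-1)
   * @param merges  merges[i] = {a, b}, sorted by increasing height; the result gets id n + i
   * @param k       desired number of clusters
   * @param minSize minimum cluster size
   * @return label per point: 0, 1, ... for clusters, -1 for noise
   */
  static int[] extract(int n, int[][] merges, int k, int minSize) {
    int[] size = new int[n + merges.length];
    Arrays.fill(size, 0, n, 1);
    // Number of current clusters with at least minSize members.
    int large = minSize <= 1 ? n : 0;
    int bestCount = large <= k ? large : -1;
    int bestCut = 0; // number of merges to execute
    for (int i = 0; i < merges.length; i++) {
      int sa = size[merges[i][0]], sb = size[merges[i][1]];
      size[n + i] = sa + sb;
      if (sa >= minSize) large--;
      if (sb >= minSize) large--;
      if (sa + sb >= minSize) large++;
      // Prefer counts closer to k; on ties prefer the later (higher) cut.
      if (large <= k && large >= bestCount) {
        bestCount = large;
        bestCut = i + 1;
      }
    }
    // Replay only the merges below the chosen cut.
    int[] parent = new int[n + merges.length];
    Arrays.fill(parent, -1);
    for (int i = 0; i < bestCut; i++) {
      parent[merges[i][0]] = n + i;
      parent[merges[i][1]] = n + i;
    }
    int[] labels = new int[n];
    Map<Integer, Integer> ids = new HashMap<>();
    for (int p = 0; p < n; p++) {
      int r = p;
      while (parent[r] >= 0) r = parent[r]; // walk up to the cluster root
      if (size[r] < minSize) {
        labels[p] = -1; // too small: report as noise
      } else {
        Integer id = ids.get(r);
        if (id == null) {
          id = ids.size();
          ids.put(r, id);
        }
        labels[p] = id;
      }
    }
    return labels;
  }

  public static void main(String[] args) {
    // Points 0-2 and 3-4 form two groups; points 5 and 6 only join at the top.
    int[][] merges = {{0, 1}, {3, 4}, {7, 2}, {9, 8}, {10, 5}, {11, 6}};
    System.out.println(Arrays.toString(extract(7, merges, 2, 2)));
    // prints [0, 0, 0, 1, 1, -1, -1]
  }
}
```

In this toy hierarchy, the highest cut with two clusters of at least two points lies just below the merge that joins the two groups, so points 5 and 6 remain noise. With minSize set to 1, the same call would execute every merge except the last one and return two clusters without noise.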
Reference:
 Erich Schubert, Michael Gertz
 Semantic Word Clouds with Background Corpus Normalization and t-distributed
 Stochastic Neighbor Embedding
 ArXiV preprint, 1708.03569
 
TODO: Also provide representatives and last merge height for clusters.
| Modifier and Type | Class and Description | 
|---|---|
| protected class | ClustersWithNoiseExtraction.Instance: Instance for a single data set. |
| static class | ClustersWithNoiseExtraction.Parameterizer: Parameterization class. |
| Modifier and Type | Field and Description | 
|---|---|
| private HierarchicalClusteringAlgorithm | algorithm: Clustering algorithm to run to obtain the hierarchy. |
| private static Logging | LOG: Class logger. |
| private int | minClSize: Minimum cluster size. |
| private int | numCl: Minimum number of clusters. |
| Constructor and Description | 
|---|
| ClustersWithNoiseExtraction(HierarchicalClusteringAlgorithm algorithm, int numCl, int minClSize): Constructor. |
| Modifier and Type | Method and Description | 
|---|---|
| TypeInformation[] | getInputTypeRestriction(): Get the input type restriction used for negotiating the data query. |
| Clustering<Model> | run(Database database): Runs the algorithm. |
| Clustering<Model> | run(PointerHierarchyRepresentationResult pointerresult): Process an existing result. |
private static final Logging LOG
private int numCl
private int minClSize
private HierarchicalClusteringAlgorithm algorithm
public ClustersWithNoiseExtraction(HierarchicalClusteringAlgorithm algorithm, int numCl, int minClSize)
Parameters:
algorithm - Algorithm to run
numCl - Number of clusters
minClSize - Minimum cluster size

public Clustering<Model> run(Database database)
Specified by:
run in interface Algorithm
run in interface ClusteringAlgorithm<Clustering<Model>>
Parameters:
database - the database to run the algorithm on

public Clustering<Model> run(PointerHierarchyRepresentationResult pointerresult)
Parameters:
pointerresult - Existing result in pointer representation.

public TypeInformation[] getInputTypeRestriction()
Specified by:
getInputTypeRestriction in interface Algorithm

Copyright © 2019 ELKI Development Team.