Class ClustersWithNoiseExtraction

  • All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<Model>>

    @Reference(authors="Erich Schubert, Michael Gertz",
               title="Semantic Word Clouds with Background Corpus Normalization and t-distributed Stochastic Neighbor Embedding",
               booktitle="ArXiV preprint, 1708.03569",
               url="http://arxiv.org/abs/1708.03569",
               bibkey="DBLP:journals/corr/abs-1708-03569")
    @Priority(206)
    public class ClustersWithNoiseExtraction
    extends java.lang.Object
    implements ClusteringAlgorithm<Clustering<Model>>
    Extraction of a given number of clusters with a minimum size, and noise.

    This performs the highest cut of the hierarchy at which k clusters are retained, each with at least the minimum size, plus noise (single points that would only merge afterwards). If no such cut can be found, it returns a result with a relaxed k.

    You need to specify: A) the minimum size of a cluster (a value of 1 makes little sense, because the extraction then simply executes all but the last k - 1 merges, leaving exactly k clusters and no noise) and B) the desired number of clusters with at least minSize elements each.
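
    The cut described above can be illustrated with a small, self-contained sketch. This is not the ELKI implementation: the class NoiseCutSketch, the method cut, and the merge encoding (merge i joins two cluster ids into the new id n + i, with merges ordered by increasing height) are assumptions made for illustration. The sketch also treats every group below the minimum size as noise and does not implement the relaxed-k fallback; it throws an exception instead.

```java
import java.util.Arrays;

/** Illustrative sketch only; names and merge encoding are hypothetical, not ELKI API. */
public class NoiseCutSketch {
  /**
   * @param n       number of points (cluster ids 0..n-1 are single points)
   * @param merges  merges[i] = {a, b} joins ids a and b into new id n + i,
   *                ordered by increasing merge height
   * @param k       desired number of clusters with at least minSize points
   * @param minSize minimum cluster size
   * @return one label per point: 0, 1, ... for clusters, -1 for noise
   */
  static int[] cut(int n, int[][] merges, int k, int minSize) {
    int total = n + merges.length;
    int[] size = new int[total];
    Arrays.fill(size, 0, n, 1);

    // Pass 1: replay all merges and remember the last (highest) position
    // at which exactly k groups have at least minSize members.
    int large = minSize <= 1 ? n : 0;
    int best = (large == k) ? 0 : -1;
    for (int i = 0; i < merges.length; i++) {
      int a = merges[i][0], b = merges[i][1];
      size[n + i] = size[a] + size[b];
      if (size[a] >= minSize) large--;
      if (size[b] >= minSize) large--;
      if (size[n + i] >= minSize) large++;
      if (large == k) best = i + 1; // number of merges to execute
    }
    if (best < 0) {
      // The documented class relaxes k here; this sketch does not.
      throw new IllegalArgumentException("no cut with exactly k clusters");
    }

    // Pass 2: execute only the first 'best' merges.
    int[] parent = new int[total];
    Arrays.fill(parent, -1);
    for (int i = 0; i < best; i++) {
      parent[merges[i][0]] = n + i;
      parent[merges[i][1]] = n + i;
    }

    // Label each point by the root of its group; small groups become noise.
    int[] rootLabel = new int[total];
    Arrays.fill(rootLabel, -1);
    int[] labels = new int[n];
    int next = 0;
    for (int p = 0; p < n; p++) {
      int r = p;
      while (parent[r] >= 0) r = parent[r];
      if (size[r] < minSize) {
        labels[p] = -1;
        continue;
      }
      if (rootLabel[r] < 0) rootLabel[r] = next++;
      labels[p] = rootLabel[r];
    }
    return labels;
  }

  public static void main(String[] args) {
    // Seven points: {0,1,2} and {3,4,5} form groups, point 6 joins last.
    int[][] merges = { {0, 1}, {3, 4}, {7, 2}, {8, 5}, {9, 10}, {11, 6} };
    System.out.println(Arrays.toString(cut(7, merges, 2, 3)));
    System.out.println(Arrays.toString(cut(7, merges, 2, 1)));
  }
}
```

    With k = 2 and a minimum size of 3, the sketch stops after four merges and reports point 6 as noise. With a minimum size of 1, it executes five of the six merges, which matches the remark above that a minimum size of 1 only skips the final merges.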

    Reference:

    Erich Schubert, Michael Gertz
    Semantic Word Clouds with Background Corpus Normalization and t-distributed Stochastic Neighbor Embedding
    ArXiV preprint, 1708.03569

    TODO: Also provide representatives and last merge height for clusters.

    Since:
    0.7.5
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        Class logger.
      • numCl

        private int numCl
        Minimum number of clusters.
      • minClSize

        private int minClSize
        Minimum cluster size.
    • Constructor Detail

      • ClustersWithNoiseExtraction

        public ClustersWithNoiseExtraction(HierarchicalClusteringAlgorithm algorithm,
                                           int numCl,
                                           int minClSize)
        Constructor.
        Parameters:
        algorithm - Algorithm to run
        numCl - Number of clusters
        minClSize - Minimum cluster size