Class FasterCLARA<O>

  • Type Parameters:
    O - Data type
    All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<MedoidModel>>, KMedoidsClustering<O>

    @Reference(authors="Erich Schubert and Peter J. Rousseeuw",
               title="Fast and Eager k-Medoids Clustering: O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms",
               booktitle="arXiv preprint",
               url="https://arxiv.org/abs/2008.05171",
               bibkey="DBLP:journals/corr/abs-2008-05171")
    public class FasterCLARA<O>
    extends FasterPAM<O>
    Clustering Large Applications (CLARA) with the FastPAM improvements (inherited via FasterPAM), to increase scalability in the number of clusters. This variant also defaults to twice the sample size of standard CLARA, to improve quality.

    TODO: use a triangular distance matrix, rather than a hash-map based cache, for slightly better performance and lower memory use.
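
    The triangular layout mentioned in the TODO can be sketched as follows. This is a minimal, self-contained illustration and not ELKI code: the class name TriangularDistances and its methods are hypothetical. It stores each unordered pair of sample objects once, in a flat array of n * (n - 1) / 2 entries, and treats the diagonal as zero.

```java
/** Hypothetical helper (not part of ELKI): strict lower-triangular distance storage. */
public class TriangularDistances {
  /** Flat storage, one entry per unordered pair (i, j) with i != j. */
  private final double[] data;

  public TriangularDistances(int size) {
    // n * (n - 1) / 2 entries; int arithmetic is assumed sufficient for sample-sized n.
    this.data = new double[size * (size - 1) / 2];
  }

  /** Offset of the pair (i, j), i != j; symmetric in its arguments. */
  private static int offset(int i, int j) {
    int hi = Math.max(i, j), lo = Math.min(i, j);
    return hi * (hi - 1) / 2 + lo;
  }

  /** Store a distance; the diagonal is implicit and is not stored. */
  public void set(int i, int j, double d) {
    if (i != j) {
      data[offset(i, j)] = d;
    }
  }

  /** Read a distance; the distance of an object to itself is taken to be 0. */
  public double get(int i, int j) {
    return i == j ? 0. : data[offset(i, j)];
  }

  /** Number of stored pairwise distances. */
  public int storedEntries() {
    return data.length;
  }
}
```

    Compared with a hash-map cache, such an array avoids boxing and hashing on every lookup, at the cost of allocating all pairs of the sample up front.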

    Reference:

    Erich Schubert and Peter J. Rousseeuw
    Fast and Eager k-Medoids Clustering: O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms
    arXiv preprint, https://arxiv.org/abs/2008.05171

    Since:
    0.7.0
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        Class logger.
      • sampling

        double sampling
        Sampling rate. A value less than 1 is interpreted as a fraction of the data set size; otherwise it is an absolute number of objects.
      • numsamples

        int numsamples
        Number of samples to draw (i.e., the number of sampling iterations).
      • keepmed

        boolean keepmed
        Keep the previous medoids in the sample (see page 145).
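
    To make the semantics of sampling and keepmed concrete, the following sketch resolves a sample size and draws a sample of object indices. It is an illustration under stated assumptions, not the ELKI implementation: the class ClaraSampling and both method names are hypothetical, objects are represented by the indices 0..n-1, and a value of exactly 1 is assumed to mean the whole data set.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch of the sampling fields; not ELKI code. */
public class ClaraSampling {
  /**
   * Resolve the sampling parameter to an absolute sample size.
   * Assumption: exactly 1 is treated as relative (the full data set).
   */
  public static int sampleSize(double sampling, int n) {
    double raw = sampling <= 1 ? sampling * n : sampling;
    return Math.min(n, (int) raw); // never more objects than available
  }

  /**
   * Draw a sample of object indices. If keepmed is set, the medoids of the
   * previous round are placed into the sample before random objects are added.
   */
  public static List<Integer> drawSample(int n, int size, int[] previousMedoids,
      boolean keepmed, Random rnd) {
    Set<Integer> sample = new LinkedHashSet<>();
    if (keepmed && previousMedoids != null) {
      for (int m : previousMedoids) {
        sample.add(m);
      }
    }
    // Shuffle all indices and fill the sample up to the requested size.
    List<Integer> all = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      all.add(i);
    }
    Collections.shuffle(all, rnd);
    for (int i = 0; i < n && sample.size() < size; i++) {
      sample.add(all.get(i)); // duplicates of kept medoids are ignored by the set
    }
    return new ArrayList<>(sample);
  }
}
```

    In this sketch, sampleSize(0.5, 100) and sampleSize(50, 100) both resolve to 50 objects; whether kept medoids count toward the sample size or are added on top of it is a design choice that this sketch resolves in favor of counting them.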
    • Constructor Detail

      • FasterCLARA

        public FasterCLARA(Distance<? super O> distance,
                           int k,
                           int maxiter,
                           KMedoidsInitialization<O> initializer,
                           int numsamples,
                           double sampling,
                           boolean keepmed,
                           RandomFactory random)
        Constructor.
        Parameters:
        distance - Distance function to use
        k - Number of clusters to produce
        maxiter - Maximum number of k-medoids iterations on each sample
        initializer - Initialization function
        numsamples - Number of samples (sampling iterations)
        sampling - Sampling rate (relative if less than 1, otherwise an absolute sample size)
        keepmed - Keep the previous medoids in the next sample
        random - Random generator
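
        How the parameters k, maxiter, numsamples, sampling, keepmed and random interact can be illustrated with a self-contained CLARA sketch on one-dimensional points. This is not the ELKI implementation: the class ClaraSketch is hypothetical, the distance is fixed to the absolute difference, and a naive swap search stands in for FasterPAM. It assumes the data set contains at least k points.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical CLARA sketch on 1-d points; a naive swap search replaces FasterPAM. */
public class ClaraSketch {
  /** Sum of distances from every point to its nearest medoid (medoids are indices into data). */
  public static double cost(double[] data, int[] medoids) {
    double sum = 0;
    for (double x : data) {
      double best = Double.POSITIVE_INFINITY;
      for (int m : medoids) {
        best = Math.min(best, Math.abs(x - data[m]));
      }
      sum += best;
    }
    return sum;
  }

  /** Naive swap-based k-medoids restricted to the sampled indices. */
  public static int[] medoidsOfSample(double[] data, List<Integer> sample, int k, int maxiter) {
    double[] sub = new double[sample.size()];
    for (int i = 0; i < sub.length; i++) {
      sub[i] = data[sample.get(i)];
    }
    int[] med = new int[k]; // positions within the sample
    for (int i = 0; i < k; i++) {
      med[i] = i; // trivial initialization: the first k sampled objects
    }
    double cur = cost(sub, med);
    for (int it = 0; it < maxiter; it++) {
      boolean improved = false;
      for (int mi = 0; mi < k; mi++) {
        for (int c = 0; c < sub.length; c++) {
          int old = med[mi];
          med[mi] = c; // tentatively swap medoid mi with candidate c
          double cand = cost(sub, med);
          if (cand < cur) {
            cur = cand;
            improved = true;
          } else {
            med[mi] = old; // undo swaps that do not strictly improve
          }
        }
      }
      if (!improved) {
        break;
      }
    }
    int[] result = new int[k];
    for (int i = 0; i < k; i++) {
      result[i] = sample.get(med[i]); // map back to indices of the full data set
    }
    return result;
  }

  /** Outer CLARA loop: cluster numsamples samples, keep the medoids that are best on all data. */
  public static int[] run(double[] data, int k, int maxiter, int numsamples,
      double sampling, boolean keepmed, Random rnd) {
    int n = data.length;
    // Assumption: values up to 1 are relative; the sample holds at least k objects.
    int size = Math.max(k, Math.min(n, (int) (sampling <= 1 ? sampling * n : sampling)));
    List<Integer> all = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      all.add(i);
    }
    int[] best = null;
    double bestCost = Double.POSITIVE_INFINITY;
    for (int s = 0; s < numsamples; s++) {
      Collections.shuffle(all, rnd);
      List<Integer> sample = new ArrayList<>(all.subList(0, size));
      if (keepmed && best != null) {
        for (int m : best) {
          if (!sample.contains(m)) {
            sample.add(m); // keep the previous best medoids in the next sample
          }
        }
      }
      int[] med = medoidsOfSample(data, sample, k, maxiter);
      double c = cost(data, med); // quality is always evaluated on the full data set
      if (c < bestCost) {
        bestCost = c;
        best = med;
      }
    }
    return best;
  }
}
```

        A call such as ClaraSketch.run(data, 2, 100, 3, 0.5, true, new Random(42)) would draw three samples of half the data each. The point the sketch is meant to show is that maxiter bounds the search within one sample, while numsamples bounds how many samples are tried.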