Class FastCLARA<V>

  • Type Parameters:
    V - Data type
    All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<MedoidModel>>, KMedoidsClustering<V>

    @Reference(authors="Erich Schubert, Peter J. Rousseeuw",
               title="Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms",
               booktitle="Proc. 12th Int. Conf. Similarity Search and Applications (SISAP\'2019)",
               url="https://doi.org/10.1007/978-3-030-32047-8_16",
               bibkey="DBLP:conf/sisap/SchubertR19")
    public class FastCLARA<V>
    extends FastPAM<V>
    Clustering Large Applications (CLARA) with the FastPAM improvements, to increase scalability in the number of clusters. This variant also defaults to twice the sample size of the original CLARA, to improve quality.
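    The CLARA procedure described above can be illustrated with a small, self-contained sketch. This is not ELKI's code. The class ClaraSketch and its methods are hypothetical, and the data are 1-D points with absolute difference as the distance. A plain PAM-style SWAP (steepest descent from a random start) is swapped in for FastPAM, and the default sample size 2 * (40 + 2k) assumes the classic CLARA sample size of 40 + 2k. Each iteration clusters one random sample and scores the resulting medoids on the full data set. The best medoids across all samples are kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical CLARA sketch on 1-D data; not the ELKI FastCLARA implementation. */
public class ClaraSketch {
  /** Assumption: absolute difference as the distance function. */
  static double dist(double a, double b) {
    return Math.abs(a - b);
  }

  /** Total deviation of the given points with respect to their nearest medoid. */
  public static double cost(double[] data, int[] points, int[] medoids) {
    double sum = 0;
    for (int p : points) {
      double best = Double.POSITIVE_INFINITY;
      for (int m : medoids) best = Math.min(best, dist(data[p], data[m]));
      sum += best;
    }
    return sum;
  }

  static boolean contains(int[] arr, int v) {
    for (int x : arr) if (x == v) return true;
    return false;
  }

  /** Plain PAM-style SWAP on the sample (swapped in for FastPAM to keep the sketch short). */
  static int[] pamOnSample(double[] data, int[] sample, int k) {
    int[] med = Arrays.copyOf(sample, k); // the sample is shuffled, so this is a random start
    double cur = cost(data, sample, med);
    while (true) {
      double bestCost = cur;
      int bestI = -1, bestH = -1;
      for (int i = 0; i < k; i++) {
        for (int h : sample) {
          if (contains(med, h)) continue;
          int old = med[i];
          med[i] = h; // try swapping medoid i with non-medoid h
          double c = cost(data, sample, med);
          med[i] = old;
          if (c < bestCost) { bestCost = c; bestI = i; bestH = h; }
        }
      }
      if (bestI < 0) return med; // no improving swap left: local optimum
      med[bestI] = bestH;
      cur = bestCost;
    }
  }

  /** Assumption: twice the classic CLARA sample size of 40 + 2k. */
  public static int defaultSampleSize(int k) {
    return 2 * (40 + 2 * k);
  }

  /** Cluster several random samples; keep the medoids that are best on ALL data. */
  public static int[] clara(double[] data, int k, int numsamples, int sampleSize, long seed) {
    int s = Math.min(sampleSize, data.length);
    if (s < k) throw new IllegalArgumentException("sample smaller than k");
    Random rnd = new Random(seed);
    List<Integer> ids = new ArrayList<>();
    int[] all = new int[data.length];
    for (int i = 0; i < data.length; i++) { ids.add(i); all[i] = i; }
    int[] best = null;
    double bestCost = Double.POSITIVE_INFINITY;
    for (int iter = 0; iter < numsamples; iter++) {
      Collections.shuffle(ids, rnd); // draw a sample without replacement
      int[] sample = new int[s];
      for (int j = 0; j < s; j++) sample[j] = ids.get(j);
      int[] med = pamOnSample(data, sample, k);
      double c = cost(data, all, med); // evaluate on the full data set, not the sample
      if (c < bestCost) { bestCost = c; best = med; }
    }
    return best;
  }

  public static void main(String[] args) {
    double[] data = {1, 2, 3, 10, 11, 12, 20, 21, 22};
    int[] all = {0, 1, 2, 3, 4, 5, 6, 7, 8};
    int[] med = clara(data, 3, 5, defaultSampleSize(3), 42L);
    Arrays.sort(med);
    System.out.println(Arrays.toString(med) + " cost=" + cost(data, all, med));
  }
}
```

    With only nine points the default sample covers the whole data set, so this run reduces to PAM and prints [1, 4, 7] cost=6.0. On larger data each sample is a strict subset, and the full-data evaluation decides which sample's medoids win. The FastPAM swap acceleration that gives FastCLARA its name is deliberately not reproduced here.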

    TODO: use a triangular distance matrix rather than a hash-map-based cache, for slightly better performance and lower memory usage.
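    The triangular distance matrix mentioned in the TODO can be sketched as follows. The class TriangularDistanceMatrix is hypothetical, not an ELKI type. It exploits symmetry, d(i, j) = d(j, i), and the zero diagonal to store only n(n-1)/2 primitive doubles in one array. Under the assumption that boxed keys and values are avoided, this typically needs less memory than a hash-map cache. Row i holds columns 0 to i-1, so pair (i, j) with i > j lives at offset i(i-1)/2 + j.

```java
import java.util.function.DoubleBinaryOperator;

/** Hypothetical sketch: symmetric pairwise distances packed into a lower-triangular array. */
public class TriangularDistanceMatrix {
  private final double[] values; // n*(n-1)/2 entries; the zero diagonal is not stored

  public TriangularDistanceMatrix(int n) {
    this.values = new double[(int) ((long) n * (n - 1) / 2)];
  }

  /** Offset of pair (i, j), i != j; row i stores columns 0 .. i-1. */
  public static int index(int i, int j) {
    if (i < j) { int t = i; i = j; j = t; } // symmetry: d(i, j) == d(j, i)
    return (int) ((long) i * (i - 1) / 2 + j); // long arithmetic avoids int overflow
  }

  public void set(int i, int j, double d) {
    if (i != j) values[index(i, j)] = d;
  }

  public double get(int i, int j) {
    return i == j ? 0.0 : values[index(i, j)];
  }

  public int storedEntries() {
    return values.length;
  }

  /** Compute every distance exactly once, e.g. for the points of one sample. */
  public static TriangularDistanceMatrix of(double[] points, DoubleBinaryOperator dist) {
    TriangularDistanceMatrix m = new TriangularDistanceMatrix(points.length);
    for (int i = 1; i < points.length; i++)
      for (int j = 0; j < i; j++)
        m.set(i, j, dist.applyAsDouble(points[i], points[j]));
    return m;
  }

  public static void main(String[] args) {
    TriangularDistanceMatrix m = of(new double[] {0, 3, 7, 12}, (a, b) -> Math.abs(a - b));
    // prints "7.0 7.0 6": both orders hit the same slot, and 4 points need 6 entries
    System.out.println(m.get(2, 0) + " " + m.get(0, 2) + " " + m.storedEntries());
  }
}
```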

    Reference:

    Erich Schubert, Peter J. Rousseeuw
    Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms
    Proc. 12th Int. Conf. Similarity Search and Applications (SISAP'2019)

    Since:
    0.7.0
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        Class logger.
      • sampling

        double sampling
        Sampling rate. Values less than 1 are interpreted as a fraction of the data set size; larger values as an absolute sample size.
      • numsamples

        int numsamples
        Number of samples to draw (i.e., iterations).
      • keepmed

        boolean keepmed
        Keep the previous medoids in the sample (see page 145).
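    Together, the sampling, numsamples and keepmed fields determine how each sample is drawn. The following is a minimal sketch, not ELKI's code. The class ClaraSampling and its methods are hypothetical. It assumes that a relative rate is rounded up to a whole number of points and that, with keepmed enabled, the previous medoids are placed in the sample first and the remainder is filled with distinct random points.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch of CLARA sample construction; not the ELKI implementation. */
public class ClaraSampling {
  /** Below 1: fraction of n (rounded up, an assumption); otherwise: absolute size. */
  public static int sampleSize(double sampling, int n) {
    int s = sampling < 1 ? (int) Math.ceil(sampling * n) : (int) sampling;
    return Math.min(s, n); // never request more points than exist
  }

  /** Draw a sample without replacement, optionally seeded with the previous medoids. */
  public static int[] drawSample(int n, int size, int[] previousMedoids, boolean keepmed, Random rnd) {
    Set<Integer> chosen = new LinkedHashSet<>(); // keeps insertion order, rejects duplicates
    if (keepmed && previousMedoids != null) {
      for (int m : previousMedoids) chosen.add(m);
    }
    List<Integer> ids = new ArrayList<>();
    for (int i = 0; i < n; i++) ids.add(i);
    Collections.shuffle(ids, rnd);
    for (int id : ids) {
      if (chosen.size() >= size) break;
      chosen.add(id); // a kept medoid drawn again is simply ignored
    }
    int[] out = new int[chosen.size()];
    int i = 0;
    for (int id : chosen) out[i++] = id;
    return out;
  }

  public static void main(String[] args) {
    System.out.println(sampleSize(0.25, 200)); // relative rate: 50 points
    System.out.println(sampleSize(92, 1000)); // absolute size: 92 points
    int[] s = drawSample(20, 8, new int[] {3, 7, 11}, true, new Random(1));
    System.out.println(Arrays.toString(s)); // 8 distinct ids, starting with 3, 7, 11
  }
}
```

    Keeping the previous medoids means a new sample can only improve on, or tie with, the clustering quality available to the previous one, at the cost of slightly less random variation between samples.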
    • Constructor Detail

      • FastCLARA

        public FastCLARA(Distance<? super V> distance,
                         int k,
                         int maxiter,
                         KMedoidsInitialization<V> initializer,
                         double fasttol,
                         int numsamples,
                         double sampling,
                         boolean keepmed,
                         RandomFactory random)
        Constructor.
        Parameters:
        distance - Distance function to use
        k - Number of clusters to produce
        maxiter - Maximum number of iterations
        initializer - Initialization function
        fasttol - Tolerance for fast swapping
        numsamples - Number of samples (sampling iterations)
        sampling - Sampling rate (absolute or relative)
        keepmed - Keep the previous medoids in the next sample
        random - Random generator