Class FasterMSC<O>

  • Type Parameters:
    O - Type of the objects to cluster
    All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<MedoidModel>>, KMedoidsClustering<O>

    @Reference(authors="Lars Lenssen and Erich Schubert",
               title="Clustering by Direct Optimization of the Medoid Silhouette",
               booktitle="Int. Conf. on Similarity Search and Applications, SISAP 2022",
               url="https://doi.org/10.1007/978-3-031-17849-8_15",
               bibkey="DBLP:conf/sisap/LenssenS22")
    @Priority(200)
    public class FasterMSC<O>
    extends FastMSC<O>
    Fast and Eager Medoid Silhouette Clustering.

    This clustering algorithm searches for a clustering with a high "medoid silhouette", an approximation to the silhouette, using a swap-based heuristic similar to PAM. By also caching the distance to the third nearest center (FastPAM only caches the second nearest), we reduce the run time per iteration to just O(n²), which is acceptable for many use cases, while often finding a solution with a better silhouette than other clustering methods. This version also performs eager swapping instead of steepest descent: it executes any swap that improves the medoid silhouette immediately, and hence may need fewer iterations.
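
    As a rough illustration of the objective, the sketch below computes an average medoid silhouette from a precomputed distance matrix. It assumes the definition 1 - d1/d2 per object, where d1 and d2 are the distances to the nearest and second nearest medoid, as described in the referenced paper. The class name, the method names, and the handling of d2 = 0 are assumptions of this sketch and are not part of the ELKI API.

```java
public class MedoidSilhouetteSketch {
  /**
   * Average medoid silhouette for a given set of medoids.
   *
   * @param dist symmetric n x n distance matrix
   * @param medoids indices of the medoids (at least two)
   */
  static double averageMedoidSilhouette(double[][] dist, int[] medoids) {
    double sum = 0.;
    for (int i = 0; i < dist.length; i++) {
      // Find the distances to the nearest (d1) and second nearest (d2) medoid.
      double d1 = Double.POSITIVE_INFINITY, d2 = Double.POSITIVE_INFINITY;
      for (int m : medoids) {
        double d = dist[i][m];
        if (d < d1) {
          d2 = d1;
          d1 = d;
        } else if (d < d2) {
          d2 = d;
        }
      }
      // Assumption of this sketch: if d2 is 0 (so d1 is 0 too), score the object as 0.
      sum += d2 > 0 ? 1. - d1 / d2 : 0.;
    }
    return sum / dist.length;
  }

  /** Hypothetical helper: pairwise absolute differences of points on a line. */
  static double[][] lineDistances(double[] x) {
    double[][] dist = new double[x.length][x.length];
    for (int i = 0; i < x.length; i++) {
      for (int j = 0; j < x.length; j++) {
        dist[i][j] = Math.abs(x[i] - x[j]);
      }
    }
    return dist;
  }

  public static void main(String[] args) {
    double[][] dist = lineDistances(new double[] { 0, 1, 2, 10, 11, 12 });
    // One medoid per group scores much higher than two medoids in the same group.
    System.out.println(averageMedoidSilhouette(dist, new int[] { 1, 4 }));
    System.out.println(averageMedoidSilhouette(dist, new int[] { 0, 1 }));
  }
}
```

    On this toy data, placing one medoid in each group gives a value of roughly 0.93, while placing both medoids in the same group gives roughly 0.46.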

    Reference:

    Lars Lenssen and Erich Schubert
    Clustering by Direct Optimization of the Medoid Silhouette
    Int. Conf. on Similarity Search and Applications, SISAP 2022

    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        The logger for this class.
    • Constructor Detail

      • FasterMSC

        public FasterMSC(Distance<? super O> distance,
                         int k,
                         int maxiter,
                         KMedoidsInitialization<O> initializer)
        Constructor.
        Parameters:
        distance - Distance function
        k - Number of clusters
        maxiter - Maximum number of iterations
        initializer - Method for choosing the initial medoids
    • Method Detail

      • run

        public Clustering<MedoidModel> run(Relation<O> relation,
                                           int k,
                                           DistanceQuery<? super O> distQ)
        Description copied from interface: KMedoidsClustering
        Run k-medoids clustering with a given distance query.
        Not a very elegant API, but needed for some types of nested k-medoids.
        Specified by:
        run in interface KMedoidsClustering<O>
        Overrides:
        run in class FastMSC<O>
        Parameters:
        relation - relation to use
        k - Number of clusters
        distQ - Distance query to use
        Returns:
        Clustering result
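
        For intuition only, eager swapping can be sketched naively as shown below. This is not the ELKI implementation: it recomputes the objective from scratch for every candidate swap and therefore does not reach the O(n²) cost per iteration that the cached nearest-center distances make possible. The class name, the method names, and the handling of d2 = 0 are assumptions of this sketch.

```java
public class EagerSwapSketch {
  /** Average of 1 - d1/d2 over all objects (d1, d2: nearest and second nearest medoid). */
  static double ams(double[][] dist, int[] medoids) {
    double sum = 0.;
    for (int i = 0; i < dist.length; i++) {
      double d1 = Double.POSITIVE_INFINITY, d2 = Double.POSITIVE_INFINITY;
      for (int m : medoids) {
        double d = dist[i][m];
        if (d < d1) {
          d2 = d1;
          d1 = d;
        } else if (d < d2) {
          d2 = d;
        }
      }
      // Assumption of this sketch: objects with d2 = 0 contribute 0.
      sum += d2 > 0 ? 1. - d1 / d2 : 0.;
    }
    return sum / dist.length;
  }

  static boolean contains(int[] a, int v) {
    for (int x : a) {
      if (x == v) {
        return true;
      }
    }
    return false;
  }

  /** Naive eager swapping: accept an improving swap as soon as it is found. */
  static int[] eagerSwap(double[][] dist, int[] initial, int maxiter) {
    int[] medoids = initial.clone();
    double best = ams(dist, medoids);
    for (int iter = 0; iter < maxiter; iter++) {
      boolean changed = false;
      for (int c = 0; c < dist.length; c++) {
        if (contains(medoids, c)) {
          continue; // already a medoid
        }
        // Find the medoid whose replacement by candidate c improves the objective most.
        int bestPos = -1;
        double bestScore = best;
        for (int j = 0; j < medoids.length; j++) {
          int old = medoids[j];
          medoids[j] = c;
          double s = ams(dist, medoids);
          medoids[j] = old;
          if (s > bestScore) {
            bestScore = s;
            bestPos = j;
          }
        }
        // Eager: perform the swap immediately instead of waiting for the best swap overall.
        if (bestPos >= 0) {
          medoids[bestPos] = c;
          best = bestScore;
          changed = true;
        }
      }
      if (!changed) {
        break; // a full pass without improvement: local optimum
      }
    }
    return medoids;
  }

  public static void main(String[] args) {
    double[] x = { 0, 1, 2, 10, 11, 12 };
    double[][] dist = new double[x.length][x.length];
    for (int i = 0; i < x.length; i++) {
      for (int j = 0; j < x.length; j++) {
        dist[i][j] = Math.abs(x[i] - x[j]);
      }
    }
    int[] result = eagerSwap(dist, new int[] { 0, 1 }, 100);
    System.out.println(java.util.Arrays.toString(result));
  }
}
```

        A steepest-descent variant would instead scan all candidates first and perform only the single best swap per iteration, which typically requires more passes over the data.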
      • getLogger

        protected Logging getLogger()
        Description copied from class: PAM
        Get the static class logger.
        Overrides:
        getLogger in class FastMSC<O>