Class FastMSC<O>

  • Type Parameters:
    O - Object type
    All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<MedoidModel>>, KMedoidsClustering<O>
    Direct Known Subclasses:

    @Reference(authors="Lars Lenssen and Erich Schubert",
               title="Clustering by Direct Optimization of the Medoid Silhouette",
               booktitle="Int. Conf. on Similarity Search and Applications, SISAP 2022",
    public class FastMSC<O>
    extends PAMMEDSIL<O>
    Fast Medoid Silhouette Clustering.

    This clustering algorithm tries to find a clustering that optimizes the "medoid silhouette", an approximation of the silhouette, using a swap-based heuristic similar to PAM. By additionally caching the distance to the third nearest center (FastPAM caches only the second nearest), the run time per iteration is reduced to O(n²). This is acceptable for many use cases, and the result often has a better silhouette than the results of other clustering methods.
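The role of the cached third nearest distance can be illustrated with a small standalone sketch. It is not the ELKI implementation: the class `NearestThreeSketch` and both method names are hypothetical, and it assumes the per-point loss is the ratio of the nearest to the second nearest medoid distance, as described for `loss` on this page. If the nearest medoid of a point is removed in a swap and the new medoid is no closer, the second and third nearest distances take over, so the new loss is available without rescanning all medoids.

```java
/** Hypothetical helper, not part of ELKI: caches the three nearest medoid distances of one point. */
final class NearestThreeSketch {
  /** Returns {d1, d2, d3}: the distances to the nearest, second and third nearest medoid. */
  static double[] nearestThree(double[] distToMedoids) {
    double d1 = Double.POSITIVE_INFINITY, d2 = d1, d3 = d1;
    for (double d : distToMedoids) {
      if (d < d1) { // new nearest, shift the others down
        d3 = d2;
        d2 = d1;
        d1 = d;
      } else if (d < d2) { // new second nearest
        d3 = d2;
        d2 = d;
      } else if (d < d3) { // new third nearest
        d3 = d;
      }
    }
    return new double[] { d1, d2, d3 };
  }

  /** Loss of the point once its nearest medoid is removed and no closer medoid is added. */
  static double lossWithoutNearest(double[] cache) {
    double a = cache[1], b = cache[2]; // second becomes nearest, third becomes second
    return (a == 0. && b == 0.) ? 0. : a / b;
  }
}
```

With fewer than three medoids the third entry stays at positive infinity, so the sketch is only meaningful for k of at least 3.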


    Reference:
    Lars Lenssen and Erich Schubert
    Clustering by Direct Optimization of the Medoid Silhouette
    Int. Conf. on Similarity Search and Applications, SISAP 2022

    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        The logger for this class.
    • Constructor Detail

      • FastMSC

        public FastMSC(Distance<? super O> distance,
                       int k,
                       int maxiter,
                       KMedoidsInitialization<O> initializer)
        Parameters:
        distance - Distance function
        k - Number of clusters
        maxiter - Maximum number of iterations
        initializer - Initialization method
    • Method Detail

      • run

        public Clustering<MedoidModel> run(Relation<O> relation,
                                           int k,
                                           DistanceQuery<? super O> distQ)
        Description copied from interface: KMedoidsClustering
        Run k-medoids clustering with a given distance query.
        Not a very elegant API, but needed for some types of nested k-medoids.
        Specified by:
        run in interface KMedoidsClustering<O>
        Overrides:
        run in class PAMMEDSIL<O>
        Parameters:
        relation - relation to use
        k - Number of clusters
        distQ - Distance query to use
      • loss

        protected static final double loss(double a,
                                           double b)
        Loss function used: simply a/b, or 0 if a = b = 0.
        Parameters:
        a - distance to the nearest medoid
        b - distance to the second nearest medoid
        Returns:
        loss, a/b or 0.
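As a minimal sketch of the loss described above, the following standalone class reproduces the a/b rule and averages 1 - a/b over all points, which is how the medoid silhouette is commonly defined. The class `MedoidSilhouetteLossSketch` and the method `medoidSilhouette` are hypothetical names introduced for illustration and are not part of ELKI. The sketch assumes a <= b for every point, so b = 0 implies a = 0.

```java
/** Hypothetical illustration, not the ELKI source: per-point loss and the averaged medoid silhouette. */
final class MedoidSilhouetteLossSketch {
  /** Loss a/b, or 0 if both distances are 0 (assumes a <= b). */
  static double loss(double a, double b) {
    return (a == 0. && b == 0.) ? 0. : a / b;
  }

  /** Mean of 1 - loss(nearest[i], second[i]) over all points. */
  static double medoidSilhouette(double[] nearest, double[] second) {
    double sum = 0.;
    for (int i = 0; i < nearest.length; i++) {
      sum += 1. - loss(nearest[i], second[i]);
    }
    return sum / nearest.length;
  }
}
```

A point that coincides with its medoid has a = 0 and therefore contributes the best possible value 1, while a point equally far from its two nearest medoids contributes 0.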
      • getLogger

        protected Logging getLogger()
        Description copied from class: PAM
        Get the static class logger.
        Overrides:
        getLogger in class PAMMEDSIL<O>