Class PAMSIL<O>

  • Type Parameters:
    O - Input data type
    All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<MedoidModel>>, KMedoidsClustering<O>
    Direct Known Subclasses:
    PAMMEDSIL

    @Reference(authors="M. Van der Laan, K. Pollard, J. Bryan",title="A new partitioning around medoids algorithm",booktitle="Journal of Statistical Computation and Simulation 73(8)",url="https://doi.org/10.1080/0094965031000136012",bibkey="doi:10.1080/0094965031000136012") @Reference(authors="Lars Lenssen and Erich Schubert",title="Clustering by Direct Optimization of the Medoid Silhouette",booktitle="Int. Conf. on Similarity Search and Applications, SISAP 2022",url="https://doi.org/10.1007/978-3-031-17849-8_15",bibkey="DBLP:conf/sisap/LenssenS22")
    public class PAMSIL<O>
    extends PAM<O>
    Clustering to optimize the Silhouette coefficient with a PAM-based swap heuristic. This is the baseline algorithm, but it is fairly expensive: each iteration it considers (n-k)k swaps, for each the Silhouette needs to be evaluated at n² cost. The overall complexity per iteration hence is O((n-k)k ((n-k)+n²)), i.e., O(n³) for small k, making this a quite expensive clustering method.

    Reference:

    M. Van der Laan, K. Pollard, J. Bryan
    A new partitioning around medoids algorithm
    Journal of Statistical Computation and Simulation 73(8)

    This already incorporates some optimizations from:

    Lars Lenssen and Erich Schubert
    Clustering by Direct Optimization of the Medoid Silhouette
    Int. Conf. on Similarity Search and Applications, SISAP 2022

    Since:
    0.8.0
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        The logger for this class.
    • Constructor Detail

      • PAMSIL

        public PAMSIL​(Distance<? super O> distance,
                      int k,
                      int maxiter,
                      KMedoidsInitialization<O> initializer)
        Constructor.
        Parameters:
        distance - distance function
        k - k parameter
        maxiter - Maxiter parameter
        initializer - Function to generate the initial means
    • Method Detail

      • run

        public Clustering<MedoidModel> run​(Relation<O> relation,
                                           int k,
                                           DistanceQuery<? super O> distQ)
        Description copied from interface: KMedoidsClustering
        Run k-medoids clustering with a given distance query.
        Not a very elegant API, but needed for some types of nested k-medoids.
        Specified by:
        run in interface KMedoidsClustering<O>
        Overrides:
        run in class PAM<O>
        Parameters:
        relation - relation to use
        k - Number of clusters
        distQ - Distance query to use
        Returns:
        result
      • getLogger

        protected Logging getLogger()
        Description copied from class: PAM
        Get the static class logger.
        Overrides:
        getLogger in class PAM<O>