Class AlternatingKMedoids<O>

  • Type Parameters:
    O - vector datatype
    All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<MedoidModel>>, KMedoidsClustering<O>

    @Priority(-100)
    @Reference(authors="F. E. Maranzana",title="On the location of supply points to minimize transport costs",booktitle="Journal of the Operational Research Society 15.3",url="https://doi.org/10.1057/jors.1964.47",bibkey="doi:10.1057/jors.1964.47") @Reference(authors="A. P. Reynolds, G. Richards, B. de la Iglesia, V. J. Rayward-Smith",title="Clustering Rules: A Comparison of Partitioning and Hierarchical Clustering Algorithms",booktitle="J. Math. Model. Algorithms 5(4)",url="https://doi.org/10.1007/s10852-005-9022-1",bibkey="DBLP:journals/jmma/ReynoldsRIR06") @Reference(authors="H.-S. Park, C.-H. Jun",title="A simple and fast algorithm for K-medoids clustering",booktitle="Expert Systems with Applications 36(2)",url="https://doi.org/10.1016/j.eswa.2008.01.039",bibkey="DBLP:journals/eswa/ParkJ09")
    public class AlternatingKMedoids<O>
    extends java.lang.Object
    implements KMedoidsClustering<O>
    A k-medoids clustering algorithm, implemented as EM-style batch algorithm; known in literature as the "alternate" method.

    In contrast to PAM, which will in each iteration update one medoid with one (arbitrary) non-medoid, this implementation follows the EM pattern. In the expectation step, the best medoid from the cluster members is chosen; in the M-step, the objects are reassigned to their nearest medoid.

    This implementation evolved naturally from EM and k-means algorithms. We then came across a similar approach was published by Park and Jun, and made this code match their suggestions (in particular, the default initialization is as proposed by them, despite its shortcomings). But similar ideas were already discussed by Reynolds et al. as a side note, and can be further traced back to Maranzana.

    In our experiments, it tends to be much faster than PAM, but also find less good solutions, as the medoids are only chosen from the cluster members. This aligns with findings of Reynolds et al. and can be explained with the requirement of the new medoid to cover the entire cluster. Similar observations were also made by Teitz and Bart, 1967, who described its performance as "erratic".

    Reference:

    F. E. Maranzana
    On the location of supply points to minimize transport costs
    Journal of the Operational Research Society 15.3

    A. P. Reynolds, G. Richards, B. de la Iglesia, V. J. Rayward-Smith
    Clustering Rules: A Comparison of Partitioning and Hierarchical Clustering Algorithms
    J. Math. Model. Algorithms 5(4)

    H.-S. Park, C.-H. Jun
    A simple and fast algorithm for K-medoids clustering
    Expert Systems with Applications 36(2)

    Since:
    0.5.0
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        The logger for this class.
      • KEY

        private static final java.lang.String KEY
        Key for statistics logging.
      • distance

        protected Distance<? super O> distance
        Distance function used.
      • k

        protected int k
        Number of clusters to find.
      • maxiter

        protected int maxiter
        Maximum number of iterations.
    • Constructor Detail

      • AlternatingKMedoids

        public AlternatingKMedoids​(Distance<? super O> distance,
                                   int k,
                                   int maxiter,
                                   KMedoidsInitialization<O> initializer)
        Constructor.
        Parameters:
        distance - distance function
        k - k parameter
        maxiter - Maxiter parameter
        initializer - Function to generate the initial means