Class CLARANS<O>

  • Type Parameters:
    O - Input data type
    All Implemented Interfaces:
    Algorithm, ClusteringAlgorithm<Clustering<MedoidModel>>, KMedoidsClustering<O>
    Direct Known Subclasses:
    FastCLARANS

    @Reference(authors="R. T. Ng, J. Han",
               title="CLARANS: a method for clustering objects for spatial data mining",
               booktitle="IEEE Transactions on Knowledge and Data Engineering 14(5)",
               url="https://doi.org/10.1109/TKDE.2002.1033770",
               bibkey="DBLP:journals/tkde/NgH02")
    public class CLARANS<O>
    extends java.lang.Object
    implements KMedoidsClustering<O>
    CLARANS: a method for clustering objects for spatial data mining is inspired by PAM (partitioning around medoids, PAM) and CLARA and also based on sampling.

    This implementation tries to balance memory and computation time. By caching the distances to the two nearest medoids, we usually only need O(n) instead of O(nk) distance computations for one iteration, at the cost of needing O(2n) memory to store them.

    The implementation is fairly ugly, because we have three solutions (the best found so far, the current solution, and a neighbor candidate); and for each point in each solution we need the best and second best assignments. But with Java 11, we may be able to switch to value types that would clean this code significantly, without the overhead of O(n) objects.

    Reference:

    R. T. Ng, J. Han
    CLARANS: a method for clustering objects for spatial data mining
    IEEE Transactions on Knowledge and Data Engineering 14(5)

    Since:
    0.7.5
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        Class logger.
      • distance

        protected Distance<? super O> distance
        Distance function used.
      • k

        protected int k
        Number of clusters to find.
      • numlocal

        protected int numlocal
        Number of samples to draw (i.e. restarts).
      • maxneighbor

        protected double maxneighbor
        Sampling rate. If less than 1, it is considered to be a relative value.
      • random

        protected RandomFactory random
        Random factory for initialization.
    • Constructor Detail

      • CLARANS

        public CLARANS​(Distance<? super O> distance,
                       int k,
                       int numlocal,
                       double maxneighbor,
                       RandomFactory random)
        Constructor.
        Parameters:
        distance - Distance function to use
        k - Number of clusters to produce
        numlocal - Number of samples (restarts)
        maxneighbor - Neighbor sampling rate (absolute or relative)
        random - Random generator