Class KNNDistancesSampler<O>

  • Type Parameters:
    O - the type of objects handled by this algorithm
    All Implemented Interfaces:
    Algorithm

    @Title("KNN-Distance-Order")
    @Description("Assesses the knn distances for a specified k and orders them.")
    @Reference(authors="Martin Ester, Hans-Peter Kriegel, J\u00f6rg Sander, Xiaowei Xu",title="A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise",booktitle="Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD \'96)",url="http://www.aaai.org/Library/KDD/1996/kdd96-037.php",bibkey="DBLP:conf/kdd/EsterKSX96")
    @Reference(authors="Erich Schubert, J\u00f6rg Sander, Martin Ester, Hans-Peter Kriegel, Xiaowei Xu",title="DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN",booktitle="ACM Trans. Database Systems (TODS)",url="https://doi.org/10.1145/3068335",bibkey="DBLP:journals/tods/SchubertSEKX17")
    public class KNNDistancesSampler<O>
    extends java.lang.Object
    implements Algorithm
    Provides an order of the kNN-distances for all objects within the database.

    This class can be used to estimate parameters for other algorithms, for example the epsilon parameter of DBSCAN: set k to minPts-1, then either choose a percentile from the sample as epsilon, or plot the result as a graph and look for a bend (knee) in the plot.
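
The estimation procedure described above can be sketched without ELKI. The following is a minimal, brute-force illustration and not the implementation of this class: the class `KnnDistanceSketch` and its methods are hypothetical names, Euclidean distance is assumed, and the query point is assumed not to count as its own neighbor (which is why k would be set to minPts-1).

```java
import java.util.Arrays;

/** Illustrative sketch only; not ELKI code. All names here are hypothetical. */
class KnnDistanceSketch {
  /** Euclidean distance between two points of equal dimensionality. */
  static double euclidean(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /**
   * For every point, computes the distance to its k-th nearest neighbor
   * (assumption: the point itself is not counted), then returns these
   * distances sorted in descending order.
   */
  static double[] sortedKnnDistances(double[][] points, int k) {
    int n = points.length;
    if (k < 1 || k >= n) {
      throw new IllegalArgumentException("need 1 <= k < number of points");
    }
    double[] result = new double[n];
    for (int i = 0; i < n; i++) {
      double[] dists = new double[n - 1];
      int pos = 0;
      for (int j = 0; j < n; j++) {
        if (j != i) {
          dists[pos++] = euclidean(points[i], points[j]);
        }
      }
      Arrays.sort(dists);
      result[i] = dists[k - 1];
    }
    Arrays.sort(result);
    // Reverse in place so that the largest kNN distance comes first.
    for (int lo = 0, hi = n - 1; lo < hi; lo++, hi--) {
      double tmp = result[lo];
      result[lo] = result[hi];
      result[hi] = tmp;
    }
    return result;
  }

  /**
   * Picks a candidate epsilon from the descending-sorted distances.
   * fraction = 0 yields the largest value, fraction = 1 the smallest.
   */
  static double valueAtFraction(double[] descending, double fraction) {
    if (fraction < 0 || fraction > 1) {
      throw new IllegalArgumentException("fraction must be in [0, 1]");
    }
    int idx = (int) Math.floor(fraction * (descending.length - 1));
    return descending[idx];
  }

  public static void main(String[] args) {
    double[][] points = { { 0 }, { 1 }, { 3 }, { 7 } };
    int minPts = 2; // hypothetical DBSCAN setting
    double[] curve = sortedKnnDistances(points, minPts - 1);
    System.out.println(Arrays.toString(curve));
    System.out.println(valueAtFraction(curve, 0.5));
  }
}
```

Plotting the returned array against its index gives the kind of curve in which a bend or knee can be looked for. The brute-force loop is quadratic in the number of points, which is one reason to work on a sample rather than the full data set.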

    Reference:

    Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu
    A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise
    Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD '96)

    Further discussion:

    Erich Schubert, Jörg Sander, Martin Ester, Hans-Peter Kriegel, Xiaowei Xu
    DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN
    ACM Trans. Database Systems (TODS)

    Since:
    0.1
    Author:
    Arthur Zimek
    • Field Detail

      • LOG

        private static final Logging LOG
        The logger for this class.
      • distance

        protected Distance<? super O> distance
        Distance function used.
      • k

        protected int k
        Parameter k: the number of nearest neighbors used for the kNN distance.
      • sample

        protected double sample
        Sampling rate, or sample size (when > 1).
    • Constructor Detail

      • KNNDistancesSampler

        public KNNDistancesSampler(Distance<? super O> distance,
                                   int k,
                                   double sample,
                                   RandomFactory rnd)
        Constructor.
        Parameters:
        distance - Distance function
        k - Parameter k (number of nearest neighbors)
        sample - Sampling rate, or sample size (when > 1)
        rnd - Random source.
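
The `sample` parameter has two readings: a rate, or an absolute sample size when the value is greater than 1. The sketch below shows one way such a value could be turned into a number of objects. It is an assumption for illustration, not the code of this class: the helper `resolveSampleSize`, the rounding up of fractional counts, and the clamping to the data set size are all hypothetical choices.

```java
/** Illustrative sketch only; not ELKI code. All names here are hypothetical. */
class SampleSizeSketch {
  /**
   * Interprets sample as a rate when it is at most 1, and as an absolute
   * number of objects when it is greater than 1.
   *
   * @param sample sampling rate in (0, 1], or sample size when greater than 1
   * @param total number of objects available
   * @return number of objects to draw, never more than total
   */
  static int resolveSampleSize(double sample, int total) {
    if (Double.isNaN(sample) || sample <= 0) {
      throw new IllegalArgumentException("sample must be positive");
    }
    // Assumption: fractional counts are rounded up for rates and
    // truncated for absolute sizes.
    double raw = sample > 1 ? sample : Math.ceil(sample * total);
    // Assumption: a request larger than the data set is clamped.
    return (int) Math.min(total, raw);
  }
}
```

With this reading, `resolveSampleSize(0.5, 1000)` and `resolveSampleSize(500, 1000)` request the same number of objects. A value of exactly 1 is treated as a rate, meaning the full data set.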