Class HopkinsStatisticClusteringTendency

  • All Implemented Interfaces:
    Algorithm

    @Reference(authors="B. Hopkins, J. G. Skellam",
               title="A new method for determining the type of distribution of plant individuals",
               booktitle="Annals of Botany, 18(2), 213-227",
               url="https://doi.org/10.1093/oxfordjournals.aob.a083391",
               bibkey="doi:10.1093/oxfordjournals.aob.a083391")
    public class HopkinsStatisticClusteringTendency
    extends java.lang.Object
    implements Algorithm
    The Hopkins Statistic of Clustering Tendency measures the probability that a data set is generated by a uniform data distribution.

    The statistic compares the 1NN distances of objects sampled from the data set with the 1NN distances of uniformly distributed objects.
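
    The comparison described above can be sketched in plain Java. This is a minimal illustration of one common formulation of the Hopkins statistic, H = U / (U + W), where W sums the 1NN distances of sampled data objects and U sums the 1NN distances of uniformly generated objects, each raised to the power of the dimensionality. The class HopkinsSketch, its method names, the use of Euclidean distance, and the exact formula are assumptions made for illustration; they are not taken from this class and may differ from what it actually computes.

```java
// Illustrative sketch only: names, distance function, and formula are assumptions,
// not the implementation documented on this page.
final class HopkinsSketch {
  /** Euclidean distance between two vectors of equal length. */
  static double distance(double[] a, double[] b) {
    double sum = 0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  /** Distance from q to its nearest data point, ignoring index skip (-1 ignores none). */
  static double nnDistance(double[][] data, double[] q, int skip) {
    double best = Double.POSITIVE_INFINITY;
    for (int i = 0; i < data.length; i++) {
      if (i != skip) {
        best = Math.min(best, distance(data[i], q));
      }
    }
    return best;
  }

  /** Hopkins statistic U / (U + W); NaN if all sampled distances are zero. */
  static double hopkins(double[][] data, int sampleSize, java.util.Random rnd) {
    int n = data.length;
    if (n < 2 || sampleSize < 1 || sampleSize > n) {
      throw new IllegalArgumentException("need n >= 2 and 1 <= sampleSize <= n");
    }
    int dim = data[0].length;
    // Bounding box of the data, used as the uniform sampling area.
    double[] min = data[0].clone(), max = data[0].clone();
    for (double[] p : data) {
      for (int d = 0; d < dim; d++) {
        min[d] = Math.min(min[d], p[d]);
        max[d] = Math.max(max[d], p[d]);
      }
    }
    // Index array for sampling real objects without replacement (partial shuffle).
    int[] idx = new int[n];
    for (int i = 0; i < n; i++) {
      idx[i] = i;
    }
    double w = 0, u = 0;
    for (int s = 0; s < sampleSize; s++) {
      int j = s + rnd.nextInt(n - s);
      int tmp = idx[s];
      idx[s] = idx[j];
      idx[j] = tmp;
      // Real object: nearest neighbor among the other data objects.
      w += Math.pow(nnDistance(data, data[idx[s]], idx[s]), dim);
      // Artificial object: uniform in the bounding box, nearest data object.
      double[] q = new double[dim];
      for (int d = 0; d < dim; d++) {
        q[d] = min[d] + rnd.nextDouble() * (max[d] - min[d]);
      }
      u += Math.pow(nnDistance(data, q, -1), dim);
    }
    return u / (u + w);
  }
}
```

    Under this formulation, values near 0.5 suggest data that resembles a uniform distribution, while values near 1 suggest a clustering tendency, because real objects then lie much closer to their neighbors than uniform objects do.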

    Reference:

    B. Hopkins, J. G. Skellam
    A new method for determining the type of distribution of plant individuals
    Annals of Botany, 18(2), 213-227.

    Since:
    0.7.0
    Author:
    Lisa Reichert, Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        The logger for this class.
      • sampleSize

        protected int sampleSize
        The sample size parameter.
      • rep

        protected int rep
        Number of repetitions
      • k

        protected int k
        Nearest neighbor to use.
      • random

        protected RandomFactory random
        Random generator seeding.
      • maxima

        private double[] maxima
        Stores the maximum in each dimension.
      • minima

        private double[] minima
        Stores the minimum in each dimension.
    • Constructor Detail

      • HopkinsStatisticClusteringTendency

        public HopkinsStatisticClusteringTendency​(NumberVectorDistance<? super NumberVector> distance,
                                                  int samplesize,
                                                  RandomFactory random,
                                                  int rep,
                                                  int k,
                                                  double[] minima,
                                                  double[] maxima)
        Constructor.
        Parameters:
        distance - Distance function
        samplesize - Sample size
        random - Random generator
        rep - Number of repetitions
        k - Nearest neighbors to use
        minima - Data space minima, may be null (get from data).
        maxima - Data space maxima, may be null (get from data).
    • Method Detail

      • getInputTypeRestriction

        public TypeInformation[] getInputTypeRestriction()
        Description copied from interface: Algorithm
        Get the input type restriction used for negotiating the data query.
        Specified by:
        getInputTypeRestriction in interface Algorithm
        Returns:
        Type restriction
      • run

        public java.lang.Double run​(Relation<NumberVector> relation)
        Compute the Hopkins statistic for a vector relation.
        Parameters:
        relation - Relation
        Returns:
        Hopkins statistic
      • computeNNForRealData

        protected double computeNNForRealData​(KNNSearcher<DBIDRef> knnQuery,
                                              Relation<NumberVector> relation,
                                              int dim)
        Search nearest neighbors for real data members.
        Parameters:
        knnQuery - KNN query
        relation - Data relation
        dim - Dimensionality
        Returns:
        Aggregated 1NN distances
      • computeNNForUniformData

        protected double computeNNForUniformData​(KNNSearcher<NumberVector> knnQuery,
                                                 double[] min,
                                                 double[] extend)
        Search nearest neighbors for artificial, uniform data.
        Parameters:
        knnQuery - KNN query
        min - Data minima
        extend - Data extent
        Returns:
        Aggregated 1NN distances
      • initializeDataExtends

        protected void initializeDataExtends​(Relation<NumberVector> relation,
                                             int dim,
                                             double[] min,
                                             double[] extend)
        Initialize the uniform sampling area.
        Parameters:
        relation - Data relation
        dim - Dimensionality
        min - Minima output array (preallocated!)
        extend - Data extent output array (preallocated!)
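
        How a sampling area might be derived from the data and then used to draw artificial objects can be sketched as follows. This is a minimal example assuming the data is held in a plain double[][] array; the class DataExtentSketch and its methods are hypothetical and only mirror the idea of filling preallocated min and extend arrays. It covers only the case where the bounds are taken from the data, not the case where minima and maxima are supplied by the caller.

```java
// Illustrative sketch only: a hypothetical helper, not part of the documented class.
final class DataExtentSketch {
  /** Fill the preallocated arrays with per-dimension minimum and extent (max - min). */
  static void initializeExtents(double[][] data, double[] min, double[] extend) {
    int dim = min.length;
    double[] max = new double[dim];
    java.util.Arrays.fill(min, Double.POSITIVE_INFINITY);
    java.util.Arrays.fill(max, Double.NEGATIVE_INFINITY);
    for (double[] p : data) {
      for (int d = 0; d < dim; d++) {
        min[d] = Math.min(min[d], p[d]);
        max[d] = Math.max(max[d], p[d]);
      }
    }
    for (int d = 0; d < dim; d++) {
      extend[d] = max[d] - min[d];
    }
  }

  /** Draw one point uniformly from the box [min, min + extend) in each dimension. */
  static double[] sampleUniform(double[] min, double[] extend, java.util.Random rnd) {
    double[] q = new double[min.length];
    for (int d = 0; d < q.length; d++) {
      q[d] = min[d] + rnd.nextDouble() * extend[d];
    }
    return q;
  }
}
```

        Storing the extent rather than the maximum keeps the sampling step to one multiplication and one addition per dimension.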