Package elki.algorithm.statistics
Class HopkinsStatisticClusteringTendency
- java.lang.Object
  - elki.algorithm.statistics.HopkinsStatisticClusteringTendency
- All Implemented Interfaces:
Algorithm
@Reference(authors="B. Hopkins, J. G. Skellam",
    title="A new method for determining the type of distribution of plant individuals",
    booktitle="Annals of Botany, 18(2), 213-227",
    url="https://doi.org/10.1093/oxfordjournals.aob.a083391",
    bibkey="doi:10.1093/oxfordjournals.aob.a083391")
public class HopkinsStatisticClusteringTendency
extends java.lang.Object
implements Algorithm
The Hopkins Statistic of Clustering Tendency measures the probability that a data set is generated by a uniform data distribution. It compares the 1NN distances of objects sampled from the data set with the 1NN distances of uniformly distributed artificial objects.
Reference:
B. Hopkins, J. G. Skellam
A new method for determining the type of distribution of plant individuals
Annals of Botany, 18(2), 213-227.
- Since:
- 0.7.0
- Author:
- Lisa Reichert, Erich Schubert
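The comparison described above can be illustrated with a minimal, self-contained sketch. It is not the ELKI implementation: the class name HopkinsSketch, the brute-force neighbor search, sampling with replacement, the use of the data bounding box as sampling area, and raising each distance to the power of the dimensionality are all assumptions made for illustration. With the statistic defined as U / (U + W), values near 0.5 suggest uniformly distributed data, and values near 1 suggest a clustering tendency.

```java
import java.util.Random;

/** Illustrative sketch only; names and design choices are hypothetical. */
final class HopkinsSketch {
  /** Euclidean distance between two points of equal dimensionality. */
  static double dist(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Distance from q to its nearest object in data, skipping index skip (-1 skips none). */
  static double nnDist(double[][] data, double[] q, int skip) {
    double best = Double.POSITIVE_INFINITY;
    for (int i = 0; i < data.length; i++) {
      if (i != skip) {
        best = Math.min(best, dist(data[i], q));
      }
    }
    return best;
  }

  /** Hopkins statistic U / (U + W) from m probes; assumes at least two distinct objects. */
  static double hopkins(double[][] data, int m, long seed) {
    int dim = data[0].length;
    Random rnd = new Random(seed);
    // Bounding box of the data, used here as the uniform sampling area.
    double[] min = data[0].clone(), max = data[0].clone();
    for (double[] p : data) {
      for (int d = 0; d < dim; d++) {
        min[d] = Math.min(min[d], p[d]);
        max[d] = Math.max(max[d], p[d]);
      }
    }
    double w = 0, u = 0;
    for (int s = 0; s < m; s++) {
      // W: 1NN distance of a randomly chosen data object, excluding itself.
      int idx = rnd.nextInt(data.length);
      w += Math.pow(nnDist(data, data[idx], idx), dim);
      // U: 1NN distance of an artificial object drawn uniformly from the box.
      double[] q = new double[dim];
      for (int d = 0; d < dim; d++) {
        q[d] = min[d] + rnd.nextDouble() * (max[d] - min[d]);
      }
      u += Math.pow(nnDist(data, q, -1), dim);
    }
    return u / (u + w);
  }
}
```

A call such as HopkinsSketch.hopkins(points, 50, 42L) returns a single estimate; averaging several such estimates with different seeds reduces the variance of the result.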
-
-
Nested Class Summary
Nested Classes
Modifier and Type, Class - Description
static class HopkinsStatisticClusteringTendency.Par - Parameterization class.
-
Nested classes/interfaces inherited from interface elki.Algorithm
Algorithm.Utils
-
-
Field Summary
Fields
Modifier and Type, Field - Description
protected NumberVectorDistance<? super NumberVector> distance - Distance function used.
protected int k - Nearest neighbor to use.
private static Logging LOG - The logger for this class.
private double[] maxima - Stores the maximum in each dimension.
private double[] minima - Stores the minimum in each dimension.
protected RandomFactory random - Random generator seeding.
protected int rep - Number of repetitions.
protected int sampleSize - Sample size.
-
Constructor Summary
Constructors
Constructor - Description
HopkinsStatisticClusteringTendency(NumberVectorDistance<? super NumberVector> distance, int samplesize, RandomFactory random, int rep, int k, double[] minima, double[] maxima) - Constructor.
-
Method Summary
All Methods
Modifier and Type, Method - Description
protected double computeNNForRealData(KNNSearcher<DBIDRef> knnQuery, Relation<NumberVector> relation, int dim) - Search nearest neighbors for real data members.
protected double computeNNForUniformData(KNNSearcher<NumberVector> knnQuery, double[] min, double[] extend) - Search nearest neighbors for artificial, uniform data.
TypeInformation[] getInputTypeRestriction() - Get the input type restriction used for negotiating the data query.
protected void initializeDataExtends(Relation<NumberVector> relation, int dim, double[] min, double[] extend) - Initialize the uniform sampling area.
java.lang.Double run(Relation<NumberVector> relation) - Compute the Hopkins statistic for a vector relation.
-
-
-
Field Detail
-
LOG
private static final Logging LOG
The logger for this class.
-
sampleSize
protected int sampleSize
Sample size.
-
rep
protected int rep
Number of repetitions.
-
k
protected int k
Nearest neighbor to use.
-
random
protected RandomFactory random
Random generator seeding.
-
maxima
private double[] maxima
Stores the maximum in each dimension.
-
minima
private double[] minima
Stores the minimum in each dimension.
-
distance
protected NumberVectorDistance<? super NumberVector> distance
Distance function used.
-
-
Constructor Detail
-
HopkinsStatisticClusteringTendency
public HopkinsStatisticClusteringTendency(NumberVectorDistance<? super NumberVector> distance, int samplesize, RandomFactory random, int rep, int k, double[] minima, double[] maxima)
Constructor.
- Parameters:
distance - Distance function
samplesize - Sample size
random - Random generator
rep - Number of repetitions
k - Nearest neighbor to use
minima - Data space minima, may be null (get from data).
maxima - Data space maxima, may be null (get from data).
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface: Algorithm
Get the input type restriction used for negotiating the data query.
- Specified by:
getInputTypeRestriction in interface Algorithm
- Returns:
- Type restriction
-
run
public java.lang.Double run(Relation<NumberVector> relation)
Compute the Hopkins statistic for a vector relation.
- Parameters:
relation - Relation
- Returns:
- Hopkins statistic
-
computeNNForRealData
protected double computeNNForRealData(KNNSearcher<DBIDRef> knnQuery, Relation<NumberVector> relation, int dim)
Search nearest neighbors for real data members.
- Parameters:
knnQuery - KNN query
relation - Data relation
dim - Dimensionality
- Returns:
- Aggregated 1NN distances
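As a rough illustration of this step, the sketch below aggregates the distance to the k-th nearest neighbor over randomly sampled data objects. It uses plain arrays and a brute-force search instead of a KNNSearcher, and the names RealDataNNSketch, kthNNDistance, and aggregate are hypothetical. Excluding the query object from its own neighbors, sampling with replacement, and raising each distance to the power dim are assumptions of this sketch, not confirmed behavior of the class.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative sketch only; names and design choices are hypothetical. */
final class RealDataNNSketch {
  /** Distance from data[query] to its k-th nearest other object (1 <= k < data.length). */
  static double kthNNDistance(double[][] data, int query, int k) {
    double[] dists = new double[data.length - 1];
    int n = 0;
    for (int i = 0; i < data.length; i++) {
      if (i == query) {
        continue; // the query object is not counted as its own neighbor
      }
      double sum = 0;
      for (int d = 0; d < data[i].length; d++) {
        double diff = data[i][d] - data[query][d];
        sum += diff * diff;
      }
      dists[n++] = Math.sqrt(sum);
    }
    Arrays.sort(dists);
    return dists[k - 1];
  }

  /** Sum of k-th NN distances, each raised to the power dim, over sampleSize random objects. */
  static double aggregate(double[][] data, int sampleSize, int k, int dim, Random rnd) {
    double w = 0;
    for (int s = 0; s < sampleSize; s++) {
      int idx = rnd.nextInt(data.length); // sampling with replacement
      w += Math.pow(kthNNDistance(data, idx, k), dim);
    }
    return w;
  }
}
```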
-
computeNNForUniformData
protected double computeNNForUniformData(KNNSearcher<NumberVector> knnQuery, double[] min, double[] extend)
Search nearest neighbors for artificial, uniform data.
- Parameters:
knnQuery - KNN query
min - Data minima
extend - Data extent
- Returns:
- Aggregated 1NN distances
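The uniform counterpart can be sketched in the same spirit: artificial objects are drawn uniformly from the box that starts at min and spans extend in each dimension, and the distance from each artificial object to its nearest real object is aggregated. The class UniformDataNNSketch and its methods are hypothetical. The brute-force search, the use of the first nearest neighbor only, and the exponent equal to the dimensionality are assumptions of this sketch.

```java
import java.util.Random;

/** Illustrative sketch only; names and design choices are hypothetical. */
final class UniformDataNNSketch {
  /** Draw one point uniformly from [min[d], min[d] + extend[d]) in each dimension d. */
  static double[] samplePoint(double[] min, double[] extend, Random rnd) {
    double[] p = new double[min.length];
    for (int d = 0; d < p.length; d++) {
      p[d] = min[d] + rnd.nextDouble() * extend[d];
    }
    return p;
  }

  /** Euclidean distance from q to its nearest object in data (brute force). */
  static double nnDistance(double[][] data, double[] q) {
    double best = Double.POSITIVE_INFINITY;
    for (double[] p : data) {
      double sum = 0;
      for (int d = 0; d < q.length; d++) {
        double diff = p[d] - q[d];
        sum += diff * diff;
      }
      best = Math.min(best, Math.sqrt(sum));
    }
    return best;
  }

  /** Sum of 1NN distances, each raised to the power of the dimensionality. */
  static double aggregate(double[][] data, double[] min, double[] extend, int sampleSize, Random rnd) {
    double u = 0;
    for (int s = 0; s < sampleSize; s++) {
      u += Math.pow(nnDistance(data, samplePoint(min, extend, rnd)), min.length);
    }
    return u;
  }
}
```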
-
initializeDataExtends
protected void initializeDataExtends(Relation<NumberVector> relation, int dim, double[] min, double[] extend)
Initialize the uniform sampling area.
- Parameters:
relation - Data relation
dim - Dimensionality
min - Minima output array (preallocated!)
extend - Data extent output array (preallocated!)
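One plausible way to fill the two preallocated output arrays is sketched below. It assumes that user-supplied minima and maxima take precedence when they are not null and that the bounds are otherwise taken from the data, which matches the "may be null (get from data)" note on the constructor parameters. The class DataExtentSketch, its signature, and the independent handling of each bound are hypothetical.

```java
/** Illustrative sketch only; names and design choices are hypothetical. */
final class DataExtentSketch {
  /**
   * Fill min and extend (both preallocated with one entry per dimension).
   * minima and maxima may each be null, in which case that bound comes from the data.
   */
  static void initialize(double[][] data, double[] minima, double[] maxima, double[] min, double[] extend) {
    int dim = min.length;
    for (int d = 0; d < dim; d++) {
      // Observed range of the data in dimension d.
      double lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY;
      for (double[] p : data) {
        lo = Math.min(lo, p[d]);
        hi = Math.max(hi, p[d]);
      }
      // Explicit bounds override the observed range.
      if (minima != null) {
        lo = minima[d];
      }
      if (maxima != null) {
        hi = maxima[d];
      }
      min[d] = lo;
      extend[d] = hi - lo; // width of the sampling area in dimension d
    }
  }
}
```

Storing the width rather than the maximum means a uniform sample in dimension d can be drawn as min[d] plus a random fraction of extend[d].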
-
-