Package elki.algorithm.statistics
Class HopkinsStatisticClusteringTendency
- java.lang.Object
  - elki.algorithm.statistics.HopkinsStatisticClusteringTendency
- All Implemented Interfaces: Algorithm
@Reference(authors="B. Hopkins, J. G. Skellam", title="A new method for determining the type of distribution of plant individuals", booktitle="Annals of Botany, 18(2), 213-227", url="https://doi.org/10.1093/oxfordjournals.aob.a083391", bibkey="doi:10.1093/oxfordjournals.aob.a083391")
public class HopkinsStatisticClusteringTendency extends java.lang.Object implements Algorithm
The Hopkins Statistic of Clustering Tendency measures the probability that a data set is generated by a uniform data distribution. The statistic compares the 1NN distances of objects sampled from the data set with the 1NN distances of uniformly distributed objects.
Reference:
B. Hopkins, J. G. Skellam
A new method for determining the type of distribution of plant individuals
Annals of Botany, 18(2), 213-227.
- Since:
- 0.7.0
- Author:
- Lisa Reichert, Erich Schubert
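The comparison described above is commonly written as the ratio below. Here m stands for the sample size (sampleSize), d for the dimensionality, u_i for the distance from the i-th uniformly generated object to its nearest data object, and w_i for the distance from the i-th sampled data object to its nearest neighbor. Whether this class raises the distances to the power d exactly as shown is an assumption, not something this page confirms.

```latex
H = \frac{\sum_{i=1}^{m} u_i^{d}}{\sum_{i=1}^{m} u_i^{d} + \sum_{i=1}^{m} w_i^{d}}
```

In this form, values near 0.5 are expected for uniform data, and values approaching 1 suggest that the data is clustered.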
-
-
Nested Class Summary
Nested Classes

| Modifier and Type | Class | Description |
|---|---|---|
| static class | HopkinsStatisticClusteringTendency.Par | Parameterization class. |

Nested classes/interfaces inherited from interface elki.Algorithm: Algorithm.Utils
-
-
Field Summary
Fields

| Modifier and Type | Field | Description |
|---|---|---|
| protected NumberVectorDistance<? super NumberVector> | distance | Distance function used. |
| protected int | k | Nearest neighbor to use. |
| private static Logging | LOG | The logger for this class. |
| private double[] | maxima | Stores the maximum in each dimension. |
| private double[] | minima | Stores the minimum in each dimension. |
| protected RandomFactory | random | Random generator seeding. |
| protected int | rep | Number of repetitions. |
| protected int | sampleSize | Sample size. |
-
Constructor Summary
Constructors

| Constructor | Description |
|---|---|
| HopkinsStatisticClusteringTendency(NumberVectorDistance<? super NumberVector> distance, int samplesize, RandomFactory random, int rep, int k, double[] minima, double[] maxima) | Constructor. |
-
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| protected double | computeNNForRealData(KNNSearcher<DBIDRef> knnQuery, Relation<NumberVector> relation, int dim) | Search nearest neighbors for real data members. |
| protected double | computeNNForUniformData(KNNSearcher<NumberVector> knnQuery, double[] min, double[] extend) | Search nearest neighbors for artificial, uniform data. |
| TypeInformation[] | getInputTypeRestriction() | Get the input type restriction used for negotiating the data query. |
| protected void | initializeDataExtends(Relation<NumberVector> relation, int dim, double[] min, double[] extend) | Initialize the uniform sampling area. |
| java.lang.Double | run(Relation<NumberVector> relation) | Compute the Hopkins statistic for a vector relation. |
-
-
-
Field Detail
-
LOG
private static final Logging LOG
The logger for this class.
-
sampleSize
protected int sampleSize
Sample size.
-
rep
protected int rep
Number of repetitions.
-
k
protected int k
Nearest neighbor to use.
-
random
protected RandomFactory random
Random generator seeding.
-
maxima
private double[] maxima
Stores the maximum in each dimension.
-
minima
private double[] minima
Stores the minimum in each dimension.
-
distance
protected NumberVectorDistance<? super NumberVector> distance
Distance function used.
-
-
Constructor Detail
-
HopkinsStatisticClusteringTendency
public HopkinsStatisticClusteringTendency(NumberVectorDistance<? super NumberVector> distance, int samplesize, RandomFactory random, int rep, int k, double[] minima, double[] maxima)
Constructor.
- Parameters:
- distance - Distance function
- samplesize - Sample size
- random - Random generator
- rep - Number of repetitions
- k - Nearest neighbors to use
- minima - Data space minima, may be null (get from data).
- maxima - Data space maxima, may be null (get from data).
-
-
Method Detail
-
getInputTypeRestriction
public TypeInformation[] getInputTypeRestriction()
Description copied from interface: Algorithm
Get the input type restriction used for negotiating the data query.
- Specified by:
- getInputTypeRestriction in interface Algorithm
- Returns:
- Type restriction
-
run
public java.lang.Double run(Relation<NumberVector> relation)
Compute the Hopkins statistic for a vector relation.
- Parameters:
- relation - Relation
- Returns:
- Hopkins statistic
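As a rough illustration of what computing the statistic involves, the following stand-alone sketch works on plain double arrays instead of an ELKI Relation. The class HopkinsSketch and all of its methods are hypothetical and are not part of ELKI. It assumes Euclidean distance, k = 1, distances raised to the power of the dimensionality, and a plain average over the repetitions. It also samples data objects with replacement for brevity, which may differ from what this class does.

```java
import java.util.Random;

/** Hypothetical stand-alone sketch; not the ELKI implementation. */
final class HopkinsSketch {
  /** Euclidean distance between two vectors of equal length. */
  static double dist(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  /** Distance from q to its nearest row of data, ignoring row index skip (-1 ignores nothing). */
  static double nnDist(double[][] data, double[] q, int skip) {
    double best = Double.POSITIVE_INFINITY;
    for (int i = 0; i < data.length; i++) {
      if (i != skip) {
        best = Math.min(best, dist(data[i], q));
      }
    }
    return best;
  }

  /** One estimate u / (u + w) from m sampled data objects and m uniform objects. */
  static double hopkinsOnce(double[][] data, int m, Random rnd) {
    int dim = data[0].length;
    double[] min = new double[dim];
    double[] ext = new double[dim];
    // Bounding box of the data: minimum and extent per dimension.
    for (int d = 0; d < dim; d++) {
      double lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY;
      for (double[] row : data) {
        lo = Math.min(lo, row[d]);
        hi = Math.max(hi, row[d]);
      }
      min[d] = lo;
      ext[d] = hi - lo;
    }
    double u = 0, w = 0;
    for (int i = 0; i < m; i++) {
      // Real data: nearest neighbor of a sampled object, excluding the object itself.
      int idx = rnd.nextInt(data.length);
      w += Math.pow(nnDist(data, data[idx], idx), dim);
      // Artificial data: nearest data object of a uniform point inside the bounding box.
      double[] q = new double[dim];
      for (int d = 0; d < dim; d++) {
        q[d] = min[d] + rnd.nextDouble() * ext[d];
      }
      u += Math.pow(nnDist(data, q, -1), dim);
    }
    return u / (u + w);
  }

  /** Average of rep independent estimates, using a fixed seed for reproducibility. */
  static double hopkins(double[][] data, int m, int rep, long seed) {
    Random rnd = new Random(seed);
    double sum = 0;
    for (int r = 0; r < rep; r++) {
      sum += hopkinsOnce(data, m, rnd);
    }
    return sum / rep;
  }
}
```

Calling HopkinsSketch.hopkins(data, 10, 5, 42L) on two tight, well-separated groups of points should give a value close to 1, because the uniform points lie much farther from the data than the data objects lie from each other.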
-
computeNNForRealData
protected double computeNNForRealData(KNNSearcher<DBIDRef> knnQuery, Relation<NumberVector> relation, int dim)
Search nearest neighbors for real data members.
- Parameters:
- knnQuery - KNN query
- relation - Data relation
- dim - Dimensionality
- Returns:
- Aggregated 1NN distances
-
computeNNForUniformData
protected double computeNNForUniformData(KNNSearcher<NumberVector> knnQuery, double[] min, double[] extend)
Search nearest neighbors for artificial, uniform data.
- Parameters:
- knnQuery - KNN query
- min - Data minima
- extend - Data extent
- Returns:
- Aggregated 1NN distances
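The following sketch shows one way the artificial objects could be generated and their neighbor distances aggregated. It uses plain arrays and a brute-force search in place of a KNNSearcher. The class UniformNNSketch and its methods are hypothetical. Using Euclidean distance and raising each distance to the power of the dimensionality are assumptions.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical stand-alone sketch; not the ELKI implementation. */
final class UniformNNSketch {
  /** Draws one point uniformly from the box [min, min + extend] in each dimension. */
  static double[] uniformPoint(double[] min, double[] extend, Random rnd) {
    double[] q = new double[min.length];
    for (int d = 0; d < q.length; d++) {
      q[d] = min[d] + rnd.nextDouble() * extend[d];
    }
    return q;
  }

  /** Euclidean distance from q to its k-th nearest row of data (k starts at 1). */
  static double kthNNDistance(double[][] data, double[] q, int k) {
    double[] dists = new double[data.length];
    for (int i = 0; i < data.length; i++) {
      double sum = 0;
      for (int d = 0; d < q.length; d++) {
        double diff = data[i][d] - q[d];
        sum += diff * diff;
      }
      dists[i] = Math.sqrt(sum);
    }
    Arrays.sort(dists); // brute force: sort all distances, then pick the k-th smallest
    return dists[k - 1];
  }

  /** Sums the k-th neighbor distances (to the power of the dimensionality) over sampleSize uniform points. */
  static double aggregate(double[][] data, double[] min, double[] extend, int sampleSize, int k, Random rnd) {
    double u = 0;
    for (int i = 0; i < sampleSize; i++) {
      double[] q = uniformPoint(min, extend, rnd);
      u += Math.pow(kthNNDistance(data, q, k), min.length);
    }
    return u;
  }
}
```

Unlike the real data members, the artificial points are not part of the data set, so no object has to be excluded from its own neighbor search.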
-
initializeDataExtends
protected void initializeDataExtends(Relation<NumberVector> relation, int dim, double[] min, double[] extend)
Initialize the uniform sampling area.
- Parameters:
- relation - Data relation
- dim - Dimensionality
- min - Minima output array (preallocated!)
- extend - Data extent output array (preallocated!)
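A minimal sketch of how the sampling area could be derived is shown below, assuming plain double arrays instead of a Relation. The class DataExtentSketch is hypothetical. It follows the constructor documentation in that user-supplied minima or maxima take precedence when they are not null, and the values are otherwise taken from the data. That any supplied array has one entry per dimension is an assumption.

```java
/** Hypothetical stand-alone sketch; not the ELKI implementation. */
final class DataExtentSketch {
  /**
   * Fills the preallocated output arrays min and extend.
   * minima and maxima may be null, in which case the values are taken from the data.
   */
  static void initializeExtent(double[][] data, double[] minima, double[] maxima, double[] min, double[] extend) {
    int dim = min.length;
    if (extend.length != dim) {
      throw new IllegalArgumentException("min and extend must have the same length");
    }
    for (int d = 0; d < dim; d++) {
      // Observed range of the data in dimension d.
      double lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY;
      for (double[] row : data) {
        lo = Math.min(lo, row[d]);
        hi = Math.max(hi, row[d]);
      }
      // Prefer user-supplied bounds over the observed ones.
      min[d] = (minima != null) ? minima[d] : lo;
      double top = (maxima != null) ? maxima[d] : hi;
      extend[d] = top - min[d];
    }
  }
}
```

Storing the extent rather than the maximum means a uniform coordinate can be drawn as min[d] + r * extend[d] with r in [0, 1), without recomputing the difference for every sample.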
-
-