Class HellingerDistance
- java.lang.Object
-
- elki.distance.AbstractNumberVectorDistance
-
- elki.distance.probabilistic.HellingerDistance
-
- All Implemented Interfaces:
Distance<NumberVector>
,NumberVectorDistance<NumberVector>
,PrimitiveDistance<NumberVector>
,SpatialPrimitiveDistance<NumberVector>
,NormalizedPrimitiveSimilarity<NumberVector>
,NormalizedSimilarity<NumberVector>
,PrimitiveSimilarity<NumberVector>
,Similarity<NumberVector>
@Reference(authors="E. Hellinger",title="Neue Begr\u00fcndung der Theorie quadratischer Formen von unendlichvielen Ver\u00e4nderlichen",booktitle="Journal f\u00fcr die reine und angewandte Mathematik",url="http://resolver.sub.uni-goettingen.de/purl?GDZPPN002166941",bibkey="journals/mathematik/Hellinger1909") @Reference(authors="M.-M. Deza, E. Deza",title="Dictionary of distances",booktitle="Dictionary of distances",url="https://doi.org/10.1007/978-3-642-00234-2",bibkey="doi:10.1007/978-3-642-00234-2") @Alias({"hellinger","bhattacharyya"}) public class HellingerDistance extends AbstractNumberVectorDistance implements SpatialPrimitiveDistance<NumberVector>, NormalizedPrimitiveSimilarity<NumberVector>
Hellinger metric / affinity / kernel, Bhattacharyya coefficient, fidelity similarity, Matusita distance, Hellinger-Kakutani metric on a probability distribution.We assume vectors represent normalized probability distributions. Then \[\text{Hellinger}(\vec{x},\vec{y}):= \sqrt{\tfrac12\sum\nolimits_i \left(\sqrt{x_i}-\sqrt{y_i}\right)^2 } \]
The corresponding kernel / similarity is \[ K_{\text{Hellinger}}(\vec{x},\vec{y}) := \sum\nolimits_i \sqrt{x_i y_i} \]
If we have normalized probability distributions, we have the nice property that \( K_{\text{Hellinger}}(\vec{x},\vec{x}) = \sum\nolimits_i x_i = 1\). and therefore \( K_{\text{Hellinger}}(\vec{x},\vec{y}) \in [0:1] \).
Furthermore, we have the following relationship between this variant of the distance and this kernel: \[ \text{Hellinger}^2(\vec{x},\vec{y}) = \tfrac12\sum\nolimits_i \left(\sqrt{x_i}-\sqrt{y_i}\right)^2 = \tfrac12\sum\nolimits_i x_i + y_i - 2 \sqrt{x_i y_i} \] \[ \text{Hellinger}^2(\vec{x},\vec{y}) = \tfrac12K_{\text{Hellinger}}(\vec{x},\vec{x}) + \tfrac12K_{\text{Hellinger}}(\vec{y},\vec{y}) - K_{\text{Hellinger}}(\vec{x},\vec{y}) = 1 - K_{\text{Hellinger}}(\vec{x},\vec{y}) \] which implies \(\text{Hellinger}(\vec{x},\vec{y}) \in [0;1]\), and is very similar to the Euclidean distance and the linear kernel.
From this, it follows trivially that Hellinger distance corresponds to the kernel transformation \(\phi:\vec{x}\mapsto(\tfrac12\sqrt{x_1},\ldots,\tfrac12\sqrt{x_d})\).
Deza and Deza unfortunately also give a second definition, as: \[\text{Hellinger-Deza}(\vec{x},\vec{y}):=\sqrt{2\sum\nolimits_i \left(\sqrt{\tfrac{x_i}{\bar{x}}}-\sqrt{\tfrac{y_i}{\bar{y}}}\right)^2}\] which has a built-in normalization, and a different scaling that is no longer bound to $[0;1]$. The 2 in this definition likely should be a \(\frac12\).
This distance is well suited for histograms, but it is then more efficient to once normalize the histograms, apply the square roots, and then use Euclidean distance (i.e., use the "kernel trick" in reverse, materializing the transformation \(\phi\) given above).
Reference:
E. Hellinger
Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen
Journal für die reine und angewandte MathematikM.-M. Deza, E. Deza
Dictionary of distancesTODO: support acceleration for sparse vectors.
TODO: add a second variant, with built-in normalization.
- Since:
- 0.7.0
- Author:
- Erich Schubert
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
HellingerDistance.Par
Parameterization class.
-
Field Summary
Fields Modifier and Type Field Description private static java.lang.String
NON_NEGATIVE
Assertion error message.static HellingerDistance
STATIC
Static instance.
-
Constructor Summary
Constructors Constructor Description HellingerDistance()
Deprecated.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description double
distance(NumberVector fv1, NumberVector fv2)
Computes the distance between two given DatabaseObjects according to this distance function.boolean
equals(java.lang.Object obj)
SimpleTypeInformation<? super NumberVector>
getInputTypeRestriction()
Get the input data type of the function.int
hashCode()
<T extends NumberVector>
SpatialPrimitiveDistanceSimilarityQuery<T>instantiate(Relation<T> database)
Instantiate with a database to get the actual distance query.boolean
isMetric()
Is this distance function metric (satisfy the triangle inequality)boolean
isSymmetric()
Is this function symmetric?double
minDist(SpatialComparable mbr1, SpatialComparable mbr2)
Computes the distance between the two given MBRs according to this distance function.double
similarity(NumberVector o1, NumberVector o2)
Computes the similarity between two given DatabaseObjects according to this similarity function.-
Methods inherited from class elki.distance.AbstractNumberVectorDistance
dimensionality, dimensionality, dimensionality, dimensionality, dimensionality, dimensionality, dimensionality, dimensionality
-
-
-
-
Field Detail
-
STATIC
public static final HellingerDistance STATIC
Static instance.
-
NON_NEGATIVE
private static final java.lang.String NON_NEGATIVE
Assertion error message.- See Also:
- Constant Field Values
-
-
Constructor Detail
-
HellingerDistance
@Deprecated public HellingerDistance()
Deprecated.Hellinger kernel. Use static instanceSTATIC
!
-
-
Method Detail
-
distance
public double distance(NumberVector fv1, NumberVector fv2)
Description copied from interface:PrimitiveDistance
Computes the distance between two given DatabaseObjects according to this distance function.- Specified by:
distance
in interfaceNumberVectorDistance<NumberVector>
- Specified by:
distance
in interfacePrimitiveDistance<NumberVector>
- Parameters:
fv1
- first DatabaseObjectfv2
- second DatabaseObject- Returns:
- the distance between two given DatabaseObjects according to this distance function
-
minDist
public double minDist(SpatialComparable mbr1, SpatialComparable mbr2)
Description copied from interface:SpatialPrimitiveDistance
Computes the distance between the two given MBRs according to this distance function.- Specified by:
minDist
in interfaceSpatialPrimitiveDistance<NumberVector>
- Parameters:
mbr1
- the first MBR objectmbr2
- the second MBR object- Returns:
- the distance between the two given MBRs according to this distance function
-
similarity
public double similarity(NumberVector o1, NumberVector o2)
Description copied from interface:PrimitiveSimilarity
Computes the similarity between two given DatabaseObjects according to this similarity function.- Specified by:
similarity
in interfacePrimitiveSimilarity<NumberVector>
- Parameters:
o1
- first DatabaseObjecto2
- second DatabaseObject- Returns:
- the similarity between two given DatabaseObjects according to this similarity function
-
isSymmetric
public boolean isSymmetric()
Description copied from interface:Distance
Is this function symmetric?- Specified by:
isSymmetric
in interfaceDistance<NumberVector>
- Specified by:
isSymmetric
in interfaceSimilarity<NumberVector>
- Returns:
true
when symmetric
-
isMetric
public boolean isMetric()
Description copied from interface:Distance
Is this distance function metric (satisfy the triangle inequality)- Specified by:
isMetric
in interfaceDistance<NumberVector>
- Returns:
true
when metric.
-
instantiate
public <T extends NumberVector> SpatialPrimitiveDistanceSimilarityQuery<T> instantiate(Relation<T> database)
Description copied from interface:Distance
Instantiate with a database to get the actual distance query.- Specified by:
instantiate
in interfaceDistance<NumberVector>
- Specified by:
instantiate
in interfacePrimitiveDistance<NumberVector>
- Specified by:
instantiate
in interfacePrimitiveSimilarity<NumberVector>
- Specified by:
instantiate
in interfaceSimilarity<NumberVector>
- Specified by:
instantiate
in interfaceSpatialPrimitiveDistance<NumberVector>
- Parameters:
database
- The representation to use- Returns:
- Actual distance query.
-
getInputTypeRestriction
public SimpleTypeInformation<? super NumberVector> getInputTypeRestriction()
Description copied from interface:Distance
Get the input data type of the function.- Specified by:
getInputTypeRestriction
in interfaceDistance<NumberVector>
- Specified by:
getInputTypeRestriction
in interfacePrimitiveDistance<NumberVector>
- Specified by:
getInputTypeRestriction
in interfaceSimilarity<NumberVector>
- Overrides:
getInputTypeRestriction
in classAbstractNumberVectorDistance
- Returns:
- Type restriction
-
equals
public boolean equals(java.lang.Object obj)
- Overrides:
equals
in classjava.lang.Object
-
hashCode
public int hashCode()
- Overrides:
hashCode
in classjava.lang.Object
-
-