@Reference(authors="E. Hellinger",title="Neue Begr\u00fcndung der Theorie quadratischer Formen von unendlichvielen Ver\u00e4nderlichen",booktitle="Journal f\u00fcr die reine und angewandte Mathematik",url="http://resolver.sub.uni-goettingen.de/purl?GDZPPN002166941",bibkey="journals/mathematik/Hellinger1909") @Reference(authors="M.-M. Deza, E. Deza",title="Dictionary of distances",booktitle="Dictionary of distances",url="https://doi.org/10.1007/978-3-642-00234-2",bibkey="doi:10.1007/978-3-642-00234-2") @Alias(value={"hellinger","bhattacharyya"}) public class HellingerDistanceFunction extends AbstractNumberVectorDistanceFunction implements SpatialPrimitiveDistanceFunction<NumberVector>, NormalizedPrimitiveSimilarityFunction<NumberVector>
We assume vectors represent normalized probability distributions. Then \[\text{Hellinger}(\vec{x},\vec{y}):= \sqrt{\tfrac12\sum\nolimits_i \left(\sqrt{x_i}-\sqrt{y_i}\right)^2 } \]
The corresponding kernel / similarity is \[ K_{\text{Hellinger}}(\vec{x},\vec{y}) := \sum\nolimits_i \sqrt{x_i y_i} \]
If we have normalized probability distributions, we have the nice property that \( K_{\text{Hellinger}}(\vec{x},\vec{x}) = \sum\nolimits_i x_i = 1\). and therefore \( K_{\text{Hellinger}}(\vec{x},\vec{y}) \in [0:1] \).
Furthermore, we have the following relationship between this variant of the distance and this kernel: \[ \text{Hellinger}^2(\vec{x},\vec{y}) = \tfrac12\sum\nolimits_i \left(\sqrt{x_i}-\sqrt{y_i}\right)^2 = \tfrac12\sum\nolimits_i x_i + y_i - 2 \sqrt{x_i y_i} \] \[ \text{Hellinger}^2(\vec{x},\vec{y}) = \tfrac12K_{\text{Hellinger}}(\vec{x},\vec{x}) + \tfrac12K_{\text{Hellinger}}(\vec{y},\vec{y}) - K_{\text{Hellinger}}(\vec{x},\vec{y}) = 1 - K_{\text{Hellinger}}(\vec{x},\vec{y}) \] which implies \(\text{Hellinger}(\vec{x},\vec{y}) \in [0;1]\), and is very similar to the Euclidean distance and the linear kernel.
From this, it follows trivially that Hellinger distance corresponds to the kernel transformation \(\phi:\vec{x}\mapsto(\tfrac12\sqrt{x_1},\ldots,\tfrac12\sqrt{x_d})\).
Deza and Deza unfortunately also give a second definition, as: \[\text{Hellinger-Deza}(\vec{x},\vec{y}):=\sqrt{2\sum\nolimits_i \left(\sqrt{\tfrac{x_i}{\bar{x}}}-\sqrt{\tfrac{y_i}{\bar{y}}}\right)^2}\] which has a built-in normalization, and a different scaling that is no longer bound to $[0;1]$. The 2 in this definition likely should be a \(\frac12\).
This distance is well suited for histograms, but it is then more efficient to once normalize the histograms, apply the square roots, and then use Euclidean distance (i.e., use the "kernel trick" in reverse, materializing the transformation \(\phi\) given above).
Reference:
E. Hellinger
Neue Begründung der Theorie quadratischer Formen von unendlichvielen
Veränderlichen
Journal für die reine und angewandte Mathematik
M.-M. Deza, E. Deza
Dictionary of distances
TODO: support acceleration for sparse vectors.
TODO: add a second variant, with built-in normalization.
| Modifier and Type | Class and Description |
|---|---|
static class |
HellingerDistanceFunction.Parameterizer
Parameterization class.
|
| Modifier and Type | Field and Description |
|---|---|
private static java.lang.String |
NON_NEGATIVE
Assertion error message.
|
static HellingerDistanceFunction |
STATIC
Static instance.
|
| Constructor and Description |
|---|
HellingerDistanceFunction()
Deprecated.
|
| Modifier and Type | Method and Description |
|---|---|
double |
distance(NumberVector fv1,
NumberVector fv2)
Computes the distance between two given DatabaseObjects according to this
distance function.
|
boolean |
equals(java.lang.Object obj) |
SimpleTypeInformation<? super NumberVector> |
getInputTypeRestriction()
Get the input data type of the function.
|
int |
hashCode() |
<T extends NumberVector> |
instantiate(Relation<T> database)
Instantiate with a database to get the actual distance query.
|
boolean |
isMetric()
Is this distance function metric (satisfy the triangle inequality)
|
boolean |
isSymmetric()
Is this function symmetric?
|
double |
minDist(SpatialComparable mbr1,
SpatialComparable mbr2)
Computes the distance between the two given MBRs according to this distance
function.
|
double |
similarity(NumberVector o1,
NumberVector o2)
Computes the similarity between two given DatabaseObjects according to this
similarity function.
|
dimensionality, dimensionality, dimensionality, dimensionality, dimensionality, dimensionality, dimensionality, dimensionalityclone, finalize, getClass, notify, notifyAll, toString, wait, wait, waitisSquaredpublic static final HellingerDistanceFunction STATIC
private static final java.lang.String NON_NEGATIVE
@Deprecated public HellingerDistanceFunction()
STATIC!public double distance(NumberVector fv1, NumberVector fv2)
PrimitiveDistanceFunctiondistance in interface NumberVectorDistanceFunction<NumberVector>distance in interface PrimitiveDistanceFunction<NumberVector>fv1 - first DatabaseObjectfv2 - second DatabaseObjectpublic double minDist(SpatialComparable mbr1, SpatialComparable mbr2)
SpatialPrimitiveDistanceFunctionminDist in interface SpatialPrimitiveDistanceFunction<NumberVector>mbr1 - the first MBR objectmbr2 - the second MBR objectpublic double similarity(NumberVector o1, NumberVector o2)
PrimitiveSimilarityFunctionsimilarity in interface PrimitiveSimilarityFunction<NumberVector>o1 - first DatabaseObjecto2 - second DatabaseObjectpublic boolean isSymmetric()
DistanceFunctionisSymmetric in interface DistanceFunction<NumberVector>isSymmetric in interface SimilarityFunction<NumberVector>true when symmetricpublic boolean isMetric()
DistanceFunctionisMetric in interface DistanceFunction<NumberVector>true when metric.public <T extends NumberVector> SpatialPrimitiveDistanceSimilarityQuery<T> instantiate(Relation<T> database)
DistanceFunctioninstantiate in interface DistanceFunction<NumberVector>instantiate in interface PrimitiveDistanceFunction<NumberVector>instantiate in interface SpatialPrimitiveDistanceFunction<NumberVector>instantiate in interface PrimitiveSimilarityFunction<NumberVector>instantiate in interface SimilarityFunction<NumberVector>database - The representation to usepublic SimpleTypeInformation<? super NumberVector> getInputTypeRestriction()
DistanceFunctiongetInputTypeRestriction in interface DistanceFunction<NumberVector>getInputTypeRestriction in interface PrimitiveDistanceFunction<NumberVector>getInputTypeRestriction in interface SimilarityFunction<NumberVector>getInputTypeRestriction in class AbstractNumberVectorDistanceFunctionpublic boolean equals(java.lang.Object obj)
equals in class java.lang.Objectpublic int hashCode()
hashCode in class java.lang.ObjectCopyright © 2019 ELKI Development Team. License information.