de.lmu.ifi.dbs.elki.distance.distancefunction.probabilistic

## Class HellingerDistanceFunction

• All Implemented Interfaces:
DistanceFunction<NumberVector>, NumberVectorDistanceFunction<NumberVector>, PrimitiveDistanceFunction<NumberVector>, SpatialPrimitiveDistanceFunction<NumberVector>, NormalizedPrimitiveSimilarityFunction<NumberVector>, NormalizedSimilarityFunction<NumberVector>, PrimitiveSimilarityFunction<NumberVector>, SimilarityFunction<NumberVector>

@Reference(authors="E. Hellinger",title="Neue Begr\u00fcndung der Theorie quadratischer Formen von unendlichvielen Ver\u00e4nderlichen",booktitle="Journal f\u00fcr die reine und angewandte Mathematik",url="http://resolver.sub.uni-goettingen.de/purl?GDZPPN002166941",bibkey="journals/mathematik/Hellinger1909") @Reference(authors="M.-M. Deza, E. Deza",title="Dictionary of distances",booktitle="Dictionary of distances",url="https://doi.org/10.1007/978-3-642-00234-2",bibkey="doi:10.1007/978-3-642-00234-2")
@Alias(value={"hellinger","bhattacharyya"})
public class HellingerDistanceFunction
extends AbstractNumberVectorDistanceFunction
implements SpatialPrimitiveDistanceFunction<NumberVector>, NormalizedPrimitiveSimilarityFunction<NumberVector>
Hellinger metric / affinity / kernel, Bhattacharyya coefficient, fidelity similarity, Matusita distance, Hellinger-Kakutani metric on a probability distribution.

We assume vectors represent normalized probability distributions. Then $\text{Hellinger}(\vec{x},\vec{y}):= \sqrt{\tfrac12\sum\nolimits_i \left(\sqrt{x_i}-\sqrt{y_i}\right)^2 }$

The corresponding kernel / similarity is $K_{\text{Hellinger}}(\vec{x},\vec{y}) := \sum\nolimits_i \sqrt{x_i y_i}$

If we have normalized probability distributions, we have the nice property that $$K_{\text{Hellinger}}(\vec{x},\vec{x}) = \sum\nolimits_i x_i = 1$$. and therefore $$K_{\text{Hellinger}}(\vec{x},\vec{y}) \in [0:1]$$.

Furthermore, we have the following relationship between this variant of the distance and this kernel: $\text{Hellinger}^2(\vec{x},\vec{y}) = \tfrac12\sum\nolimits_i \left(\sqrt{x_i}-\sqrt{y_i}\right)^2 = \tfrac12\sum\nolimits_i x_i + y_i - 2 \sqrt{x_i y_i}$ $\text{Hellinger}^2(\vec{x},\vec{y}) = \tfrac12K_{\text{Hellinger}}(\vec{x},\vec{x}) + \tfrac12K_{\text{Hellinger}}(\vec{y},\vec{y}) - K_{\text{Hellinger}}(\vec{x},\vec{y}) = 1 - K_{\text{Hellinger}}(\vec{x},\vec{y})$ which implies $$\text{Hellinger}(\vec{x},\vec{y}) \in [0;1]$$, and is very similar to the Euclidean distance and the linear kernel.

From this, it follows trivially that Hellinger distance corresponds to the kernel transformation $$\phi:\vec{x}\mapsto(\tfrac12\sqrt{x_1},\ldots,\tfrac12\sqrt{x_d})$$.

Deza and Deza unfortunately also give a second definition, as: $\text{Hellinger-Deza}(\vec{x},\vec{y}):=\sqrt{2\sum\nolimits_i \left(\sqrt{\tfrac{x_i}{\bar{x}}}-\sqrt{\tfrac{y_i}{\bar{y}}}\right)^2}$ which has a built-in normalization, and a different scaling that is no longer bound to $[0;1]$. The 2 in this definition likely should be a $$\frac12$$.

This distance is well suited for histograms, but it is then more efficient to once normalize the histograms, apply the square roots, and then use Euclidean distance (i.e., use the "kernel trick" in reverse, materializing the transformation $$\phi$$ given above).

Reference:

E. Hellinger
Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen
Journal für die reine und angewandte Mathematik

M.-M. Deza, E. Deza
Dictionary of distances

TODO: support acceleration for sparse vectors.

TODO: add a second variant, with built-in normalization.

Since:
0.7.0
Author:
Erich Schubert
• ### Nested Class Summary

Nested Classes
Modifier and Type Class and Description
static class  HellingerDistanceFunction.Parameterizer
Parameterization class.
• ### Field Summary

Fields
Modifier and Type Field and Description
private static java.lang.String NON_NEGATIVE
Assertion error message.
static HellingerDistanceFunction STATIC
Static instance.
• ### Constructor Summary

Constructors
Constructor and Description
HellingerDistanceFunction()
Deprecated.
• ### Method Summary

All Methods
Modifier and Type Method and Description
double distance(NumberVector fv1, NumberVector fv2)
Computes the distance between two given DatabaseObjects according to this distance function.
boolean equals(java.lang.Object obj)
SimpleTypeInformation<? super NumberVector> getInputTypeRestriction()
Get the input data type of the function.
int hashCode()
<T extends NumberVector>SpatialPrimitiveDistanceSimilarityQuery<T> instantiate(Relation<T> database)
Instantiate with a database to get the actual distance query.
boolean isMetric()
Is this distance function metric (satisfy the triangle inequality)
boolean isSymmetric()
Is this function symmetric?
double minDist(SpatialComparable mbr1, SpatialComparable mbr2)
Computes the distance between the two given MBRs according to this distance function.
double similarity(NumberVector o1, NumberVector o2)
Computes the similarity between two given DatabaseObjects according to this similarity function.
• ### Methods inherited from class de.lmu.ifi.dbs.elki.distance.distancefunction.AbstractNumberVectorDistanceFunction

dimensionality, dimensionality, dimensionality, dimensionality, dimensionality, dimensionality, dimensionality, dimensionality
• ### Methods inherited from class java.lang.Object

clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
• ### Methods inherited from interface de.lmu.ifi.dbs.elki.distance.distancefunction.DistanceFunction

isSquared
• ### Field Detail

• #### STATIC

public static final HellingerDistanceFunction STATIC
Static instance.
• #### NON_NEGATIVE

private static final java.lang.String NON_NEGATIVE
Assertion error message.
Constant Field Values
• ### Constructor Detail

• #### HellingerDistanceFunction

@Deprecated
public HellingerDistanceFunction()
Deprecated.
Hellinger kernel. Use static instance STATIC!
• ### Method Detail

• #### distance

public double distance(NumberVector fv1,
NumberVector fv2)
Description copied from interface: PrimitiveDistanceFunction
Computes the distance between two given DatabaseObjects according to this distance function.
Specified by:
distance in interface NumberVectorDistanceFunction<NumberVector>
Specified by:
distance in interface PrimitiveDistanceFunction<NumberVector>
Parameters:
fv1 - first DatabaseObject
fv2 - second DatabaseObject
Returns:
the distance between two given DatabaseObjects according to this distance function
• #### minDist

public double minDist(SpatialComparable mbr1,
SpatialComparable mbr2)
Description copied from interface: SpatialPrimitiveDistanceFunction
Computes the distance between the two given MBRs according to this distance function.
Specified by:
minDist in interface SpatialPrimitiveDistanceFunction<NumberVector>
Parameters:
mbr1 - the first MBR object
mbr2 - the second MBR object
Returns:
the distance between the two given MBRs according to this distance function
• #### similarity

public double similarity(NumberVector o1,
NumberVector o2)
Description copied from interface: PrimitiveSimilarityFunction
Computes the similarity between two given DatabaseObjects according to this similarity function.
Specified by:
similarity in interface PrimitiveSimilarityFunction<NumberVector>
Parameters:
o1 - first DatabaseObject
o2 - second DatabaseObject
Returns:
the similarity between two given DatabaseObjects according to this similarity function
• #### isSymmetric

public boolean isSymmetric()
Description copied from interface: DistanceFunction
Is this function symmetric?
Specified by:
isSymmetric in interface DistanceFunction<NumberVector>
Specified by:
isSymmetric in interface SimilarityFunction<NumberVector>
Returns:
true when symmetric
• #### isMetric

public boolean isMetric()
Description copied from interface: DistanceFunction
Is this distance function metric (satisfy the triangle inequality)
Specified by:
isMetric in interface DistanceFunction<NumberVector>
Returns:
true when metric.
• #### instantiate

public <T extends NumberVector> SpatialPrimitiveDistanceSimilarityQuery<T> instantiate(Relation<T> database)
Description copied from interface: DistanceFunction
Instantiate with a database to get the actual distance query.
Specified by:
instantiate in interface DistanceFunction<NumberVector>
Specified by:
instantiate in interface PrimitiveDistanceFunction<NumberVector>
Specified by:
instantiate in interface SpatialPrimitiveDistanceFunction<NumberVector>
Specified by:
instantiate in interface PrimitiveSimilarityFunction<NumberVector>
Specified by:
instantiate in interface SimilarityFunction<NumberVector>
Parameters:
database - The representation to use
Returns:
Actual distance query.
• #### getInputTypeRestriction

public SimpleTypeInformation<? super NumberVector> getInputTypeRestriction()
Description copied from interface: DistanceFunction
Get the input data type of the function.
Specified by:
getInputTypeRestriction in interface DistanceFunction<NumberVector>
Specified by:
getInputTypeRestriction in interface PrimitiveDistanceFunction<NumberVector>
Specified by:
getInputTypeRestriction in interface SimilarityFunction<NumberVector>
Overrides:
getInputTypeRestriction in class AbstractNumberVectorDistanceFunction
Returns:
Type restriction
• #### equals

public boolean equals(java.lang.Object obj)
Overrides:
equals in class java.lang.Object
• #### hashCode

public int hashCode()
Overrides:
hashCode in class java.lang.Object