Class HellingerDistance

  • All Implemented Interfaces:
    Distance<NumberVector>, NumberVectorDistance<NumberVector>, PrimitiveDistance<NumberVector>, SpatialPrimitiveDistance<NumberVector>, NormalizedPrimitiveSimilarity<NumberVector>, NormalizedSimilarity<NumberVector>, PrimitiveSimilarity<NumberVector>, Similarity<NumberVector>

    @Reference(authors="E. Hellinger",title="Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen",booktitle="Journal für die reine und angewandte Mathematik",url="http://resolver.sub.uni-goettingen.de/purl?GDZPPN002166941",bibkey="journals/mathematik/Hellinger1909") @Reference(authors="M.-M. Deza, E. Deza",title="Dictionary of distances",booktitle="Dictionary of distances",url="https://doi.org/10.1007/978-3-642-00234-2",bibkey="doi:10.1007/978-3-642-00234-2")
    @Alias({"hellinger","bhattacharyya"})
    public class HellingerDistance
    extends AbstractNumberVectorDistance
    implements SpatialPrimitiveDistance<NumberVector>, NormalizedPrimitiveSimilarity<NumberVector>
    Hellinger metric / affinity / kernel on probability distributions, also known as Bhattacharyya coefficient, fidelity similarity, Matusita distance, or Hellinger-Kakutani metric.

    We assume vectors represent normalized probability distributions. Then \[\text{Hellinger}(\vec{x},\vec{y}):= \sqrt{\tfrac12\sum\nolimits_i \left(\sqrt{x_i}-\sqrt{y_i}\right)^2 } \]
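    For concreteness, a minimal Java sketch of this formula (an illustration on plain double[] arrays; the actual class operates on NumberVector):

        // Hellinger distance of two normalized probability vectors:
        // sqrt(0.5 * sum_i (sqrt(x_i) - sqrt(y_i))^2)
        public static double hellingerDistance(double[] x, double[] y) {
          double agg = 0.;
          for(int i = 0; i < x.length; i++) {
            double d = Math.sqrt(x[i]) - Math.sqrt(y[i]);
            agg += d * d;
          }
          return Math.sqrt(0.5 * agg);
        }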

    The corresponding kernel / similarity is \[ K_{\text{Hellinger}}(\vec{x},\vec{y}) := \sum\nolimits_i \sqrt{x_i y_i} \]
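    The similarity is just as simple to sketch (again a hypothetical helper, not the class API):

        // Hellinger kernel (Bhattacharyya coefficient): sum_i sqrt(x_i * y_i)
        public static double hellingerSimilarity(double[] x, double[] y) {
          double agg = 0.;
          for(int i = 0; i < x.length; i++) {
            agg += Math.sqrt(x[i] * y[i]);
          }
          return agg;
        }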

    If we have normalized probability distributions, we have the nice property that \( K_{\text{Hellinger}}(\vec{x},\vec{x}) = \sum\nolimits_i x_i = 1\), and therefore \( K_{\text{Hellinger}}(\vec{x},\vec{y}) \in [0;1] \).
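    For example, for \(\vec{x}=(0.5,0.3,0.2)\) we get \( K_{\text{Hellinger}}(\vec{x},\vec{x}) = \sqrt{0.5\cdot 0.5}+\sqrt{0.3\cdot 0.3}+\sqrt{0.2\cdot 0.2} = 0.5+0.3+0.2 = 1 \).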

    Furthermore, we have the following relationship between this variant of the distance and this kernel: \[ \text{Hellinger}^2(\vec{x},\vec{y}) = \tfrac12\sum\nolimits_i \left(\sqrt{x_i}-\sqrt{y_i}\right)^2 = \tfrac12\sum\nolimits_i \left(x_i + y_i - 2 \sqrt{x_i y_i}\right) \] \[ \text{Hellinger}^2(\vec{x},\vec{y}) = \tfrac12 K_{\text{Hellinger}}(\vec{x},\vec{x}) + \tfrac12 K_{\text{Hellinger}}(\vec{y},\vec{y}) - K_{\text{Hellinger}}(\vec{x},\vec{y}) = 1 - K_{\text{Hellinger}}(\vec{x},\vec{y}) \] which implies \(\text{Hellinger}(\vec{x},\vec{y}) \in [0;1]\); this mirrors the relationship between the Euclidean distance and the linear kernel.
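    Using the two sketch methods above, this identity can be checked numerically (the two printed values agree up to floating-point rounding):

        double[] x = { 0.5, 0.3, 0.2 };
        double[] y = { 0.1, 0.4, 0.5 };
        double d = hellingerDistance(x, y);
        double s = hellingerSimilarity(x, y);
        System.out.println(d * d);  // ~0.113755
        System.out.println(1. - s); // ~0.113755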

    From this, it follows that the Hellinger distance corresponds to the kernel transformation \(\phi:\vec{x}\mapsto(\sqrt{x_1/2},\ldots,\sqrt{x_d/2})\).
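    To verify, substituting \(\phi\) into the Euclidean distance recovers exactly the distance defined above: \[ \|\phi(\vec{x})-\phi(\vec{y})\|_2 = \sqrt{\sum\nolimits_i \left(\sqrt{\tfrac{x_i}{2}}-\sqrt{\tfrac{y_i}{2}}\right)^2} = \sqrt{\tfrac12\sum\nolimits_i \left(\sqrt{x_i}-\sqrt{y_i}\right)^2} = \text{Hellinger}(\vec{x},\vec{y}) \]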

    Deza and Deza unfortunately also give a second definition: \[\text{Hellinger-Deza}(\vec{x},\vec{y}):=\sqrt{2\sum\nolimits_i \left(\sqrt{\tfrac{x_i}{\bar{x}}}-\sqrt{\tfrac{y_i}{\bar{y}}}\right)^2}\] which has a built-in normalization, and a different scaling that is no longer bounded by \([0;1]\). The 2 in this definition likely should be a \(\tfrac12\).

    This distance is well suited for histograms, but it is then more efficient to normalize the histograms once, apply the square roots, and then use the Euclidean distance (i.e., use the "kernel trick" in reverse, materializing the transformation \(\phi\) given above).
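    A sketch of this preprocessing (a hypothetical helper, not part of this class): each histogram is transformed once, after which the ordinary Euclidean distance on the transformed vectors equals the Hellinger distance on the normalized inputs.

        // Materialize phi: normalize the histogram to sum 1, then map each
        // bin h_i to sqrt(h_i / (2 * sum)). The Euclidean distance between
        // two such transformed vectors equals the Hellinger distance.
        public static double[] materializePhi(double[] hist) {
          double sum = 0.;
          for(double v : hist) {
            sum += v;
          }
          double[] phi = new double[hist.length];
          for(int i = 0; i < hist.length; i++) {
            phi[i] = Math.sqrt(hist[i] / (2. * sum));
          }
          return phi;
        }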

    Reference:

    E. Hellinger
    Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen
    Journal für die reine und angewandte Mathematik

    M.-M. Deza, E. Deza
    Dictionary of distances

    TODO: support acceleration for sparse vectors.

    TODO: add a second variant, with built-in normalization.

    Since:
    0.7.0
    Author:
    Erich Schubert