Distance Functions
ELKI release 0.8.0 includes the following distance functions:
- Minkowski family:
- EuclideanDistance
- ManhattanDistance
- LPNormDistance
- LPIntegerNormDistance (optimized for integer p)
- MaximumDistance
- MinimumDistance
- SquaredEuclideanDistance
- Sparse optimized versions of Minkowski distances:
- Weighted versions of Minkowski distances:
- Angular distances:
- ArcCosineDistance – metric, expensive
- CosineDistance – not metric, cheapest
- SqrtCosineDistance – metric
- ArcCosineUnitlengthDistance (optimized for data with ||x||=1)
- CosineUnitlengthDistance (optimized for data with ||x||=1)
- SqrtCosineUnitlengthDistance (optimized for data with ||x||=1)
- BrayCurtisDistance
- CanberraDistance
- WeightedCanberraDistance
- ClarkDistance
- RandomStableDistance (pseudo-random distance)
- MahalanobisDistance (not via GUI/command line, needs a weight matrix)
- MatrixWeightedQuadraticDistance (not via GUI/command line, see color histogram distances)
- Adapters for similarity functions:
- Distances for probability distributions:
- ChiDistance
- ChiSquaredDistance
- FisherRaoDistance
- HellingerDistance
- JeffreyDivergenceDistance
- JensenShannonDivergenceDistance
- KullbackLeiblerDivergenceAsymmetricDistance
- KullbackLeiblerDivergenceReverseAsymmetricDistance
- SqrtJensenShannonDivergenceDistance
- TriangularDiscriminationDistance
- TriangularDistance
- Distance functions for 1-dimensional histograms:
- Color histogram distance functions:
- Correlation distance functions:
- Set-based distance functions (for binary data):
- String distance functions:
- Spatial distance functions (for geo data mining):
- External distance adapters (to access precomputed and externally computed distances):
- DiskCacheBasedDoubleDistance - binary cache
- DiskCacheBasedFloatDistance - binary cache
- FileBasedSparseDoubleDistance - ascii cache
- FileBasedSparseFloatDistance - ascii cache
- Subspace distance functions:
- Time series distance functions:
- Neighbor based distances:
- Distance functions for comparing clusters and clusterings:
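The "metric" and "not metric" remarks on the angular distances above can be checked with a small plain-Java sketch. It does not use ELKI classes: the class `AngularDistanceSketch` and its methods are hypothetical names, vectors are plain `double[]` arrays, and the square-root variant is assumed to be `sqrt(2 - 2 * similarity)`, which may differ from ELKI's exact scaling.

```java
/** Plain-Java sketch of three angular distances; illustrative only, not ELKI code. */
final class AngularDistanceSketch {
  /** Cosine similarity of two non-zero vectors of equal length, clamped to [-1, 1]. */
  static double cosineSimilarity(double[] x, double[] y) {
    double dot = 0, xx = 0, yy = 0;
    for (int i = 0; i < x.length; i++) {
      dot += x[i] * y[i];
      xx += x[i] * x[i];
      yy += y[i] * y[i];
    }
    double sim = dot / Math.sqrt(xx * yy);
    // Clamp to guard against rounding slightly outside the valid range.
    return Math.max(-1., Math.min(1., sim));
  }

  /** 1 - similarity: no extra sqrt or acos call, but not a metric. */
  static double cosineDistance(double[] x, double[] y) {
    return 1 - cosineSimilarity(x, y);
  }

  /** The angle between the vectors in radians; needs a comparatively costly acos call. */
  static double arcCosineDistance(double[] x, double[] y) {
    return Math.acos(cosineSimilarity(x, y));
  }

  /** Assumed scaling: sqrt(2 - 2 sim), the Euclidean distance of the normalized vectors. */
  static double sqrtCosineDistance(double[] x, double[] y) {
    return Math.sqrt(2 - 2 * cosineSimilarity(x, y));
  }

  public static void main(String[] args) {
    // Three directions at 0, 45 and 90 degrees.
    double[] a = {1, 0}, b = {1, 1}, c = {0, 1};
    // Triangle inequality: d(a,b) + d(b,c) >= d(a,c) must hold for a metric.
    System.out.println("cosine:      " + (cosineDistance(a, b) + cosineDistance(b, c))
        + " vs. " + cosineDistance(a, c));
    System.out.println("arc cosine:  " + (arcCosineDistance(a, b) + arcCosineDistance(b, c))
        + " vs. " + arcCosineDistance(a, c));
    System.out.println("sqrt cosine: " + (sqrtCosineDistance(a, b) + sqrtCosineDistance(b, c))
        + " vs. " + sqrtCosineDistance(a, c));
  }
}
```

For these three vectors, the cosine distance violates the triangle inequality (the detour via `b` is shorter than the direct distance from `a` to `c`), while the arc cosine and square-root variants satisfy it.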
Similarity Functions as Distances
Similarity functions usable through the adapter classes above include:
- FractionalSharedNearestNeighborSimilarity
- SharedNearestNeighborSimilarity
- Kulczynski1Similarity
- Kulczynski2Similarity
- Kernel functions
- Similarity functions for clusters and clusterings:
- Adapter for using distances as similarity functions:
- Distances also available as similarities:
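The idea of using a similarity function as a distance through an adapter can be sketched in plain Java. Everything here is a hypothetical stand-in rather than ELKI's API: the interfaces `Similarity` and `Distance`, the `linear` and `arccos` transformations, and the toy `simpleMatching` similarity are assumptions chosen for illustration, and which transformation is appropriate depends on the value range of the similarity.

```java
/** Plain-Java sketch of wrapping a similarity as a distance; illustrative only, not ELKI code. */
final class SimilarityAdapterSketch {
  /** Hypothetical similarity on double[] vectors: larger means more alike. */
  interface Similarity {
    double similarity(double[] x, double[] y);
  }

  /** Hypothetical distance on double[] vectors: smaller means more alike. */
  interface Distance {
    double distance(double[] x, double[] y);
  }

  /** Linear adaptation, 1 - s; assumes the similarity lies in [0, 1]. */
  static Distance linear(Similarity s) {
    return (x, y) -> 1 - s.similarity(x, y);
  }

  /** Arccos adaptation, acos(s); assumes the similarity lies in [-1, 1]. */
  static Distance arccos(Similarity s) {
    return (x, y) -> Math.acos(Math.max(-1., Math.min(1., s.similarity(x, y))));
  }

  /** Toy similarity: fraction of positions with equal values (equal lengths assumed). */
  static double simpleMatching(double[] x, double[] y) {
    int same = 0;
    for (int i = 0; i < x.length; i++) {
      if (x[i] == y[i]) {
        same++;
      }
    }
    return same / (double) x.length;
  }

  public static void main(String[] args) {
    Distance d = linear(SimilarityAdapterSketch::simpleMatching);
    double[] x = {1, 0, 1, 1}, y = {1, 1, 1, 0};
    System.out.println("distance(x, y) = " + d.distance(x, y));
    System.out.println("distance(x, x) = " + d.distance(x, x));
  }
}
```

The adapter leaves the similarity untouched and only transforms its output, so any algorithm that expects a distance can use it.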
Implementing custom distance functions
When implementing custom distance functions, ask yourself the following questions first:
- Is it defined on the data itself (like Euclidean distance) or on the instances (precomputed, external, or second-order distances)?
- What requirements does it have on the input data?
- What is the output data type?
Most likely, you will be implementing a NumberVectorDistance. For distances defined on coordinate vectors, you can save yourself some work by deriving from AbstractNumberVectorDistance.
The Tutorial on writing a custom distance function takes you through all the steps needed for implementing a custom distance function.
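As a rough illustration of the three questions above, the following plain-Java sketch implements a distance defined on the data itself, with numeric vectors of a fixed dimensionality as input and a `double` as output. It deliberately avoids ELKI types so that it runs on its own: the `VectorDistance` interface and the `WeightedManhattan` class are hypothetical stand-ins. In ELKI you would derive from AbstractNumberVectorDistance instead and take the exact method signatures from the tutorial.

```java
/** Plain-Java sketch of a custom vector distance; illustrative only, not ELKI code. */
final class CustomDistanceSketch {
  /** Hypothetical stand-in for a distance defined directly on numeric vectors. */
  interface VectorDistance {
    double distance(double[] x, double[] y);
  }

  /** Example custom distance: Manhattan distance with one weight per dimension. */
  static final class WeightedManhattan implements VectorDistance {
    private final double[] weights;

    WeightedManhattan(double[] weights) {
      this.weights = weights.clone(); // defensive copy
    }

    @Override
    public double distance(double[] x, double[] y) {
      // Input requirement: both vectors must match the weight dimensionality.
      if (x.length != weights.length || y.length != weights.length) {
        throw new IllegalArgumentException("Dimensionality mismatch.");
      }
      double sum = 0;
      for (int i = 0; i < weights.length; i++) {
        sum += weights[i] * Math.abs(x[i] - y[i]);
      }
      return sum; // Output type: a double distance value.
    }
  }

  public static void main(String[] args) {
    VectorDistance d = new WeightedManhattan(new double[] {1, 2});
    System.out.println(d.distance(new double[] {0, 0}, new double[] {3, 4}));
  }
}
```

Checking the input requirement explicitly inside `distance` keeps the sketch short. The tutorial describes how such requirements are expressed in ELKI itself.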