de.lmu.ifi.dbs.elki.distance.distancefunction.set

## Class JaccardSimilarityDistanceFunction

• All Implemented Interfaces:
DistanceFunction<FeatureVector<?>>, NumberVectorDistanceFunction<FeatureVector<?>>, PrimitiveDistanceFunction<FeatureVector<?>>, NormalizedPrimitiveSimilarityFunction<FeatureVector<?>>, NormalizedSimilarityFunction<FeatureVector<?>>, PrimitiveSimilarityFunction<FeatureVector<?>>, SimilarityFunction<FeatureVector<?>>

@Reference(authors="P. Jaccard",
title="Distribution de la florine alpine dans la Bassin de Dranses et dans quelques regiones voisines",
booktitle="Bulletin del la Soci\u00e9t\u00e9 Vaudoise des Sciences Naturelles",
url="http://data.rero.ch/01-R241574160",
bibkey="journals/misc/Jaccard1902")
@Alias(value="de.lmu.ifi.dbs.elki.distance.similarityfunction.JaccardPrimitiveSimilarityFunction")
public class JaccardSimilarityDistanceFunction
extends AbstractSetDistanceFunction<FeatureVector<?>>
implements NormalizedPrimitiveSimilarityFunction<FeatureVector<?>>, NumberVectorDistanceFunction<FeatureVector<?>>, PrimitiveDistanceFunction<FeatureVector<?>>
A flexible extension of Jaccard similarity to non-binary vectors.

The Jaccard coefficient is commonly defined as $$\frac{|A\cap B|}{|A\cup B|}$$.

We can extend this definition to non-binary vectors as follows: $$\tfrac{|\{i\mid a_i = b_i \wedge a_i \neq 0\}|}{|\{i\mid a_i \neq 0 \vee b_i \neq 0\}|}$$

For binary vectors, this is the same quantity: a position counts toward the numerator only when both vectors are 1 there, and toward the denominator when at least one of them is. However, this version is more useful for categorical data.
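As an illustration of the extension to non-binary vectors, here is a minimal sketch on plain `double[]` arrays. It assumes the reading that reduces to the set definition for binary input: the numerator counts positions where both values are equal and non-zero, and the denominator counts positions where at least one value is non-zero. The class `ExtendedJaccardSketch` and its `similarity` helper are hypothetical names for this example and are not part of ELKI. The value returned for two all-zero vectors is a choice made by this sketch only.

```java
public class ExtendedJaccardSketch {
    /**
     * Extended Jaccard similarity of two equal-length vectors.
     * Hypothetical helper for illustration; not ELKI's implementation.
     */
    public static double similarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        int agree = 0;  // positions where both values are equal and non-zero
        int active = 0; // positions where at least one value is non-zero
        for (int i = 0; i < a.length; i++) {
            if (a[i] != 0.0 || b[i] != 0.0) {
                active++;
                if (a[i] == b[i]) {
                    agree++;
                }
            }
        }
        // Assumption of this sketch: two all-zero vectors are treated as identical.
        return active == 0 ? 1.0 : agree / (double) active;
    }

    public static void main(String[] args) {
        double[] a = {1, 0, 3, 2, 0};
        double[] b = {1, 2, 3, 0, 0};
        // Two agreeing positions out of four active positions.
        System.out.println(similarity(a, b)); // prints 0.5
    }
}
```

With binary input such as `{1, 1, 0}` and `{1, 0, 0}`, the same helper yields 1/2, which matches the set-based coefficient for the sets {0, 1} and {0}.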

Reference:

P. Jaccard
Distribution de la flore alpine dans le Bassin des Dranses et dans quelques régions voisines
Bulletin de la Société Vaudoise des Sciences Naturelles

Since:
0.6.0
Author:
Erich Schubert

• ### Fields inherited from class de.lmu.ifi.dbs.elki.distance.distancefunction.set.AbstractSetDistanceFunction

DOUBLE_NULL, INTEGER_NULL, STRING_NULL
• ### Constructor Summary

Constructors
Constructor and Description
JaccardSimilarityDistanceFunction()
Constructor.
• ### Method Summary

All Methods
Modifier and Type Method and Description
double distance(FeatureVector<?> o1, FeatureVector<?> o2)
Computes the distance between two given DatabaseObjects according to this distance function.
double distance(NumberVector o1, NumberVector o2)
Computes the distance between two given vectors according to this distance function.
boolean equals(java.lang.Object obj)
SimpleTypeInformation<? super FeatureVector<?>> getInputTypeRestriction()
Get the input data type of the function.
int hashCode()
<T extends FeatureVector<?>>DistanceSimilarityQuery<T> instantiate(Relation<T> relation)
Instantiate with a representation to get the actual similarity query.
boolean isMetric()
Is this distance function metric (does it satisfy the triangle inequality)?
boolean isSymmetric()
Is this function symmetric?
double similarity(FeatureVector<?> o1, FeatureVector<?> o2)
Computes the similarity between two given DatabaseObjects according to this similarity function.
static double similarityNumberVector(NumberVector o1, NumberVector o2)
Compute Jaccard similarity for two number vectors.
• ### Methods inherited from class de.lmu.ifi.dbs.elki.distance.distancefunction.set.AbstractSetDistanceFunction

isNull
• ### Methods inherited from class java.lang.Object

clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
• ### Methods inherited from interface de.lmu.ifi.dbs.elki.distance.distancefunction.DistanceFunction

isSquared
• ### Constructor Detail

• #### JaccardSimilarityDistanceFunction

public JaccardSimilarityDistanceFunction()
Constructor. No parameters.
• ### Method Detail

• #### similarity

public double similarity(FeatureVector<?> o1,
FeatureVector<?> o2)
Description copied from interface: PrimitiveSimilarityFunction
Computes the similarity between two given DatabaseObjects according to this similarity function.
Specified by:
similarity in interface PrimitiveSimilarityFunction<FeatureVector<?>>
Parameters:
o1 - first DatabaseObject
o2 - second DatabaseObject
Returns:
the similarity between two given DatabaseObjects according to this similarity function
• #### similarityNumberVector

public static double similarityNumberVector(NumberVector o1,
NumberVector o2)
Compute Jaccard similarity for two number vectors.
Parameters:
o1 - First vector
o2 - Second vector
Returns:
Jaccard similarity
• #### distance

public double distance(FeatureVector<?> o1,
FeatureVector<?> o2)
Description copied from interface: PrimitiveDistanceFunction
Computes the distance between two given DatabaseObjects according to this distance function.
Specified by:
distance in interface PrimitiveDistanceFunction<FeatureVector<?>>
Parameters:
o1 - first DatabaseObject
o2 - second DatabaseObject
Returns:
the distance between two given DatabaseObjects according to this distance function
• #### distance

public double distance(NumberVector o1,
NumberVector o2)
Description copied from interface: NumberVectorDistanceFunction
Computes the distance between two given vectors according to this distance function.
Specified by:
distance in interface NumberVectorDistanceFunction<FeatureVector<?>>
Parameters:
o1 - first vector
o2 - second vector
Returns:
the distance between two given vectors according to this distance function
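This page does not spell out how the distance relates to the similarity. Because the class is a normalized similarity function, a natural reading is that the distance is the complement, 1 - similarity. The sketch below illustrates that relationship under this assumption. The class `JaccardDistanceSketch` and both of its methods are hypothetical names for this example, operating on `int[]` categorical codes in which 0 means "absent"; they are not ELKI API.

```java
public class JaccardDistanceSketch {
    /**
     * Extended Jaccard similarity on categorical codes, where 0 means "absent".
     * Hypothetical helper for illustration; not ELKI's implementation.
     */
    public static double similarity(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        int agree = 0;  // equal and non-zero
        int active = 0; // at least one non-zero
        for (int i = 0; i < a.length; i++) {
            if (a[i] != 0 || b[i] != 0) {
                active++;
                if (a[i] == b[i]) {
                    agree++;
                }
            }
        }
        // Assumption of this sketch: two all-zero vectors are treated as identical.
        return active == 0 ? 1.0 : agree / (double) active;
    }

    /** Assumed relationship: distance is the complement of the normalized similarity. */
    public static double distance(int[] a, int[] b) {
        return 1.0 - similarity(a, b);
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 0, 3};
        int[] b = {1, 0, 0, 3};
        // Similarity is 2/3 (two agreements, three active positions).
        System.out.println(distance(a, b));
        System.out.println(distance(a, a)); // identical vectors
    }
}
```

Under this assumption, identical vectors are at distance 0 and vectors that never agree on a non-zero value are at distance 1.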
• #### isSymmetric

public boolean isSymmetric()
Description copied from interface: SimilarityFunction
Is this function symmetric?
Specified by:
isSymmetric in interface DistanceFunction<FeatureVector<?>>
Specified by:
isSymmetric in interface SimilarityFunction<FeatureVector<?>>
Returns:
true when symmetric
• #### isMetric

public boolean isMetric()
Description copied from interface: DistanceFunction
Is this distance function metric (does it satisfy the triangle inequality)?
Specified by:
isMetric in interface DistanceFunction<FeatureVector<?>>
Returns:
true when metric.
• #### getInputTypeRestriction

public SimpleTypeInformation<? super FeatureVector<?>> getInputTypeRestriction()
Description copied from interface: SimilarityFunction
Get the input data type of the function.
Specified by:
getInputTypeRestriction in interface DistanceFunction<FeatureVector<?>>
Specified by:
getInputTypeRestriction in interface PrimitiveDistanceFunction<FeatureVector<?>>
Specified by:
getInputTypeRestriction in interface SimilarityFunction<FeatureVector<?>>
Returns:
Type restriction
• #### instantiate

public <T extends FeatureVector<?>> DistanceSimilarityQuery<T> instantiate(Relation<T> relation)
Description copied from interface: SimilarityFunction
Instantiate with a representation to get the actual similarity query.
Specified by:
instantiate in interface DistanceFunction<FeatureVector<?>>
Specified by:
instantiate in interface PrimitiveDistanceFunction<FeatureVector<?>>
Specified by:
instantiate in interface PrimitiveSimilarityFunction<FeatureVector<?>>
Specified by:
instantiate in interface SimilarityFunction<FeatureVector<?>>
Parameters:
relation - Representation to use
Returns:
Actual distance query.
• #### equals

public boolean equals(java.lang.Object obj)
Overrides:
equals in class java.lang.Object
• #### hashCode

public int hashCode()
Overrides:
hashCode in class java.lang.Object