Class LevenshteinDistance

  • All Implemented Interfaces:
    Distance<java.lang.String>, PrimitiveDistance<java.lang.String>

    @Description("Levenshtein distance.")
    @Reference(authors="V. I. Levenshtein",
               title="Binary codes capable of correcting deletions, insertions and reversals",
               booktitle="Soviet physics doklady 10",
               bibkey="journals/misc/Levenshtein66")
    public class LevenshteinDistance
    extends java.lang.Object
    implements PrimitiveDistance<java.lang.String>
    Classic Levenshtein distance on strings.

    Reference:

    V. I. Levenshtein
    Binary codes capable of correcting deletions, insertions and reversals
    Soviet physics doklady 10

    TODO: add case insensitive flag.

    TODO: add an API that can stop early at a maximum distance

    Since:
    0.6.0
    Author:
    Felix Stahlberg, Erich Schubert
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  LevenshteinDistance.Par
      Parameterization class.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      double distance​(java.lang.String o1, java.lang.String o2)
      Computes the distance between two given DatabaseObjects according to this distance function.
      boolean equals​(java.lang.Object obj)  
      SimpleTypeInformation<? super java.lang.String> getInputTypeRestriction()
      Get the input data type of the function.
      int hashCode()  
      boolean isMetric()
      Is this distance function metric (satisfy the triangle inequality)
      static int levenshteinDistance​(java.lang.String o1, java.lang.String o2)
      Levenshtein distance for two strings.
      static int levenshteinDistance​(java.lang.String o1, java.lang.String o2, int prefix, int postfix)
      Compute the Levenshtein distance, except for prefix and postfix.
      private static int min​(int a, int b, int c)
      Three-way integer minimum.
      private static int postfixLen​(java.lang.String o1, java.lang.String o2, int prefix)
      Compute the postfix length.
      private static int prefixLen​(java.lang.String o1, java.lang.String o2)
      Compute the length of the prefix.
      • Methods inherited from class java.lang.Object

        clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • STATIC_SENSITIVE

        public static final LevenshteinDistance STATIC_SENSITIVE
        Static instance, case sensitive.
    • Constructor Detail

      • LevenshteinDistance

        @Deprecated
        public LevenshteinDistance()
        Deprecated.
        Constructor. Use static instance instead.
    • Method Detail

      • distance

        public double distance​(java.lang.String o1,
                               java.lang.String o2)
        Description copied from interface: PrimitiveDistance
        Computes the distance between two given DatabaseObjects according to this distance function.
        Specified by:
        distance in interface PrimitiveDistance<java.lang.String>
        Parameters:
        o1 - first DatabaseObject
        o2 - second DatabaseObject
        Returns:
        the distance between two given DatabaseObjects according to this distance function
      • levenshteinDistance

        public static int levenshteinDistance​(java.lang.String o1,
                                              java.lang.String o2)
        Levenshtein distance for two strings.
        Parameters:
        o1 - First string
        o2 - Second string
        Returns:
        Levenshtein distance
      • prefixLen

        private static int prefixLen​(java.lang.String o1,
                                     java.lang.String o2)
        Compute the length of the prefix.
        Parameters:
        o1 - First string
        o2 - Second string
        Returns:
        Prefix length
      • postfixLen

        private static int postfixLen​(java.lang.String o1,
                                      java.lang.String o2,
                                      int prefix)
        Compute the postfix length.
        Parameters:
        o1 - First object
        o2 - Second object
        prefix - Known prefix length
        Returns:
        Postfix length
      • levenshteinDistance

        public static int levenshteinDistance​(java.lang.String o1,
                                              java.lang.String o2,
                                              int prefix,
                                              int postfix)
        Compute the Levenshtein distance, except for prefix and postfix.
        Parameters:
        o1 - First object
        o2 - Second object
        prefix - Prefix length
        postfix - Postfix length
        Returns:
        Levenshtein distance
      • min

        private static int min​(int a,
                               int b,
                               int c)
        Three-way integer minimum.
        Parameters:
        a - First value
        b - Second value
        c - Third value.
        Returns:
        Minimum
      • isMetric

        public boolean isMetric()
        Description copied from interface: Distance
        Is this distance function metric (satisfy the triangle inequality)
        Specified by:
        isMetric in interface Distance<java.lang.String>
        Returns:
        true when metric.
      • equals

        public boolean equals​(java.lang.Object obj)
        Overrides:
        equals in class java.lang.Object
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object