Class LevenshteinDetailedDistance

    • Constructor Summary

      Constructors 
      Constructor Description
      LevenshteinDetailedDistance()
      This returns the default instance that uses a version of the algorithm that does not use a threshold parameter.
      LevenshteinDetailedDistance​(java.lang.Integer threshold)
      If the threshold is not null, distance calculations will be limited to a maximum length.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      LevenshteinResults apply​(java.lang.CharSequence left, java.lang.CharSequence right)
      Find the Levenshtein distance between two Strings.
      private static LevenshteinResults findDetailedResults​(java.lang.CharSequence left, java.lang.CharSequence right, int[][] matrix, boolean swapped)
      Finds count for each of the three [insert, delete, substitute] operations needed.
      static LevenshteinDetailedDistance getDefaultInstance()
      Gets the default instance.
      java.lang.Integer getThreshold()
      Gets the distance threshold.
      private static LevenshteinResults limitedCompare​(java.lang.CharSequence left, java.lang.CharSequence right, int threshold)
      Find the Levenshtein distance between two CharSequences if it's less than or equal to a given threshold.
      private static LevenshteinResults unlimitedCompare​(java.lang.CharSequence left, java.lang.CharSequence right)
      Find the Levenshtein distance between two Strings.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • threshold

        private final java.lang.Integer threshold
        Threshold.
    • Constructor Detail

      • LevenshteinDetailedDistance

        public LevenshteinDetailedDistance()

        This returns the default instance that uses a version of the algorithm that does not use a threshold parameter.

        See Also:
        getDefaultInstance()
      • LevenshteinDetailedDistance

        public LevenshteinDetailedDistance​(java.lang.Integer threshold)
        If the threshold is not null, distance calculations will be limited to a maximum length.

        If the threshold is null, the unlimited version of the algorithm will be used.

        Parameters:
        threshold - If this is null then distances calculations will not be limited. This may not be negative.
    • Method Detail

      • apply

        public LevenshteinResults apply​(java.lang.CharSequence left,
                                        java.lang.CharSequence right)

        Find the Levenshtein distance between two Strings.

        A higher score indicates a greater distance.

        The previous implementation of the Levenshtein distance algorithm was from http://www.merriampark.com/ld.htm

        Chas Emerick has written an implementation in Java, which avoids an OutOfMemoryError which can occur when my Java implementation is used with very large strings.
        This implementation of the Levenshtein distance algorithm is from http://www.merriampark.com/ldjava.htm

         distance.apply(null, *)             = IllegalArgumentException
         distance.apply(*, null)             = IllegalArgumentException
         distance.apply("","")               = 0
         distance.apply("","a")              = 1
         distance.apply("aaapppp", "")       = 7
         distance.apply("frog", "fog")       = 1
         distance.apply("fly", "ant")        = 3
         distance.apply("elephant", "hippo") = 7
         distance.apply("hippo", "elephant") = 7
         distance.apply("hippo", "zzzzzzzz") = 8
         distance.apply("hello", "hallo")    = 1
         
        Specified by:
        apply in interface EditDistance<LevenshteinResults>
        Specified by:
        apply in interface SimilarityScore<LevenshteinResults>
        Parameters:
        left - the first string, must not be null
        right - the second string, must not be null
        Returns:
        result distance, or -1
        Throws:
        java.lang.IllegalArgumentException - if either String input null
      • getDefaultInstance

        public static LevenshteinDetailedDistance getDefaultInstance()
        Gets the default instance.
        Returns:
        The default instace
      • getThreshold

        public java.lang.Integer getThreshold()
        Gets the distance threshold.
        Returns:
        The distance threshold
      • limitedCompare

        private static LevenshteinResults limitedCompare​(java.lang.CharSequence left,
                                                         java.lang.CharSequence right,
                                                         int threshold)
        Find the Levenshtein distance between two CharSequences if it's less than or equal to a given threshold.

        This implementation follows from Algorithms on Strings, Trees and Sequences by Dan Gusfield and Chas Emerick's implementation of the Levenshtein distance algorithm from http://www.merriampark.com/ld.htm

         limitedCompare(null, *, *)             = IllegalArgumentException
         limitedCompare(*, null, *)             = IllegalArgumentException
         limitedCompare(*, *, -1)               = IllegalArgumentException
         limitedCompare("","", 0)               = 0
         limitedCompare("aaapppp", "", 8)       = 7
         limitedCompare("aaapppp", "", 7)       = 7
         limitedCompare("aaapppp", "", 6))      = -1
         limitedCompare("elephant", "hippo", 7) = 7
         limitedCompare("elephant", "hippo", 6) = -1
         limitedCompare("hippo", "elephant", 7) = 7
         limitedCompare("hippo", "elephant", 6) = -1
         
        Parameters:
        left - the first CharSequence, must not be null
        right - the second CharSequence, must not be null
        threshold - the target threshold, must not be negative
        Returns:
        result distance, or -1
      • unlimitedCompare

        private static LevenshteinResults unlimitedCompare​(java.lang.CharSequence left,
                                                           java.lang.CharSequence right)

        Find the Levenshtein distance between two Strings.

        A higher score indicates a greater distance.

        The previous implementation of the Levenshtein distance algorithm was from http://www.merriampark.com/ld.htm

        Chas Emerick has written an implementation in Java, which avoids an OutOfMemoryError which can occur when my Java implementation is used with very large strings.
        This implementation of the Levenshtein distance algorithm is from http://www.merriampark.com/ldjava.htm

         unlimitedCompare(null, *)             = IllegalArgumentException
         unlimitedCompare(*, null)             = IllegalArgumentException
         unlimitedCompare("","")               = 0
         unlimitedCompare("","a")              = 1
         unlimitedCompare("aaapppp", "")       = 7
         unlimitedCompare("frog", "fog")       = 1
         unlimitedCompare("fly", "ant")        = 3
         unlimitedCompare("elephant", "hippo") = 7
         unlimitedCompare("hippo", "elephant") = 7
         unlimitedCompare("hippo", "zzzzzzzz") = 8
         unlimitedCompare("hello", "hallo")    = 1
         
        Parameters:
        left - the first CharSequence, must not be null
        right - the second CharSequence, must not be null
        Returns:
        result distance, or -1
        Throws:
        java.lang.IllegalArgumentException - if either CharSequence input is null
      • findDetailedResults

        private static LevenshteinResults findDetailedResults​(java.lang.CharSequence left,
                                                              java.lang.CharSequence right,
                                                              int[][] matrix,
                                                              boolean swapped)
        Finds count for each of the three [insert, delete, substitute] operations needed. This is based on the matrix formed based on the two character sequence.
        Parameters:
        left - character sequence which need to be converted from
        right - character sequence which need to be converted to
        matrix - two dimensional array containing
        swapped - tells whether the value for left character sequence and right character sequence were swapped to save memory
        Returns:
        result object containing the count of insert, delete and substitute and total count needed