normalized_weighted_distance
min_ratio
len_dist >= max_dist
remove common prefix/suffix
from strings
fast lower bound on the distance using string lengths
in O(1) (Levenshtein distance >= len_dist)
one string empty?
len_dist = abs(len1 - len2)
compute max_dist =
(len1 + len2) * (1.0 - min_ratio)
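A minimal sketch of the length-based check, assuming `min_ratio` is given in the range 0.0 to 1.0 (the formula only makes sense on that scale). The function name `length_bound` is hypothetical and used only for illustration.

```python
def length_bound(s1: str, s2: str, min_ratio: float) -> tuple[int, float]:
    """Return (len_dist, max_dist) for two strings.

    Assumption: min_ratio is on a 0.0 .. 1.0 scale.
    """
    lensum = len(s1) + len(s2)
    # Largest distance that can still reach min_ratio.
    max_dist = lensum * (1.0 - min_ratio)
    # Each surplus character needs at least one insert or delete,
    # so the real distance can never be smaller than this.
    len_dist = abs(len(s1) - len(s2))
    return len_dist, max_dist


len_dist, max_dist = length_bound("kitten", "kit", 0.9)
# Here len_dist (3) is already >= max_dist (about 0.9),
# so the expensive calculation can be skipped.
```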
char_dist = count uncommon
chars between s1 and s2
lower bound on the distance using uncommon chars in O(N)
(Levenshtein distance >= char_dist; char_dist = Levenshtein distance of the two sorted strings)
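One way to count uncommon characters is a multiset difference. The sketch below uses `collections.Counter`; this is an assumption about how the count could be done, not necessarily how the original routine does it, and the name `char_dist` is borrowed from the label above.

```python
from collections import Counter


def char_dist(s1: str, s2: str) -> int:
    """Count characters that occur in one string but not the other.

    Counter subtraction keeps only positive counts, so the two
    differences together give the multiset symmetric difference.
    """
    c1, c2 = Counter(s1), Counter(s2)
    return sum((c1 - c2).values()) + sum((c2 - c1).values())


# "abc" vs "abd": 'c' and 'd' are uncommon.
print(char_dist("abc", "abd"))
# Same characters in a different order give 0, which shows why this
# is only a lower bound on the distance of the unsorted strings.
print(char_dist("abc", "cba"))
```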
char_dist >= max_dist
the Levenshtein distance is unchanged by removing a common prefix/suffix
removing it is O(N) while Levenshtein is O(N*M)
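The prefix/suffix removal could look roughly like this. The helper name `remove_common_affix` is hypothetical.

```python
def remove_common_affix(s1: str, s2: str) -> tuple[str, str]:
    """Strip the shared prefix and suffix from both strings in O(N)."""
    # Length of the common prefix.
    prefix = 0
    limit = min(len(s1), len(s2))
    while prefix < limit and s1[prefix] == s2[prefix]:
        prefix += 1
    s1, s2 = s1[prefix:], s2[prefix:]

    # Length of the common suffix of what remains.
    suffix = 0
    limit = min(len(s1), len(s2))
    while suffix < limit and s1[-1 - suffix] == s2[-1 - suffix]:
        suffix += 1
    if suffix:
        s1, s2 = s1[:-suffix], s2[:-suffix]
    return s1, s2


# Only the differing middle part is left for the O(N*M) step.
print(remove_common_affix("prefix_abc_suffix", "prefix_xyz_suffix"))
```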
lev_dist = levenshtein distance
lensum = len1 + len2
Levenshtein distance in O(N*M)
weighting: insert=1, delete=1, replace=2
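A straightforward dynamic-programming sketch of the distance with these weights. It is a textbook single-row formulation written for illustration, and the name `weighted_levenshtein` is an assumption.

```python
def weighted_levenshtein(s1: str, s2: str) -> int:
    """Levenshtein distance with insert=1, delete=1, replace=2, in O(N*M)."""
    # prev[j] = distance between the processed part of s1 and s2[:j].
    prev = list(range(len(s2) + 1))
    for i, ch1 in enumerate(s1, 1):
        cur = [i]  # deleting i chars of s1 to reach an empty string
        for j, ch2 in enumerate(s2, 1):
            if ch1 == ch2:
                cur.append(prev[j - 1])  # match, no cost
            else:
                cur.append(min(
                    prev[j] + 1,       # delete ch1
                    cur[j - 1] + 1,    # insert ch2
                    prev[j - 1] + 2,   # replace ch1 with ch2
                ))
        prev = cur
    return prev[-1]


print(weighted_levenshtein("kitten", "sitting"))
```

Because a replace costs the same as a delete plus an insert, a single substitution counts as 2 rather than 1.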
return 0
return 0
return 1.0 - len_dist / lensum
return 0
return 1.0 - lev_dist / lensum
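The normalization step can be sketched as below. With these weights the distance never exceeds `lensum` (delete everything, then insert everything), so the result stays between 0.0 and 1.0. The function name `normalized_ratio` and the handling of two empty strings are assumptions.

```python
def normalized_ratio(dist: int, len1: int, len2: int) -> float:
    """Turn a weighted distance into a similarity between 0.0 and 1.0."""
    lensum = len1 + len2
    if lensum == 0:
        # Assumption: two empty strings are treated as identical.
        return 1.0
    return 1.0 - dist / lensum


# "kitten" (6) vs "sitting" (7) with a weighted distance of 5.
print(normalized_ratio(5, 6, 7))
```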
remove common prefix/suffix
from strings
if this was already done, this only compares the first and last char
calculates a normalized form of the Levenshtein distance
using the following costs: insert=1, delete=1, replace=2
>0 and <=100
true
false
>100
false
false
false
true
==0