normalized_weighted_distance(s1, s2, min_ratio)

Calculates a normalized form of the Levenshtein distance using the following costs: insert=1, delete=1, replace=2.

Notation: len1 and len2 are the lengths of the two strings, and lensum = len1 + len2.

Flowchart:

1. Branch on min_ratio:
   - min_ratio > 100: return 0
   - min_ratio == 0: skip the quick checks and continue at A
   - min_ratio > 0 and <= 100: continue with step 2

2. Compute max_dist = (len1 + len2) * (1.0 - min_ratio).

3. len_dist = abs(len1 - len2)
   Fast distance calculation using only the string lengths, in O(1). This is a lower bound: the Levenshtein distance is always >= len_dist.
   - len_dist >= max_dist: return 0

4. Remove the common prefix/suffix from the strings.
   The Levenshtein distance is unchanged by a common prefix/suffix, and removing it is O(N) while Levenshtein is O(N*M).
   - one string empty: return 1.0 - len_dist / lensum

5. char_dist = count of uncommon chars between s1 and s2
   Distance calculation using the uncommon chars, in O(N). This is also a lower bound: the Levenshtein distance is always >= char_dist, and char_dist equals the Levenshtein distance of the two sorted strings.
   - char_dist >= max_dist: return 0
   - otherwise: continue at A

A. Remove the common prefix/suffix from the strings.
   When this was already done in step 4, this only compares the first and last char.

   lev_dist = Levenshtein distance, computed in O(N*M) with the weighting insert=1, delete=1, replace=2.

   return 1.0 - lev_dist / lensum
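The flow described above can be sketched in Python roughly as follows. This is an illustrative reconstruction and not the library's actual implementation. The helper names `strip_common_affix` and `weighted_levenshtein` are hypothetical. The sketch makes these assumptions:

- `min_ratio` is a fraction in [0, 1], to match the `1.0 - min_ratio` formula. The branch labels in the flowchart compare against 100, so the original may use a percentage scale.
- `lensum` is taken from the original string lengths.
- Two empty strings score 1.0.

```python
from collections import Counter


def strip_common_affix(s1: str, s2: str) -> tuple[str, str]:
    """Drop the shared prefix and suffix (hypothetical helper, O(N))."""
    start = 0
    limit = min(len(s1), len(s2))
    while start < limit and s1[start] == s2[start]:
        start += 1
    s1, s2 = s1[start:], s2[start:]
    end = 0
    limit = min(len(s1), len(s2))
    while end < limit and s1[-1 - end] == s2[-1 - end]:
        end += 1
    return s1[: len(s1) - end], s2[: len(s2) - end]


def weighted_levenshtein(s1: str, s2: str) -> int:
    """O(N*M) dynamic programme with insert=1, delete=1, replace=2."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        cur = [i]
        for j, c2 in enumerate(s2, 1):
            replace = prev[j - 1] + (0 if c1 == c2 else 2)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, replace))
        prev = cur
    return prev[-1]


def normalized_weighted_distance(s1: str, s2: str, min_ratio: float = 0.0) -> float:
    if min_ratio > 1.0:  # assumption: fraction scale instead of 0..100
        return 0.0
    lensum = len(s1) + len(s2)  # assumption: original lengths
    if lensum == 0:
        return 1.0  # assumption: two empty strings count as identical

    if min_ratio > 0.0:
        max_dist = lensum * (1.0 - min_ratio)

        # O(1) lower bound from the lengths alone.
        len_dist = abs(len(s1) - len(s2))
        if len_dist >= max_dist:
            return 0.0

        s1, s2 = strip_common_affix(s1, s2)
        if not s1 or not s2:
            # Only insertions or deletions remain, so the distance is len_dist.
            return 1.0 - len_dist / lensum

        # O(N) lower bound: characters that have no partner in the other string.
        c1, c2 = Counter(s1), Counter(s2)
        char_dist = sum(((c1 - c2) + (c2 - c1)).values())
        if char_dist >= max_dist:
            return 0.0

    # Connector "A": cheap if the affixes were already stripped above.
    s1, s2 = strip_common_affix(s1, s2)
    lev_dist = weighted_levenshtein(s1, s2)
    return 1.0 - lev_dist / lensum
```

For example, `normalized_weighted_distance("abc", "xyz", 0.5)` returns 0.0 at the `char_dist` check without running the O(N*M) step. `normalized_weighted_distance("abc", "abcd", 0.5)` returns `1.0 - 1 / 7` straight after prefix removal.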