The Levenshtein Distance
It’s fascinating how every problem you can think of in computer science 9 times out of 10 was already tackled and solved by someone else. Today I was thinking about a way to suggest corrections for addresses inserted by our customers when I started to imagine an algorithm that used the mathematical notion of ‘distance’ (check ‘Metric (mathematics)’ on Wikipedia 1 ) to give them some suggestions based on a set of valid addresses present on our DB. Well, now I find out that it’s a well-known problem in computer science, that a wide range of algorithms2 has already been designed to solve it and that what I had in my mind was a variation of the Levenshtein distance!
A code snippet in Python that implements those ideas
from itertools import zip_longest
def strings_distance(x, y):
"""Compute a distance between strings x and y."""
return sum((0 if m == n else 1 for m, n in zip_longest(x, y)))
def suggestions(s, dictionary, distance=strings_distance, tol=4):
"""Use the distance function to search similar words inside the dictionary.
:param s: the string to get suggestions for.
:param dictionary: a set/list of strings.
:param distance: the function to use to compute the strings distance.
:param int tol: max distance allowed between the input string and the
suggestions.
"""
return {word for word in dictionary if distance(s, word) <= tol}