The Levenshtein Distance
Jan 3, 2020
2 minute read

    It’s fascinating how every problem you can think of in computer science 9 times out of 10 was already tackled and solved by someone else. Today I was thinking about a way to suggest corrections for addresses inserted by our customers when I started to imagine an algorithm that used the mathematical notion of ‘distance’ (check ‘Metric (mathematics)’ on Wikipedia 1 ) to give them some suggestions based on a set of valid addresses present on our DB. Well, now I find out that it’s a well-known problem in computer science, that a wide range of algorithms2 has already been designed to solve it and that what I had in my mind was a variation of the Levenshtein distance!

    A code snippet in Python that implements those ideas

    from itertools import zip_longest
    
    
    def strings_distance(x, y):
        """Compute a distance between strings x and y."""
        return sum((0 if m == n else 1 for m, n in zip_longest(x, y)))
    
    
    def suggestions(s, dictionary, distance=strings_distance, tol=4):
        """Use the distance function to search similar words inside the dictionary.
    
        :param s: the string to get suggestions for.
        :param dictionary: a set/list of strings.
        :param distance: the function to use to compute the strings distance.
        :param int tol: max distance allowed between the input string and the
        suggestions.
        """
        return {word for word in dictionary if distance(s, word) <= tol}