Conventional database searching technology typically requires exact matches to find database records. When the input string and the records searched do not match exactly, a record search will not turn up the desired result. This is frustrating to users and detrimental to efficient business operations.
Record matching is an essential step in using databases for various business purposes. For example, errors or ambiguities in data may cause a customer name in a sales record not to match exactly the name of the same customer in a database table. A critical component of record matching is determining whether two strings are similar. String similarity is typically captured by a similarity function that, given a pair of strings, returns a number between 0 and 1, with a higher value indicating a greater degree of similarity and the value 1 corresponding to equality. Many similarity functions for strings are known, including, for example, Jaro distance, Jaro-Winkler distance, Hamming distance, Jaccard similarity, cosine similarity, Euclidean distance, Levenshtein distance, Smith-Waterman distance, and Hellinger distance.
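As an illustration of a similarity function of the kind described above, the following minimal sketch converts Levenshtein distance into a score between 0 and 1 by dividing by the length of the longer string. The normalization and the function names `levenshtein_distance` and `levenshtein_similarity` are assumptions chosen for this example; other normalizations are possible, and nothing here is drawn from a particular library.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in `b` so each row of the table is small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Hypothetical normalization (an assumption of this sketch):
    1.0 for equal strings, 0.0 when every position must change."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as equal
    return 1.0 - levenshtein_distance(a, b) / longest


if __name__ == "__main__":
    # A misspelled customer name still scores close to 1.
    print(levenshtein_similarity("Jon Smith", "John Smith"))
    print(levenshtein_similarity("Jon Smith", "Jon Smith"))
```

With this normalization, "Jon Smith" and "John Smith" differ by one inserted character out of ten, giving a score of about 0.9, while identical strings score 1. A record-matching step could then treat pairs scoring above some chosen threshold as candidate matches rather than requiring exact equality.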