1. Field of the Invention
The present invention concerns digital data processing software and/or hardware to quickly yet accurately determine whether a given computer-readable record is represented, by exact match or close match, in an existing collection of computer-readable records.
2. Description of the Related Art
“Fuzzy matching” refers to a well-known assortment of techniques to determine whether searched strings approximately match some given pattern string. These techniques are also known by other names such as approximate matching, inexact matching, fuzzy string searching, etc. Each implementation of fuzzy matching uses some similarity function, that is, an algorithm for determining whether the input and searched strings are similar to each other. One common similarity function is Levenshtein distance, which counts the minimum number of single-character insertions, deletions, and substitutions needed to transform one string into the other. Another is n-gram distance, which compares the contiguous n-character substrings that the two strings contain.
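By way of illustration only, the two similarity functions named above may be sketched in Python as follows. This is a minimal sketch and not the implementation of any product discussed herein. The function names (`levenshtein`, `ngram_distance`, `is_fuzzy_match`), the choice of bigrams (n = 2), the use of a Jaccard-style set comparison for n-gram distance, and the edit threshold of 2 are all assumptions made for the example; other formulations of n-gram distance exist.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # Keep the shorter string as the inner dimension to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def ngrams(s: str, n: int = 2) -> set:
    """Set of contiguous n-character substrings of s."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_distance(a: str, b: str, n: int = 2) -> float:
    """Assumed formulation: 1 minus the Jaccard overlap of the n-gram sets.
    Returns 0.0 for identical n-gram sets and 1.0 for disjoint sets."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 0.0  # both strings are too short to yield any n-gram
    return 1.0 - len(ga & gb) / len(ga | gb)


def is_fuzzy_match(a: str, b: str, max_edits: int = 2) -> bool:
    """Hypothetical decision rule: match if the case-insensitive
    edit distance does not exceed max_edits."""
    return levenshtein(a.lower(), b.lower()) <= max_edits
```

Under these assumptions, `levenshtein("kitten", "sitting")` evaluates to 3, and `is_fuzzy_match("Jon", "John")` reports a match because the two strings differ by a single insertion. In practice the threshold and the value of n would be tuned to the data being compared.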
The commercial market already contains various products that employ fuzzy matching. One example is the Hunter software of Experian, which is intended to detect fraud in the customer acquisition process. Another example is found in the products of Identity Systems, formerly known as Search Software America, which provides various software products aimed at searching, finding, matching, and grouping identity data, regardless of structure, format, location, duplication, omissions, or errors. Other examples are found in the products of IBM Entity Analytic Solutions (EAS), which aims to help organizations recognize the entities with which they are doing business. EAS is said to provide real-time recognition and resolution, in context with existing business applications.
Although these systems provide certain benefits, Fair Isaac Corporation is interested in improving the performance and efficiency of fuzzy matching programs, since various Fair Isaac products do (or could) beneficially employ fuzzy matching. Fair Isaac has identified some areas of possible focus and some potential shortcomings of existing technology. First, the computational complexity and cost associated with brute-force, field-by-field fuzzy matching against each individual record in a reference database (e.g., a fraud file) are prohibitive in practice. Second, existing approaches can give misleading results when strong matches occur on weak data (such as a strong or identical match on a common first name such as “John”). Third, better control over the manner of fuzzy matching is desired. Fourth, the existing approaches are not as modular and easily extensible as some might like.
In view of these concerns, the existing fuzzy matching products are not completely adequate for all intended applications.