String similarity analysis, or “string matching,” generally relates to determining the degree of similarity between two strings of data. String matching may be used in a variety of applications, including data quality, search, clustering, and other forms of data analysis. In a simple case, two strings may differ by the perturbation of a single character. For instance, the strings “Mississippi” and “Mississippe” differ only in their final character. However, differences between related strings may be considerably more complex.
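By way of a non-limiting illustration of the single-character case, the following Python sketch compares two equal-length strings position by position (a Hamming-style comparison) and reports where they differ. The function names are hypothetical. This measure is defined only for strings of equal length, so it cannot capture more complex differences such as inserted or deleted characters.

```python
from typing import List


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def differing_positions(a: str, b: str) -> List[int]:
    """Return the indices at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def hamming_similarity(a: str, b: str) -> float:
    """Fraction of positions that agree (1.0 means identical)."""
    if not a and not b:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - hamming_distance(a, b) / len(a)
```

For example, `differing_positions("Mississippi", "Mississippe")` returns `[10]`, the index of the final character, and the similarity is 10/11.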
A number of different approaches to characterizing or quantifying the relative similarity between strings have been proposed. However, many such approaches require a human user to subjectively tune various parameters to tailor the approach to a given application or context. Prior approaches to string matching include deterministic matching and fuzzy matching. Deterministic matching evaluates a cascading sequence of match scenarios, and the sequence stops as soon as a match occurs. Fuzzy matching algorithms typically attempt to match two strings by minimizing a cost or distance function. Each of these approaches has disadvantages. Thus, there is a continuing need for improved approaches to string matching analysis.
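To make the contrast between these two prior approaches concrete, the following minimal Python sketch shows one possible form of each. The three match scenarios (exact, case-insensitive, and alphanumeric-only comparison) are hypothetical examples chosen for illustration, and Levenshtein edit distance stands in for the cost or distance function of a fuzzy matcher. Neither function is presented as a specific prior system.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical match scenarios, tried in order. Each normalizes both
# strings before testing them for equality.
SCENARIOS: List[Tuple[str, Callable[[str], str]]] = [
    ("exact", lambda s: s),
    ("case_insensitive", lambda s: s.casefold()),
    ("alphanumeric_only", lambda s: re.sub(r"[^0-9a-z]", "", s.casefold())),
]


def deterministic_match(a: str, b: str) -> Optional[str]:
    """Return the first scenario under which a and b match, or None."""
    for name, normalize in SCENARIOS:
        if normalize(a) == normalize(b):
            return name  # the cascade stops at the first match
    return None


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_match(query: str, candidates: List[str]) -> Tuple[str, int]:
    """Return the candidate minimizing edit distance to query (raises on empty list)."""
    return min(((c, levenshtein(query, c)) for c in candidates), key=lambda t: t[1])
```

For example, `deterministic_match("Mississippi", "MISSISSIPPI")` returns `"case_insensitive"`, while `deterministic_match("Mississippi", "Mississippe")` returns `None` because no scenario tolerates the substituted character; `fuzzy_match("Mississippe", ["Missouri", "Mississippi", "Minnesota"])` returns `("Mississippi", 1)`. In this sketch, the scenario list and its order are chosen by hand, which reflects the kind of subjective tuning noted above.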