Legacy computer-implemented data processing systems and methods generally required strict matching of equivalent data values. For example, an accounting application may include a database in which accounts receivable are logged. Each record may comprise a plurality of fields, such as the name of the enterprise, the date billed, the amount due, the date paid and the amount of payment. A given payment may be received from Widget Inc. in the amount of $500. The accounting system may contain a record for Acme Widget Co. with an amount due of $723. In early legacy data processing systems and methods, if the data did not exactly match, entry of the payment could not be automated. Accordingly, an individual would be needed to match the data associated with the received payment to the appropriate record in the accounting system, and thereafter update the record.
As methods of data processing progress, it is desired that the systems and methods be more tolerant of variations between equivalent data values. Accordingly, it is desired that data processing methods and systems be capable of determining a similarity between two strings (e.g., fuzzy string matching). One method of determining a similarity between two strings comprises the Levenshtein distance (LEV) heuristic. The Levenshtein heuristic produces a matrix of edit distances between prefixes of the two strings, wherein each edit is an insertion, deletion or substitution of a single character. The final entry of the matrix provides a measure of the similarity of the two strings. Another method of determining a similarity between two strings comprises the largest common substring (LCS) heuristic. Accordingly, recent data processing systems and methods, which utilize such a heuristic, can provide some ability to match strings. For example, the payment received from Widget Inc. may be matched to the accounts receivable record for Acme Widget Co. Therefore, some automation of data entry, processing and reporting can be achieved in conventional art data processing systems and methods.
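By way of illustration only, the Levenshtein heuristic described above may be sketched as follows. This is a minimal textbook formulation, not an implementation taken from any particular conventional system; the function names levenshtein_matrix and levenshtein are hypothetical and are chosen solely for this example.

```python
def levenshtein_matrix(a: str, b: str) -> list[list[int]]:
    """Return the (len(a)+1) x (len(b)+1) matrix of prefix edit distances.

    Entry d[i][j] is the minimum number of single-character insertions,
    deletions and substitutions needed to turn a[:i] into b[:j].
    """
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]

    # Turning a prefix into the empty string takes one deletion per character,
    # and building a prefix from the empty string takes one insertion each.
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution, or match when cost is 0
            )
    return d


def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein distance, i.e. the final entry of the matrix."""
    return levenshtein_matrix(a, b)[-1][-1]
```

A smaller distance indicates greater similarity. For example, levenshtein("Widget Inc.", "Acme Widget Co.") could be compared against the distances to other enterprise names in the accounting system to select the closest record.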
Given two strings of length M and N, respectively, the Levenshtein heuristic is calculated in M times N (M×N) calculations. Similarly, the largest common substring heuristic is calculated in M times N (M×N) calculations. Accordingly, legacy string matching heuristics incur significant processing costs, particularly when long strings are compared or when a given string is compared against many candidate records. Thus, string matching heuristics which provide increased string matching capabilities while minimizing computational costs are sought in the data processing arts.
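The M×N cost may be illustrated with a minimal sketch of the largest common substring heuristic, given here in its usual dynamic programming form. The function name longest_common_substring and the cell counter are hypothetical and are included only to make the number of calculations visible; they are not drawn from any particular conventional system.

```python
def longest_common_substring(a: str, b: str) -> tuple[str, int]:
    """Return the largest common substring of a and b, together with the
    number of table cells evaluated while finding it."""
    m, n = len(a), len(b)
    prev = [0] * (n + 1)  # previous row of the table
    best_len = 0          # length of the best common substring so far
    best_end = 0          # index in a just past the end of that substring
    cells = 0             # illustrative counter of calculations performed

    for i in range(1, m + 1):
        curr = [0] * (n + 1)
        for j in range(1, n + 1):
            cells += 1
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2] and b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr

    return a[best_end - best_len:best_end], cells


substring, cells = longest_common_substring("Widget Inc.", "Acme Widget Co.")
print(repr(substring), cells)  # prints 'Widget ' 165
```

Here the two strings have lengths of 11 and 15, so 11×15, or 165, cells are evaluated for a single comparison. The same cost is incurred again for every candidate record against which the payment is compared.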