Approximate pattern or string matching is a significant problem that arises in many important applications, including, but not limited to, computational biology, databases, and computer communications. The task is to search a text for matches to a specified pattern or set of patterns while typically permitting up to a specified number of errors. For example, one may search for the word “queuing” while allowing two errors. Such a search could return the word “queueing”, which differs by one character insertion, and “cueing”, which differs by one character substitution and one character deletion. Allowing a specified number of errors lets the search tolerate typical spelling variations and mistakes and still find the desired pattern. Approximate pattern matching is not only a complex task but also requires a tremendous amount of computing resources.
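As a minimal sketch, assuming an “error” means one unit-cost character insertion, deletion, or substitution (the standard Levenshtein edit distance), the distance between two strings can be computed with the classic dynamic program below. The function name `edit_distance` is illustrative and does not come from the source.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # delete ca
                         cur[j - 1] + 1,            # insert cb
                         prev[j - 1] + (ca != cb))  # match or substitute
        prev = cur
    return prev[-1]


print(edit_distance("queuing", "queueing"))  # 1 (one insertion)
print(edit_distance("queuing", "cueing"))    # 2 (one substitution, one deletion)
```

Both example words therefore fall within the two-error budget used in the “queuing” search.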
Typically, a fast filtering step is followed by a verification step that performs the full approximate matching function on the candidates that survive the filter. An example of this prior art filtering technique is shown in FIG. 1 and is generally indicated by numeral 10. The typical approach is to slice a pattern “P”, indicated by numeral 12, into k+1 pattern pieces, which form a sequence of non-overlapping sub-patterns, and to search for exact matches between the text and the pattern pieces. Here, “k” is the number of allowable errors, i.e., the maximum edit distance $\mathrm{ed}(T_{i\ldots j}, P)$ permitted for a match; in this nonlimiting example, k is two (2), as indicated by numeral 14.
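The slicing and exact-search filter described above might be sketched in Python as follows. The helper names `split_pattern` and `filter_candidates`, and the choice to give any leftover characters to the leading pieces, are assumptions made for illustration. The filter only reports candidate start positions for the pattern; a separate verification step must still confirm each one.

```python
def split_pattern(pattern: str, k: int) -> list[tuple[int, str]]:
    """Slice pattern into k+1 contiguous, non-overlapping pieces.
    Returns (offset_in_pattern, piece) pairs. Assumed rule: leftover
    characters go to the first pieces, one extra character each."""
    n = k + 1
    if len(pattern) < n:
        raise ValueError("pattern must have at least k+1 characters")
    base, extra = divmod(len(pattern), n)
    pieces, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        pieces.append((start, pattern[start:start + size]))
        start += size
    return pieces


def filter_candidates(text: str, pattern: str, k: int) -> list[int]:
    """Filtering step: exact search for every piece. Each hit implies a
    candidate start of the whole pattern in text (hit position minus the
    piece's offset); a candidate may be negative near the text start."""
    candidates = set()
    for offset, piece in split_pattern(pattern, k):
        pos = text.find(piece)
        while pos != -1:
            candidates.add(pos - offset)
            pos = text.find(piece, pos + 1)
    return sorted(candidates)


print(split_pattern("abracadabra", 2))  # [(0, 'abra'), (4, 'cada'), (8, 'bra')]
```

With this splitting rule, “abracadabra” and k = 2 yield the pieces “abra”, “cada”, and “bra” of FIG. 1. Because short pieces can also occur where the full pattern does not, some candidates are spurious, which is why the verification step is still required.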
A data string $T_{i\ldots j}$ 16 is then analyzed for an occurrence of at least one substring of the data string 16 that matches at least one of the non-overlapping sub-patterns of pattern “P” 12. This approach relies on the following properties:
a. If a string $S = T_{a\ldots b}$ matches pattern P with at most k errors, and $P = p_1 \ldots p_j$ is a sequence of j non-overlapping sub-patterns, then some substring of S matches at least one of the $p_i$ with at most $\lfloor k/j \rfloor$ errors.
b. If there are character positions $i \le j$ such that $\mathrm{ed}(T_{i\ldots j}, P) \le k$, then $T_{j-m+1\ldots j}$ includes at least $m-k$ characters of P, where m is the length of the pattern in characters.
c. Therefore, if P is sliced into k+1 pieces (non-overlapping sub-patterns), setting j = k+1 in property (a) gives $\lfloor k/(k+1) \rfloor = 0$, so at least one of the pieces must match exactly.
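Property (c) can be spot-checked empirically. The sketch below, with hypothetical helper names and an arbitrary test alphabet, applies at most k random single-character edits to a pattern, so every resulting string lies within edit distance k of it, and confirms that at least one of the k+1 pieces still appears exactly. It is a sanity check under these assumptions, not a proof.

```python
import random


def pieces_of(pattern: str, k: int) -> list[str]:
    """k+1 contiguous, non-overlapping pieces (leftover chars go first)."""
    base, extra = divmod(len(pattern), k + 1)
    out, start = [], 0
    for i in range(k + 1):
        size = base + (1 if i < extra else 0)
        out.append(pattern[start:start + size])
        start += size
    return out


def mutate(s: str, k: int, rng: random.Random, alphabet: str = "abcdrxyz") -> str:
    """Apply up to k random insertions, substitutions, or deletions,
    so the result is within edit distance k of s."""
    for _ in range(rng.randint(0, k)):
        op = rng.choice("isd")
        if op == "i":
            pos = rng.randint(0, len(s))
            s = s[:pos] + rng.choice(alphabet) + s[pos:]
        elif s:
            pos = rng.randrange(len(s))
            if op == "s":
                s = s[:pos] + rng.choice(alphabet) + s[pos + 1:]
            else:
                s = s[:pos] + s[pos + 1:]
    return s


def check_pigeonhole(pattern: str, k: int, trials: int = 10000, seed: int = 0):
    """Return a counterexample string, or None if every trial kept at
    least one piece intact (which property (c) guarantees)."""
    rng = random.Random(seed)
    pieces = pieces_of(pattern, k)
    for _ in range(trials):
        s = mutate(pattern, k, rng)
        if not any(p in s for p in pieces):
            return s
    return None


print(check_pigeonhole("abracadabra", 2))  # None
```

The check works because each edit damages at most one piece (an insertion exactly at a piece boundary damages none), so k edits cannot break all k+1 pieces.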
Therefore, if pattern “P” 12 is sliced into k+1 non-overlapping sub-pattern pieces, where “k” 14 is the total number of allowable errors, then at least one of the pieces must match exactly. As shown in the example of FIG. 1, pattern “P” 12 is divided into k+1, or three (3), non-overlapping sub-patterns: “abra” indicated by numeral 18, “cada” indicated by numeral 20, and “bra” indicated by numeral 22. In this example, the piece “cada” 20 matches the data string $T_{i\ldots j}$ 16 exactly, even though the overall match between pattern “P” 12 and the data string 16 contains two errors, in which the letters “br” are replaced and the letter “b” is deleted.
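The verification step can be sketched as a semi-global edit-distance computation (Sellers' dynamic program, in which an occurrence may begin at any text position) over a window around an exact piece hit. The function name `verify`, the window of k extra characters on each side of the implied pattern position, and the sample text are assumptions for illustration only; the sample text is not the data string of FIG. 1.

```python
def verify(text: str, pattern: str, k: int, piece_offset: int, hit_pos: int):
    """Verify a filter hit: the piece of `pattern` starting at `piece_offset`
    was found exactly at `hit_pos` in `text`.
    Returns (is_match, best_distance) for the window around the hit."""
    m = len(pattern)
    start = hit_pos - piece_offset  # implied start of the pattern in the text
    # Insertions/deletions can shift the occurrence by at most k either way.
    window = text[max(0, start - k):min(len(text), start + m + k)]

    # prev[i] = min edit distance between pattern[:i] and some substring of
    # the window ending just before the current character.
    prev = list(range(m + 1))
    best = prev[m]
    for ch in window:
        cur = [0] * (m + 1)  # cur[0] = 0: an occurrence may start anywhere
        for i in range(1, m + 1):
            cur[i] = min(prev[i] + 1,                          # extra text char
                         cur[i - 1] + 1,                       # missing pattern char
                         prev[i - 1] + (pattern[i - 1] != ch))  # match/substitute
        best = min(best, cur[m])
        prev = cur
    return best <= k, best


text = "zzaxracadabazz"  # hypothetical text containing "axracadaba"
hit = text.find("cada")  # piece "cada" sits at offset 4 in "abracadabra"
print(verify(text, "abracadabra", 2, 4, hit))  # (True, 2)
print(verify(text, "abracadabra", 1, 4, hit))  # (False, 2)
```

Here the exact hit on “cada” is confirmed as an occurrence with two errors (one substitution and one deletion) when k = 2, and rejected when only one error is allowed.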
There is a significant need for a fast and cost-effective pattern matching mechanism that can process a substantial amount of input data against a considerable set of potentially matching patterns.