In many applications it is necessary to find patterns within a text. Exemplary applications that require searching for patterns in a text include text mining, DNA sequence searching and monitoring of data packets passing over a network. Many methods have been suggested for searching for patterns in a text, with the aim of making the search as fast as possible.
In pattern searching, a text is searched for a pattern formed of a sequence of characters. Each character of the text and of the pattern may take any value from the alphabet of the text.
U.S. Pat. Nos. 6,169,969 and 6,311,183 to Cohen, the disclosures of which documents are incorporated herein by reference, describe methods of finding patterns in which the search for the pattern is performed in a plurality of stages, beginning for example with a hash function. Other search methods are described in U.S. Pat. No. 6,269,189 to Chanod and U.S. Pat. No. 5,497,488 to Akizama, et al., the disclosures of which are incorporated herein by reference.
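In a method known as “shift-And”, a bit-mask table is generated for the searched pattern. The table includes, for each character of the alphabet, a bit-mask which has ‘1’s in the positions in which the character appears in the pattern. During the search process, a state bit-vector is updated for each character of the text by shifting it by one position and combining it with the bit-mask of that character using logical bit operations. An occurrence of the pattern is reported whenever the bit corresponding to the last position of the pattern is set in the state bit-vector.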
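The shift-And update can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes a dictionary as the bit-mask table and uses Python's unbounded integers as the state bit-vector, whereas a practical implementation would use a machine word and thereby limit the pattern length to the word size. The function names `build_masks` and `shift_and` are hypothetical.

```python
def build_masks(pattern):
    """Bit-mask table: bit i of masks[c] is 1 iff pattern[i] == c."""
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    return masks


def shift_and(text, pattern):
    """Return the start positions of every occurrence of pattern in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    masks = build_masks(pattern)
    final_bit = 1 << (m - 1)  # set when the whole pattern has been matched
    state = 0                 # bit i set: pattern[:i + 1] ends at the current text position
    hits = []
    for j, c in enumerate(text):
        # Extend every partial match by one character, start a new one at bit 0,
        # and keep only the partial matches consistent with the current character.
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & final_bit:
            hits.append(j - m + 1)
    return hits


print(shift_and("abracadabra", "abra"))  # [0, 7]
```

Each text character costs a constant number of word operations, which is the appeal of the approach as long as the pattern fits in one word.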
The article “Fast and Flexible String Matching by Combining Bit-parallelism and Suffix Automata” by G. Navarro and M. Raffinot, the disclosure of which is incorporated herein by reference, describes a “bit parallelism on suffix automata” method which is a variation of the “shift-And” method. This method advances over the text in jumps over segments of the text, which on average allows relatively fast operation. For some patterns, however, such as “ababababc” or “aaaaaaaab”, the method may need to retract repeatedly in order to systematically find all the appearances of the pattern in the text.
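The backward, jumping scan can be illustrated with a sketch of BNDM (Backward Nondeterministic DAWG Matching), the name under which the method of Navarro and Raffinot is commonly known. The code below is a minimal Python illustration written from the general BNDM formulation, not code taken from the cited article; the function name `bndm` is hypothetical, and Python integers stand in for machine words.

```python
def bndm(text, pattern):
    """Return the start positions of every occurrence of pattern in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m, n = len(pattern), len(text)
    full = (1 << m) - 1
    # Bits are mirrored relative to shift-And: bit m-1-i marks pattern[i].
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))
    hits = []
    pos = 0
    while pos <= n - m:
        j, last, state = m, m, full
        # Read the window right to left while the characters read so far
        # still form a factor (substring) of the pattern.
        while state:
            state &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if state & (1 << (m - 1)):  # the part read so far is a prefix of the pattern
                if j > 0:
                    last = j            # shift that aligns the longest prefix found
                else:
                    hits.append(pos)    # the whole window matched
            state = (state << 1) & full
        pos += last  # jump to the next window that could start an occurrence
    return hits


print(bndm("abracadabra", "abra"))  # [0, 7]
```

When a window contains no long factor of the pattern, the scan stops after a few characters and the window jumps by up to m positions. For a pattern such as “aaaaaaaab” over a run of ‘a’ characters, however, each window is shifted by only one position after eight characters have been read, so the same characters are read again and again.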
This method has been extended to search for a plurality of patterns by concatenating the patterns to be searched for into a single bit-vector. This extension, however, limits the number of patterns that can be searched for concurrently, since the concatenated patterns must fit within the length of the bit-vector on which the logical bit operations are performed.
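The concatenation idea, and the capacity limit it implies, can be illustrated with plain shift-And rather than with the suffix-automaton variant. In this sketch each pattern occupies its own segment of one bit-vector; the parameter `word_bits` is a hypothetical machine-word size used only to make the limit visible, and the function name `multi_shift_and` is likewise illustrative.

```python
def multi_shift_and(text, patterns, word_bits=64):
    """Search several patterns at once by concatenating them in one bit-vector.

    Returns (start_position, pattern_index) pairs. word_bits is a hypothetical
    machine-word size, used only to illustrate the capacity limit.
    """
    if not patterns or any(not p for p in patterns):
        raise ValueError("patterns must be non-empty strings")
    total = sum(len(p) for p in patterns)
    if total > word_bits:
        raise ValueError(f"{total} bits needed, only {word_bits} available")
    masks = {}
    starts = 0   # first bit of each pattern's segment
    ends = []    # (last bit of segment, pattern index, pattern length)
    offset = 0
    for k, p in enumerate(patterns):
        for i, c in enumerate(p):
            masks[c] = masks.get(c, 0) | (1 << (offset + i))
        starts |= 1 << offset
        ends.append((offset + len(p) - 1, k, len(p)))
        offset += len(p)
    state = 0
    hits = []
    for j, c in enumerate(text):
        # A bit shifted across a segment boundary lands on a start bit, which
        # is set anyway, so the segments do not interfere with each other.
        state = ((state << 1) | starts) & masks.get(c, 0)
        for bit, k, length in ends:
            if (state >> bit) & 1:
                hits.append((j - length + 1, k))
    return hits


print(multi_shift_and("she sells", ["she", "ell"]))  # [(0, 0), (5, 1)]
```

With a 64-bit word, for example, the total length of all patterns searched for together cannot exceed 64 characters unless several words are combined.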