Typically, pattern matching involves comparing a large body of data, such as text or a stream of characters, with a known string or pattern in order to locate that string or pattern within the data. Pattern matching has many applications, ranging from word processing to genomics and protein sequencing, but it has not yet been widely used in communications applications because of the difficulty of implementing an engine that can match complex patterns at very high speeds.
A known pattern matching solution makes use of a “Shift-Or” method, which uses bitwise techniques. The Shift-Or method is described in “A New Approach To Text Searching”, by R. Baeza-Yates and G. H. Gonnet, Communications of the ACM 35(10), and is characterized by an intrinsic parallelism. This parallelism makes the method slow when executed on a general purpose processor (GPP), but it can be exploited when targeting a hardware implementation.
A variant of the Shift-Or method, known as the Shift-And method, can also be used for pattern matching implementations. A high level hardware implementation of an engine executing the Shift-And method is illustrated in FIG. 1. In this implementation, the pattern RAM is loaded with the string, encoded according to the preprocessing part of the method, before the engine is run. The preprocessing part of the method produces the table R, which is σ entries high and m bits wide, where σ is the size of the input alphabet and m is the length of the pattern.
The input stream register receives the characters of the input text, usually bytes, and uses each character to address the pattern RAM. The word read from the memory is then fed to the automaton, which consists of simple shift/AND combinational logic with a register. All the components are clocked with the same clock h.
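By way of illustration only, the Shift-And method described above can be modelled in software as follows. This is a minimal sketch and not the hardware of FIG. 1: the function names `build_table_r` and `shift_and_search` are hypothetical, the alphabet is assumed to be bytes (σ = 256), and each loop iteration stands in for one cycle of the clock h.

```python
def build_table_r(pattern: bytes, sigma: int = 256) -> list[int]:
    """Preprocessing: build table R, sigma entries high and m bits wide.

    Bit i of table[c] is set when pattern[i] == c. This models the
    contents loaded into the pattern RAM before the engine runs.
    """
    table = [0] * sigma
    for i, c in enumerate(pattern):
        table[c] |= 1 << i
    return table


def shift_and_search(text: bytes, pattern: bytes) -> list[int]:
    """Return the start offsets of every occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    table = build_table_r(pattern)
    accept = 1 << (m - 1)  # bit m-1 set means the whole pattern matched
    state = 0              # models the automaton register
    matches = []
    for pos, c in enumerate(text):
        # The input character addresses the table (pattern RAM); the word
        # read out is ANDed with the shifted state, with a 1 injected at
        # bit 0 so that a new match may begin at every position.
        state = ((state << 1) | 1) & table[c]
        if state & accept:
            matches.append(pos - m + 1)
    return matches


if __name__ == "__main__":
    print(shift_and_search(b"abracadabra", b"abra"))
```

In this sketch the state is an arbitrary-precision integer, so the pattern length m is not limited by a machine word; a hardware register of m bits plays the same role in the engine described above.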
The Shift-Or and Shift-And methods perform relatively poorly compared to other pattern matching methods. However, they are well suited to hardware implementations and can be heavily optimized.
In addition to the Shift-And method described above, other solutions involve pattern matching engines using a tree-based approach. In this solution, the pattern is preprocessed to create a huge tree, and every incoming bit of the input text makes the engine follow a branch of the tree. Although the solution is believed to be quite fast, its memory requirements are huge and it does not scale well. Another drawback of this solution is that the preprocessing time is significant, making the solution unsuitable for fast changing patterns.
Pattern matching is a basic building block for content-aware applications such as web (HTTP) load balancing, application-aware classification/billing, intrusion detection systems, etc. Accordingly, there is a need for a pattern matching engine that is scalable and capable of processing input streams at high speeds.