1. Field of the Invention
This application generally relates to the processing of regular expressions.
2. Description of Related Technology
Data that is transmitted across a network, such as the Internet, may be further processed when it arrives at a server, having survived all the routing, processing, and filtering that may have occurred in the network. This further processing may occur all at once when the information arrives, as in the case of a web server. Alternatively, the processing may occur in stages, with one or more first stages removing layers of protocol and with one or more intermediate forms being stored on disk, for example. Later stages may also process the information when the original payload is retrieved, as with an e-mail server, for example. In such an information processing system, high-speed processing becomes increasingly important, both because of the need to complete the processing in a network and because of the volume of information that must be processed within a given time.
Regular expressions are well-known in the prior art and have been in use for some time for pattern matching and lexical analysis. An example of their use is disclosed by K. L. Thompson in U.S. Pat. No. 3,568,156, issued Mar. 2, 1971, which is hereby incorporated by reference in its entirety. U.S. patent application Ser. No. 10/851,482, filed on May 21, 2004, and entitled, “Regular Expression Acceleration Engine and Processing Model,” is also hereby incorporated by reference in its entirety.
Contemporary applications use thousands to tens of thousands of regular expressions to detect attacks. When compiled into one or more state machines, for example, those expressions consume a great deal of instruction memory. Input data streams, such as content that is delivered via one or more networks, may be scanned by a software state machine engine that traverses the state machines in order to determine whether the input data streams contain characters matching the original regular expressions. As those of skill in the art will recognize, as the quantity of regular expressions to be detected in input data streams increases (e.g., the quantity of viruses that a network server is to detect) and the size of the strings to be detected also increases (e.g., virus signatures), the suitability of software state machine engines decreases.
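By way of illustration only, the traversal of a compiled state machine by a software engine may be sketched as follows. The sketch assumes a single hypothetical expression, ab+c, that has been compiled by hand into a four-state machine; the names TRANSITIONS and scan, the state labels, and the choice of expression are assumptions made for this example and are not taken from any referenced application. The engine consumes one character of the input stream at a time, looks up the next state in a transition table, and reports each offset at which the expression has been matched.

```python
from typing import Dict, List, Tuple

# Hypothetical hand-compiled state machine for the illustrative
# expression ab+c, searched anywhere in the input (unanchored).
START, SEEN_A, SEEN_AB, ACCEPT = 0, 1, 2, 3

# (current state, input character) -> next state.
# Any pair that is not listed falls back to START.
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (START, "a"): SEEN_A,
    (SEEN_A, "a"): SEEN_A,
    (SEEN_A, "b"): SEEN_AB,
    (SEEN_AB, "a"): SEEN_A,
    (SEEN_AB, "b"): SEEN_AB,
    (SEEN_AB, "c"): ACCEPT,
}


def scan(stream: str) -> List[int]:
    """Return the offsets at which a match of ab+c ends."""
    state = START
    match_ends: List[int] = []
    for offset, ch in enumerate(stream):
        # One table lookup per input character.
        state = TRANSITIONS.get((state, ch), START)
        if state == ACCEPT:
            match_ends.append(offset)
            state = START  # resume scanning for further matches
    return match_ends


if __name__ == "__main__":
    print(scan("xxabbbcxxabc"))
```

In this sketch every state contributes entries to the transition table, and the engine performs one lookup per input character. A table compiled from thousands of expressions would be correspondingly larger, which is consistent with the memory consumption noted above.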