Field of the Invention
The present invention relates to methods and systems for detecting patterns in a data stream that match multi-pattern rules.
Background Art
The detection of particular patterns in a data stream is used in many computing environments. In virus detection, for example, the data stream received by a computer must be monitored for the presence of viruses. The virus checker must be able to recognize specific viruses as well as generic types of viruses. To do so, it has access to a data structure that contains a large number of different patterns, possibly more than a thousand. The patterns can be simple character sequences (strings) such as “password”, or they can be specified more flexibly, for example using regular expressions that include generic references to character classes and to the number of occurrences of certain characters and character sequences.
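The two kinds of pattern described above can be illustrated with a minimal sketch. It assumes Python's standard `re` module as a stand-in for a pattern matching engine; the pattern names and the second pattern are hypothetical. One entry is a plain string, and the other is a regular expression using a character class (`[0-9]`) and a bounded number of occurrences (`{3,5}`).

```python
import re

# Hypothetical pattern set: one literal string and one regular expression.
PATTERNS = {
    # Simple character sequence; re.escape treats it as a literal string.
    "literal_password": re.compile(re.escape(b"password")),
    # Character class [0-9] repeated 3 to 5 times, followed by the byte "x".
    "digits_then_x": re.compile(rb"[0-9]{3,5}x"),
}


def scan(data: bytes) -> list[tuple[str, int]]:
    """Return (pattern_name, start_offset) for every match, sorted by offset."""
    hits = []
    for name, regex in PATTERNS.items():
        for m in regex.finditer(data):
            hits.append((name, m.start()))
    return sorted(hits, key=lambda h: (h[1], h[0]))
```

For example, `scan(b"user password 1234x")` returns `[("literal_password", 5), ("digits_then_x", 14)]`, while `b"12x"` produces no match because only two digits precede the `x`.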
A data stream that is received by a computer and must be analyzed consists of a series of bytes. In common protocols such as TCP/IP (used for Internet communication), these bytes are received in the form of data packets. The data packets that form the data stream are scanned for the presence of the stored patterns as the stream is received. This scanning can be performed in software, or in some environments a dedicated ASIC or an FPGA can be used to carry out the pattern matching. If a pattern is detected, an output signal is generated and, depending on the application, an action such as deleting the pattern from the data packet is then executed.
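Because the stream arrives packet by packet, a stored pattern may be split across a packet boundary. The following is a minimal software sketch of scanning such a packetized stream, assuming literal byte patterns only (all non-empty) and using a simple repeated `bytes.find` search rather than the multi-pattern automata or ASIC/FPGA engines used in practice. The class name `StreamScanner` and its methods are hypothetical. The scanner carries over the last (longest pattern length - 1) bytes of the stream so that boundary-spanning matches are still found, and it reports absolute stream offsets.

```python
class StreamScanner:
    """Hypothetical streaming scanner for literal (non-empty) byte patterns."""

    def __init__(self, patterns: list[bytes]) -> None:
        self.patterns = patterns
        # Bytes needed to detect the longest pattern across a boundary.
        self.keep = max(len(p) for p in patterns) - 1
        self.tail = b""        # bytes carried over from earlier packets
        self.tail_offset = 0   # stream offset of the first byte of self.tail

    def feed(self, packet: bytes) -> list[tuple[int, bytes]]:
        """Scan one packet; return sorted (stream_offset, pattern) hits."""
        buf = self.tail + packet
        hits = []
        for p in self.patterns:
            start = buf.find(p)
            while start != -1:
                # A match lying entirely inside the carried-over tail was
                # already reported for an earlier packet, so skip it.
                if start + len(p) > len(self.tail):
                    hits.append((self.tail_offset + start, p))
                start = buf.find(p, start + 1)
        # Keep only as many trailing bytes as a boundary-spanning match needs.
        new_tail = buf[max(0, len(buf) - self.keep):]
        self.tail_offset += len(buf) - len(new_tail)
        self.tail = new_tail
        return sorted(hits)
```

Feeding `b"xxpass"` and then `b"word!!"` to `StreamScanner([b"password"])` reports nothing for the first packet and `(2, b"password")` for the second, even though neither packet contains the whole pattern.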
Existing and published approaches to pattern matching and recognition engines focus on the detection of individual patterns. Many applications of pattern matching engines, however, specify conditions at the level of multiple patterns. Such conditions are typically combined into a so-called rule. Intrusion detection is a good example: the rule set of Snort (an open-source intrusion detection system, IDS) specifies rules that employ conditions related to multiple patterns. For example, a rule can specify that a certain alarm should be raised if a pattern “pattern1” has been found, if this pattern is followed within the next 100 bytes of the input stream by a second pattern “pattern2”, and if the next 350 bytes do NOT contain a pattern “pattern3”. The pattern-level conditions used in multi-pattern rules typically relate to absolute and relative offsets within the input stream and relative to other patterns, to the order of detection, and to whether or not a pattern is detected (negation).
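The example rule above can be expressed as a check over the individual detections reported by a single-pattern engine. This is a minimal sketch, not Snort's rule engine; the `Match` record, the `rule_fires` function and its parameters are hypothetical. It assumes that "pattern2" must lie entirely within the 100 bytes following the end of "pattern1", that the 350-byte negation window starts at the end of "pattern2", and that the whole stream has already been scanned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One detection reported by a single-pattern matching engine."""
    name: str
    start: int  # stream offset of the first matched byte
    end: int    # stream offset one past the last matched byte


def rule_fires(matches: list[Match],
               first: str = "pattern1",
               second: str = "pattern2",
               absent: str = "pattern3",
               follow_window: int = 100,
               absent_window: int = 350) -> bool:
    """Evaluate the hypothetical three-pattern rule over completed detections."""
    by_name: dict[str, list[Match]] = {}
    for m in matches:
        by_name.setdefault(m.name, []).append(m)

    for m1 in by_name.get(first, []):
        for m2 in by_name.get(second, []):
            # Order and relative-offset condition: second follows first
            # and ends within follow_window bytes after the end of first.
            if not (m1.end <= m2.start and m2.end <= m1.end + follow_window):
                continue
            # Negation condition: no `absent` match lies entirely within
            # the absent_window bytes after the end of second.
            lo, hi = m2.end, m2.end + absent_window
            if not any(lo <= m3.start and m3.end <= hi
                       for m3 in by_name.get(absent, [])):
                return True
    return False
```

The nested loop over candidate pairs of "pattern1" and "pattern2" detections also illustrates why resolving such conditions in software can be costly: input that produces many candidate matches multiplies the number of combinations that must be checked.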
In conventional implementations, these multi-pattern conditions are typically resolved in software executed on either a host processor or an embedded processor. Given the increasing complexity of IDS rule sets, and of multi-pattern rules in particular, this software evaluation is likely to become a performance bottleneck, and it might even become a security bottleneck because it can be the target of Denial of Service (DoS) attacks. For example, a hacker may attempt to generate worst-case traffic that triggers the evaluation of the most complex multi-pattern rules in order to bring down system performance.