With increasing attacks on computer network security, content-based intrusion detection and subsequent prevention continue to evolve. Network intrusion detection and prevention systems and appliances such as firewalls, switches, and the like are becoming common in various computing environments. Researchers and engineers continue to develop new techniques, methodologies, and systems to search for multiple patterns in software and hardware.
However, even with ongoing research and technological advances, certain old challenges remain and new challenges emerge. For example, increasing network speeds work against the requirement to operate at line speed. Moreover, as the number and variety of attacks continue to increase, so do the patterns to be searched and matched. Today, string matching remains computationally intensive and still requires storing a large number of state machines (finite automata) that are necessary to match patterns. Since fast memories are expensive and cheaper memories are slow, an undesirable contention between cost and performance seems unavoidable.
Several approaches have attempted to resolve this contention. One approach conserves memory by regenerating all state machines and starting over any time there is a change in the rule-set. This is not desirable because it requires laborious reprogramming of logic.
Another approach conserves memory by limiting the number of rules (patterns*characters). This approach achieves high performance for small sets of rules but lacks scalability. In other words, increasing the size of the rule-sets will dramatically decrease the performance of the systems if no compensatory measure is taken. As such, most researchers tend to use packet-level parallelism to achieve higher bandwidth, i.e., multiple copies of the automata work on different packets at a lower rate; see, e.g., Cho et al. “Specialized Hardware for Deep Network Packet Filtering” FPL 2002, LNCS 2438, pp. 452-461, 2002, and “Deep Packet Filter with Dedicated Logic and Read Only Memories,” 12th Annual IEEE Symposium on Field Programmable Custom Computing Machines 2004 (FCCM '04), pp.1-10.
Most modern systems adopt rules from an open source network intrusion detection system known as Snort™ (Snort™ is a trademark of Sourcefire, Inc. of Columbia, Md., USA). Many researchers have proposed various ways to cover all or most of the Snort™ rules, available online from <http://www.snort.org>.
Most proposed designs need multiple FPGAs to cover the existing Snort rule set; see, e.g., Sourdis et al. “Fast, Large-Scale String Match for a 10 Gbps FPGA-based Network Intrusion Detection System,” Proceedings of the 13th International Conference on Field Programmable Logic and Applications (FPL2003), Sep. 1-3, 2003, Lisbon, Portugal. Sourdis et al. disclose that, for pattern matching, three FPGAs of 120,000 logic cells are needed to include the entire Snort collection, which at that time contained fewer than 1,500 patterns with an average size of 12.6 characters. Four devices are needed to include the entire Snort rule set including header matching. These calculations do not include area optimizations.
Overall, most designs depend on either high on-chip bandwidth, which allows data to be shuttled to large matching units, or fairly high percentages of control routing and high logic complexity. Both of these characteristics work against system scalability.
Scalability is also affected by the underlying string matching algorithm. Most network intrusion detection and prevention systems employ or are based on the string searching algorithm by A. V. Aho and M. J. Corasick, “Efficient String Matching: An Aid to Bibliographic Search,” Communications of the ACM, 18(6):333-340, June 1975. The Aho-Corasick algorithm is a dictionary-based string searching algorithm that locates elements of a finite set of patterns (the dictionary) within an input text. Generally, the algorithm constructs a finite automaton first and then applies that automaton to the input text. When the pattern dictionary is known in advance (e.g., a computer virus database), the construction of the automaton can be performed once off-line and the compiled automaton stored for later use.
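The two phases described above (construct the automaton once, then apply it to the input text) can be illustrated with a minimal software sketch. This is not the implementation of any system cited here; the function names `build_automaton` and `search`, the dictionary-based goto table, and the sample patterns are assumptions chosen for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for a finite set of patterns."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # failure link for each state
    out = [set()]  # patterns recognized on entering each state

    # Phase 1: insert every pattern into a trie rooted at state 0.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)

    # Phase 2: breadth-first pass that computes the failure links.
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target.
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Scan text once; return (start_index, pattern) for every occurrence."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


# Built once off-line, then reused for every input text.
automaton = build_automaton(["he", "she", "his", "hers"])
print(sorted(search("ushers", automaton)))
```

The sketch stores only the transitions that exist in the trie and follows failure links at search time. A fully expanded table with one entry per state and input character trades this compactness for a single lookup per character, which is the storage cost discussed next.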
Aho-Corasick is known to have deterministic worst-case lookup times as well as a data structure suitable for wire (line) speed hardware-based string matching. However, the classical Aho-Corasick data structure takes more storage space than the SRAM typically available on a processing system.
For example, a state table with 16,384 active states will take up 256*16384*16 bits=8 Mbytes of memory. Depending on the type of packet, several such state tables could be required in memory. If we were to implement this design on a field programmable gate array (FPGA), the maximum block RAM (BRAM) available in a known FPGA device such as the Xilinx Virtex-II Pro would be on the order of 738 Kbytes. At a line-rate of 2 Gbps (processing throughput), a byte needs to be processed every 4 ns. If we were to implement this on 100 MHz logic, we would have a clock cycle of 10 ns. Therefore, 2.5 bytes would need to be processed every cycle to match the line-rate. In other words, we would need a minimum of three threads to handle the traffic at the specified line-rate from the processing perspective.
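The arithmetic in this example can be reproduced with a short calculation. The variable names are illustrative assumptions; the numeric inputs are the figures stated above.

```python
import math

# State table size: one 16-bit entry per (state, input byte) pair.
ALPHABET_SIZE = 256
ACTIVE_STATES = 16_384
BITS_PER_STATE = 16

table_bits = ALPHABET_SIZE * ACTIVE_STATES * BITS_PER_STATE
table_mbytes = table_bits / 8 / 2**20      # size of one table in Mbytes
bram_kbytes = 738                          # BRAM figure quoted in the text
tables_vs_bram = (table_bits / 8 / 1024) / bram_kbytes

# Processing rate: bytes that must be consumed per clock cycle.
LINE_RATE_BPS = 2_000_000_000              # 2 Gbps
CLOCK_HZ = 100_000_000                     # 100 MHz

ns_per_byte = 8 * 10**9 / LINE_RATE_BPS    # time budget for one byte
ns_per_cycle = 10**9 / CLOCK_HZ            # clock period
bytes_per_cycle = LINE_RATE_BPS / 8 / CLOCK_HZ
min_threads = math.ceil(bytes_per_cycle)   # one byte per thread per cycle

print(f"table size: {table_mbytes} Mbytes ({tables_vs_bram:.1f}x the BRAM)")
print(f"{ns_per_byte} ns per byte, {ns_per_cycle} ns per cycle")
print(f"{bytes_per_cycle} bytes per cycle -> {min_threads} threads")
```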
In addition, in the classical Aho-Corasick data structure, assuming that it takes 16 bits to store a state, 512 bytes (256 entries of 16 bits each) would need to be fetched from the local memory for every incoming character. In a naive implementation for the worst case scenario, this means that 2 Gbps of traffic will lead to 1024 Gbps of memory bandwidth, which is enormous. Under certain circumstances, it is possible to achieve 10 Gbps/channel sustained memory bandwidth with cost-effective DRAMs. We can therefore easily see that the difference between what is achievable and what is required is still substantial.
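The worst-case bandwidth figure follows directly from the row size. The sketch below restates that calculation; the function name `naive_memory_bandwidth_gbps` and its parameters are assumptions for illustration, and the model is the naive one described above, in which a full row of the state table is read for every input byte.

```python
def naive_memory_bandwidth_gbps(line_rate_gbps, alphabet_size=256, bits_per_state=16):
    """Memory bandwidth needed if a full table row is read per input byte."""
    row_bytes = alphabet_size * bits_per_state // 8
    # Each input byte triggers a read of row_bytes bytes, so memory
    # traffic is row_bytes times the line rate.
    return line_rate_gbps * row_bytes, row_bytes


required_gbps, row_bytes = naive_memory_bandwidth_gbps(2)
achievable_gbps = 10  # per-channel DRAM figure quoted in the text
gap = required_gbps / achievable_gbps

print(f"row size: {row_bytes} bytes")
print(f"required: {required_gbps} Gbps, gap: {gap}x one channel")
```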
In addition to the fundamental limitations imposed by the Aho-Corasick algorithm and its data structure, another important performance-influencing factor relates to the very nature of finite automata. A finite automaton (state machine) processes an input string and either accepts or rejects it. Finite automata are limited to processing one character per cycle. Both deterministic finite automata (DFA) and non-deterministic finite automata (NFA) are common techniques to perform hardware-based text search; see, e.g., Sidhu et al. “Fast Regular Expression Matching using FPGAs,” IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '01), April 2001, pp. 1-12. A DFA is derived or constructed from an NFA, which generally can be directly implemented with FPGA hardware. An NFA, however, is not suitable for serial, software implementation, hence the conversion to a DFA.
In U.S. Published Patent Application No. US2003/0065800, Wyschogrod et al. disclose a DFA-based approach that groups transitions into classes to conserve memory. According to Wyschogrod et al., a character class is defined as a set of characters that cause the same state transitions. A careful analysis of typical intrusion string data in, for example, the Snort rule-sets leads to the observation that very few character classes are generated because of the same state transitions, except those that are generated because of failure transitions leading to the idle or initial state.
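To make the definition of a character class concrete, the following sketch groups characters whose transition columns are identical across all states. It is not the method of Wyschogrod et al.; the function name `character_classes`, the table layout, and the toy DFA (which recognizes the string "ab" over a four-character alphabet) are assumptions for illustration only.

```python
def character_classes(table, alphabet):
    """Group characters that cause the same transition in every state.

    table[state][char] gives the next state for that input character.
    """
    classes = {}
    for ch in alphabet:
        # The column of next states for this character, one per state.
        column = tuple(row[ch] for row in table)
        classes.setdefault(column, []).append(ch)
    return list(classes.values())


# Toy DFA for the pattern "ab": state 0 = initial, 1 = saw "a", 2 = saw "ab".
alphabet = "abcd"
table = [
    {"a": 1, "b": 0, "c": 0, "d": 0},
    {"a": 1, "b": 2, "c": 0, "d": 0},
    {"a": 1, "b": 0, "c": 0, "d": 0},
]

print(character_classes(table, alphabet))
```

In this toy case the only class with more than one member is the set of characters that always lead back to the initial state, which mirrors the observation above about failure transitions.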
Today, challenges remain in improving efficiency, flexibility, cost-effectiveness, speed, and performance of network intrusion detection and prevention systems and appliances. With attacks ongoing and increasing daily, there is a strong need and desire in the network security art for a viable and effective mechanism, system, and apparatus that can substantially reduce the amount of storage in memory and memory bandwidth, thereby significantly increasing the much needed speed and performance and improving the highly desired efficiency, flexibility, and cost-effectiveness. The present invention addresses this need.