§1.1 Field of the Invention
The present invention concerns matching an arbitrary-length bit string with one of a number of known arbitrary-length bit strings. Embodiments consistent with the present invention may be used for network intrusion detection and prevention. In particular, some embodiments consistent with the present invention concern generating a data structure which provides perfect hashing functionality. By using such a data structure, string matching can be checked at high speed. At least some embodiments consistent with the present invention concern updating hash tables to include new rules.
§1.2 Background Information
Network intrusion detection systems (“NIDS”) have been widely deployed in today's Internet to safeguard the security of network operations. Among the many network-based intrusion detection techniques (See, e.g., the references: L. Feinstein, D. Schnackenberg, R. Balupari, and D. Kindred, “Statistical Approaches to Ddos Attack Detection and Response,” DISCEX (2003); L. Spitzner, Honeypots: Tracking Attackers, Addison-Wesley (2002); M. Becchi and P. Crowley, “Efficient Regular Expression Evaluation: Theory to Practice,” Proceedings of the 2008 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS) (San Jose, Calif., November 2008); and F. Yu, “High Speed Deep Packet Inspection with Hardware Support,” PhD dissertation of University of California at Berkeley (Berkeley, Calif., 2006), each incorporated herein by reference.), multi-string matching is commonly used because of its precision and accuracy in attack detection.
Many multi-string matching schemes have been proposed. (See, e.g., the references: S. Wu and U. Manber, “A Fast Algorithm for Multi-Pattern Searching,” Technical Report T-94-17, Department of Computer Science, University of Arizona (1994); S. Dharmapurikar and J. W. Lockwood, “Fast and Scalable Pattern Matching for Network Intrusion Detection Systems,” IEEE Journal of Selected Areas in Communications, Vol. 24, No. 10 (2006); H. Lu, K. Zheng, B. Liu, X. Zhang, and Y. Liu, “A Memory-Efficient Parallel String Matching Architecture for High-Speed Intrusion Detection,” IEEE Journal of Selected Areas in Communications, Vol. 24, No. 10 (2006); N. Hua, H. Song, T. V. Lakshman, “Variable-Stride Multi-Pattern Matching For Scalable Deep Packet Inspection”, IEEE INFOCOM (2009); J. van Lunteren, “High-Performance Pattern-Matching for Intrusion Detection,” IEEE INFOCOM (2006); and N. Tuck, T. Sherwood, B. Calder, and G. Varghese, “Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection,” IEEE INFOCOM (2004), each incorporated herein by reference.) Most of these proposed schemes are derived from the classic Aho-Corasick (“AC”) automaton. (See, e.g., the reference A. V. Aho and M. J. Corasick, “Efficient String Matching: An Aid To Bibliographic Search,” Communications of the ACM, Vol. 18, No. 6, pp. 333-340 (1975), incorporated herein by reference.) This is because AC's worst-case performance is deterministic: it is linear in the length of the input stream and independent of the size of the rule set (a rule being, e.g., one of the bit strings against which an input is checked for a match). Therefore, an attacker cannot construct worst-case traffic that slows down the NIDS and lets malicious traffic escape inspection. In fact, many popular NIDS and anti-virus systems, such as Snort (See, e.g., A free lightweight network intrusion detection system for UNIX and Windows, available online at http://www.snort.org, incorporated herein by reference.) and ClamAV (See, e.g., ClamAV, available online at http://www.clamav.net, incorporated herein by reference.), have already implemented an AC automaton as their multi-string matching engine.
The AC automaton is introduced in §1.2.1 below. Then, multi-string matching schemes using the AC automaton (and their perceived limitations) are introduced in §1.2.2. Thereafter, other multi-string matching schemes are introduced in §1.2.3. Finally, desired characteristics of a multi-string matching scheme are discussed in §1.2.4.
§1.2.1 Aho-Corasick Automaton
The Aho-Corasick (AC) automaton is one of the most widely used algorithms in multi-string matching. As noted above, because its worst-case matching throughput is deterministic, it is not vulnerable to attack traffic. Provided with a set of string patterns (also referred to as “rules”), the construction of an AC automaton includes two steps. In the first step, a trie structure is created based on the set of string patterns (rules). Each state (represented as a node) on the trie corresponds to a valid prefix (matching a part) of the string patterns. The edges on the trie are called “goto transitions” of the AC automaton. In the second step, a “failure transition” is added from each non-root state s to the state d whose prefix is the longest proper suffix of the prefix represented by state s that is itself a valid prefix of some string pattern (the root state, if no such non-empty suffix exists).
Consider, for example, a set of string patterns (that is, a rule set) {hers, he, his, him, me, she}. FIG. 1 illustrates an AC automaton for rule set {hers, he, his, him, me, she}, in which the solid arrows represent the goto transitions, and the dotted arrows represent the failure transitions. For simplicity, failure transitions to the root state are not shown.
Given an active state s and an input character c, the AC automaton first checks whether there is a goto transition from state s labeled with input character c. If such a goto transition exists, the state it points to becomes the next active state (e.g., the active state in the next time slot). Otherwise, the next active state is the state pointed to by the failure transition of state s, and input character c is examined again in the next time slot.
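The two construction steps and the matching procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of any cited system: the function names build_ac and match, the list-of-dictionaries layout, and the per-state output sets (used only to report which rules matched) are all introduced here for illustration.

```python
from collections import deque

def build_ac(patterns):
    """Build goto, failure, and output tables for an un-optimized AC automaton."""
    goto = [{}]    # goto[s][c] -> child state; state 0 is the root
    fail = [0]     # fail[s] -> target of the failure transition of state s
    out = [set()]  # out[s] -> rules recognized on entering state s (illustrative)
    # Step 1: build the trie; its edges are the goto transitions.
    for p in patterns:
        s = 0
        for c in p:
            if c not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].add(p)
    # Step 2: add failure transitions in breadth-first order.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for c, child in goto[s].items():
            f = fail[s]
            while f and c not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(c, 0)
            out[child] |= out[fail[child]]
            queue.append(child)
    return goto, fail, out

def match(text, goto, fail, out):
    """Return (start_index, rule) pairs for every rule occurrence in text."""
    s, hits = 0, []
    for i, c in enumerate(text):
        while s and c not in goto[s]:
            s = fail[s]            # failure transition; c is examined again
        s = goto[s].get(c, 0)      # goto transition, or stay at the root
        for p in sorted(out[s]):
            hits.append((i - len(p) + 1, p))
    return hits

goto, fail, out = build_ac(["hers", "he", "his", "him", "me", "she"])
print(len(goto))                         # 13 states, including the root
print(match("ushers", goto, fail, out))  # [(2, 'he'), (1, 'she'), (2, 'hers')]
```

In this sketch, reading "ushers" reaches the state for "she", follows one failure transition to the state for "he" when "r" arrives, and then continues along the goto transitions for "hers".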
The AC automaton just introduced above is an un-optimized version. There is a second, optimized version. An advantage of the un-optimized version is that an AC automaton with N states has only N−1 goto transitions and N−1 failure transitions. Consequently, the storage complexity of transitions is relatively low. For an input stream with length L, the number of state transitions to be made during matching in the worst case is 2L.
The optimized version of an AC automaton is referred to as a “Deterministic Finite Automaton” (DFA). An optimized version of an AC automaton may be constructed based on the un-optimized version by (1) adding goto transitions for every character from every state and (2) removing the failure transitions. Compared to the un-optimized version, the optimized version only needs to make one state transition for each input character. Therefore, its worst-case throughput is twice that of the un-optimized version. Unfortunately, however, the optimized version has a huge memory cost, since each state has 256 goto transitions corresponding to 256 (ASCII) characters.
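To make the memory trade-off concrete, the following short Python sketch compares the number of stored transitions in the two versions. The function name transition_counts is introduced here for illustration, and the 13-state figure assumes the six-rule example set {hers, he, his, him, me, she}, whose trie has a root plus twelve distinct prefixes.

```python
def transition_counts(num_states, alphabet_size=256):
    """Return (AC automaton transitions, AC-DFA transitions) for N states."""
    ac_goto = num_states - 1          # one goto transition into each non-root state
    ac_fail = num_states - 1          # one failure transition out of each non-root state
    dfa = num_states * alphabet_size  # one transition per (state, character) pair
    return ac_goto + ac_fail, dfa

ac, dfa = transition_counts(13)
print(ac, dfa)  # 24 3328
```

Even for this tiny rule set, the optimized version stores over a hundred times as many transitions as the un-optimized one.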
In the following, unless specifically noted, the term “AC automaton” will denote its un-optimized version, while the term “AC-DFA” will denote the optimized version. For simplicity, the word “transition” is used to refer to a goto transition (as opposed to a failure transition) unless it is clear from the context that a failure transition is intended.
§1.2.2 Multi-String Matching Schemes Using the AC Automaton
With rule sets continuing to grow quickly, implementing an AC automaton with a small memory without sacrificing performance becomes a major challenge in NIDS design. There are many schemes that could be used to efficiently implement dense automatons. (An automaton may be referred to as a “dense automaton” if the ratio of its total transition number to its total state number is close to 256.) A two-dimensional direct-indexed table may be used to store all the transitions, where each row corresponds to a state, each column corresponds to a symbol, and the intersection between each row and each column stores a row ID of the next hop state.
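A two-dimensional direct-indexed table of this kind might look like the following Python sketch. The names make_table and step and the toy automaton are assumptions made for illustration; the sketch shows only the lookup mechanism, not any particular cited scheme.

```python
ALPHABET_SIZE = 256

def make_table(num_states):
    """Allocate a num_states x 256 table; every cell defaults to row 0 (the root)."""
    return [[0] * ALPHABET_SIZE for _ in range(num_states)]

def step(table, state, byte):
    """One transition: a single direct-indexed read, with no search or comparison."""
    return table[state][byte]

# Toy automaton that recognizes the byte string b"he" (state 2 means matched).
table = make_table(3)
table[0][ord("h")] = 1
table[1][ord("e")] = 2
table[1][ord("h")] = 1  # after "hh", the automaton is still one character into "he"

state = 0
for b in b"hhe":
    state = step(table, state, b)
print(state)  # 2
```

Each row occupies 256 cells regardless of how many distinct next states it actually encodes, which is acceptable when most cells carry useful transitions.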
In order to reduce memory cost, HEXA (See, e.g., the reference S. Kumar, J. Turner, P. Crowley, and M. Mitzenmacher, “HEXA: Compact Data Structures for Faster Packet Processing,” Proceedings of the Fifteenth IEEE International Conference on Network Protocols (ICNP), pp. 246-255 (2007), incorporated herein by reference.) was proposed to reduce the number of bits stored in each field of the two-dimensional table using the historical scanning information carried by the input stream. Although a two-dimensional table works well for a dense automaton, it is not a good way to implement a sparse automaton (such as an AC automaton, whose transition-to-state ratio is normally between 1 and 2), because memory is wasted on the non-existing transitions.
Besides the two-dimensional table, an automaton may be implemented by storing each state as a whole data structure, and connecting parent and child states by pointers in the parent states. However, the wide distribution of state sizes (i.e., the numbers of transitions of states) on the AC automaton makes the design of a compact state structure challenging.
FIG. 2 illustrates the distribution of state sizes on the AC automaton based on the Snort rule set. Notice that the distribution is quite wide and unbalanced, with most states having smaller sizes. Consequently, it is challenging to design a compact state structure storing pointers pointing to the child states.
Using a hash table to implement a sparse automaton (such as an AC automaton) is advantageous because non-existing transitions needn't be stored, and the complicated state structure needn't be kept. Compared to other AC automaton implementation schemes, such as bitmap-compression AC and path-compression AC (Recall, e.g., the article N. Tuck, T. Sherwood, B. Calder, and G. Varghese, “Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection,” IEEE INFOCOM (2004)), storing transitions directly in a hash table can avoid unnecessary memory waste and simplify the process of making a transition decision.
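As a rough illustration of storing only the existing transitions, the Python sketch below keys each goto transition on a (state, character) pair. The function name trie_transitions is hypothetical, and Python's built-in dict is used purely as a stand-in hash table: it resolves collisions internally and is not a perfect hash.

```python
def trie_transitions(patterns):
    """Return ({(state, char): next_state}, num_states) for the goto transitions."""
    transitions = {}
    num_states = 1  # state 0 is the root
    for p in patterns:
        s = 0
        for c in p:
            key = (s, c)
            if key not in transitions:
                transitions[key] = num_states  # allocate a new state
                num_states += 1
            s = transitions[key]
    return transitions, num_states

transitions, num_states = trie_transitions(["hers", "he", "his", "him", "me", "she"])
print(len(transitions), num_states)  # 12 13
print(transitions.get((0, "h")))     # 1
print(transitions.get((0, "x")))     # None: nothing is stored for a missing transition
```

Twelve entries cover all goto transitions of the 13-state example, whereas a direct-indexed table for the same states would reserve 13 × 256 cells.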
The main challenge involved in hash table design is avoiding hash collisions. Hash collisions might increase the number of memory accesses needed for each transition decision and make the processing speed unstable. Furthermore, hash collisions might be exploited by attackers to degrade system performance. The paper J. van Lunteren, “High-Performance Pattern-Matching for Intrusion Detection,” IEEE INFOCOM (2006) proposes a BFSM-based pattern-matching (“BFPM”) technique that uses a hash table construction scheme named Balanced Routing Table (“BART”) (See, e.g., the reference J. van Lunteren and A. P. J. Engbersen, “Fast and Scalable Packet Classification,” IEEE Journal of Selected Areas in Communications, vol. 21, no. 4, pp. 560-571, May (2003), incorporated herein by reference.) to limit the maximum number of collisions of any hash index by a configurable bound P. (P=4 is used in the reference J. van Lunteren, “High-Performance Pattern-Matching for Intrusion Detection,” IEEE INFOCOM (2006).) When a transition decision is made, P transitions are read out from the same entry of the hash table simultaneously. After P parallel comparisons, the correct transition can be decided. Unfortunately, however, storing multiple transitions in each entry increases the memory bus width and wastes memory space. Furthermore, the P comparisons required for each transition decision reduce the scheme's efficiency in a software implementation.
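The cost of a bounded-collision table can be seen in the following generic Python sketch, in which each entry (bucket) holds up to P transitions and a lookup compares against all of them. This is not the BART construction itself: the class name BucketedTable, the index function, and the overflow behavior are assumptions made for illustration only.

```python
P = 4  # bound on transitions per entry (the value reported for the cited paper)

class BucketedTable:
    def __init__(self, num_buckets):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, state, char):
        # Hypothetical index function, chosen only to keep the sketch short.
        return (state * 257 + ord(char)) % len(self.buckets)

    def insert(self, state, char, next_state):
        bucket = self.buckets[self._index(state, char)]
        if len(bucket) >= P:
            # This sketch simply gives up when an entry already holds P transitions.
            raise OverflowError("entry already holds P transitions")
        bucket.append((state, char, next_state))

    def lookup(self, state, char):
        # One read returns the whole entry; up to P comparisons follow.
        # Hardware can compare in parallel, but software does so one at a time.
        for s, c, nxt in self.buckets[self._index(state, char)]:
            if s == state and c == char:
                return nxt
        return None

table = BucketedTable(4)
table.insert(0, "h", 1)
table.insert(1, "e", 2)
table.insert(0, "m", 8)
print(table.lookup(0, "h"))  # 1
print(table.lookup(0, "x"))  # None
```

Every entry must be wide enough for P transitions even when it holds fewer, which is the memory and bus-width overhead noted above.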
Therefore, an efficient perfect hashing scheme for generating a sparse automaton (such as an AC automaton for example) is desirable in high-performance NIDS design. Although there are many perfect hashing and alternative algorithms available in literature, most of them require multiple memory accesses to generate the hash index (traversing a tree structure) (See, e.g., the references: N. S. Artan and H. J. Chao, “Tribica: Trie Bitmap Content Analyzer for High-Speed Network Intrusion Detection,” IEEE INFOCOM (2007); and N. S. Artan, M. Bando, and H. J. Chao, “Boundary Hash for Memory-Efficient Deep Packet Inspection,” IEEE International Conference on Communications (ICC 2008) (Beijing, PRC, May 19-23, 2008), each incorporated herein by reference.), or need more than one memory access in the worst case to get the correct hash index for a hash table lookup (See, e.g., the references: R. Pagh and F. F. Rodler, “Cuckoo Hashing,” ESA (2001), S. Kumar, J. Turner, and P. Crowley, “Peacock Hashing: Deterministic and Updatable Hashing for High Performance Networking,” IEEE INFOCOM (2008); and F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese, “Beyond Bloom Filters: From Approximate Membership Checks to Approximate State Machines,” ACM SIGCOMM (2006), each incorporated herein by reference.) Due to the dependency between two contiguous transitions made on the automaton (without the new current state information, the next transition cannot be made), one hash query can start only after the previous hash query returns a new current state ID. That is, hash queries are performed in series. The time required to perform one hash query is equal to the sum of (1) the time for generating the hash index (i.e., the result of the hash calculation(s)) and (2) the time accessing the hash table. 
(Given a hash key to be searched for in the hash table, the hash key is first used as the input to a hash calculation (using a hash function, for example), and the result of the hash calculation (the so-called hash index) is the location in the hash table that stores the desired hash key. Normally, the hash function is predetermined.) Therefore, if the hash unit takes too much time generating the hash index or accessing the hash table, the matching speed of the system will be degraded.
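The effect of serial hash queries can be written as a rough bound. The symbols below are introduced here for illustration and do not appear in the cited references: T_idx is the time to generate the hash index, T_mem is the time to access the hash table, and L is the length of the input stream. The bound assumes that each state transition of an un-optimized AC automaton costs exactly one hash query and uses the worst case of 2L state transitions.

```latex
T_{\text{query}} = T_{\text{idx}} + T_{\text{mem}},
\qquad
T_{\text{total}} \le 2L \cdot T_{\text{query}} = 2L \left( T_{\text{idx}} + T_{\text{mem}} \right)
```

Because the queries cannot overlap, any extra memory access spent generating the index or resolving a collision adds directly to the per-character cost.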
§1.2.2.1 Memory Optimization of Aho-Corasick Automaton
Many techniques seeking to reduce the memory cost of AC automaton and AC-DFA have been proposed in literature. (See, e.g., the references: J. van Lunteren, “High-Performance Pattern-Matching for Intrusion Detection,” IEEE INFOCOM (2006); N. Tuck, T. Sherwood, B. Calder, and G. Varghese, “Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection,” IEEE INFOCOM (2004); T. Song, W. Zhang, D. Wang, and Y. Xue, “A Memory Efficient Multiple Pattern Matching Architecture for Network Security,” IEEE INFOCOM (2008); and L. Tan, T. Sherwood, “A High Throughput String Matching Architecture for Intrusion Detection and Prevention,” 32nd Annual International Symposium on Computer Architecture, ISCA (2005) each incorporated herein by reference.) In the paper Tuck et al. (N. Tuck, T. Sherwood, B. Calder, and G. Varghese, “Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection,” IEEE INFOCOM (2004), incorporated herein by reference.), bitmap compression and path compression are applied to an AC automaton to avoid storing non-existing transitions, thereby reducing memory costs. The paper Tan et al. (L. Tan, T. Sherwood, “A High Throughput String Matching Architecture for Intrusion Detection and Prevention,” 32nd Annual International Symposium on Computer Architecture, ISCA (2005), incorporated herein by reference) proposes an approach which bit-splits an AC-DFA into several small AC-DFAs, thereby reducing the total memory requirement. The papers Song et al. (T. Song, W. Zhang, D. Wang, and Y. Xue, “A Memory Efficient Multiple Pattern Matching Architecture for Network Security,” IEEE INFOCOM (2008)) and Lunteren (J. van Lunteren, “High-Performance Pattern-Matching for Intrusion Detection,” IEEE INFOCOM (2006)) noted that a large fraction of transitions on an AC-DFA are backward to states at the first three levels (the root state is at level 1). 
Based on this observation, the Lunteren paper proposes removing transitions backward to the first two levels by storing them in a separate 256-entry table. The Song paper (T. Song, W. Zhang, D. Wang, and Y. Xue, “A Memory Efficient Multiple Pattern Matching Architecture for Network Security,” IEEE INFOCOM (2008)) proposes a Cached Deterministic Finite Automate (“CDFA”) model, based on which backward transitions to states at level 3 can also be removed. The main idea of CDFA is to maintain more than one active state in AC-DFA (one at the root state, one at states at level 2, and one at states at other levels). It has been shown that after eliminating backward transitions to states at the first three levels, the number of transitions of an AC-DFA is approximately equal to the number of transitions of an AC automaton. Furthermore, it is observed that the total number of transitions could be significantly reduced if the rule set is partitioned into multiple subsets, and implemented by multiple small AC-DFAs. (See, e.g., the articles: J. van Lunteren, “High-Performance Pattern-Matching for Intrusion Detection,” IEEE INFOCOM (2006); and T. Song, W. Zhang, D. Wang, and Y. Xue, “A Memory Efficient Multiple Pattern Matching Architecture for Network Security,” IEEE INFOCOM (2008).)
Besides the memory optimization, other research work focuses on accelerating the processing speed of AC automaton/AC-DFA. (Recall, e.g., the articles: S. Dharmapurikar and J. W. Lockwood, “Fast and Scalable Pattern Matching for Network Intrusion Detection Systems,” IEEE Journal of Selected Areas in Communications, Vol. 24, No. 10 (2006); H. Lu, K. Zheng, B. Liu, X. Zhang, and Y. Liu, “A Memory-Efficient Parallel String Matching Architecture for High-Speed Intrusion Detection,” IEEE Journal of Selected Areas in Communications, Vol. 24, No. 10 (2006); and N. Hua, H. Song, T. V. Lakshman, “Variable-Stride Multi-Pattern Matching For Scalable Deep Packet Inspection”, IEEE INFOCOM (2009).)
§1.2.3 Other Multi-String Matching Schemes
Researchers have proposed multi-string matching schemes that do not rely on an AC automaton or an AC-DFA. For example, the paper Yu et al. (F. Yu, R. H. Katz, and T. V. Lakshman, “Gigabit Rate Packet Pattern-Matching Using TCAM,” Proceedings of the Fifteenth IEEE International Conference on Network Protocols (ICNP) (2004), incorporated herein by reference.) proposes a gigabit rate multi-string matching scheme based on a Ternary Content-Addressable Memory (“TCAM”). The paper Piyachon and Luo (P. Piyachon and Y. Luo, “Efficient Memory Utilization On Network Processors for Deep Packet Inspection,” Symposium on Architecture for Networking and Communications Systems (ANCS) (2006), incorporated herein by reference.) proposes a sophisticated memory model for multi-string matching implementation based on Network Processors (“NPs”). In addition, there are many field programmable gate array (“FPGA”) based schemes proposed for multi-string matching (See, e.g., the references: Z. K. Baker, V. K. Prasanna, “High-Throughput Linked-Pattern Matching for Intrusion Detection Systems,” Symposium on Architecture for Networking and Communications Systems (ANCS) (October 2005); I. Sourdis, D. N. Pnevmatikatos, and S. Vassiliadis, “Scalable Multigigabit Pattern Matching for Packet Inspection,” IEEE Trans. VLSI Syst., Vol. 16, No. 2, pp. 156-166 (2008); and Y.-H. E. Yang and V. K. Prasanna, “Memory-Efficient Pipelined Architecture for Large-Scale String Matching,” 17th Annual IEEE FCCM April (2009), each incorporated herein by reference.) which map the rule set directly to the pure logic of FPGAs, and can achieve high performance. One limitation of FPGA-based schemes is that when rules are changed, it takes considerable time to re-synthesize the design and reprogram the FPGA.
§1.2.4 Desired Characteristics of a Multi-String Matching Scheme
In view of the foregoing, there is a need to provide a multi-string matching algorithm which (1) avoids hash collisions (that is, uses a perfect hash table), (2) uses memory efficiently, (3) requires no memory access to generate the hash index, and/or (4) is guaranteed to return the hash result within the time of exactly one memory access.
Each of the foregoing articles (in this section 1.2) is incorporated herein by reference.