§1.1 Field of the Invention
The present invention concerns matching an arbitrary-length bit string with one of a number of known arbitrary-length bit strings. The present invention may be used for network intrusion detection and prevention. In particular, the present invention concerns a novel data structure, namely a trie-bitmap content analyzer (TriBiCa) operating with a highly memory-efficient LogLog hashing method, which provides minimal perfect hashing functionality while supporting low-cost set membership queries. By using such a data structure, matching can be checked at high speed.
§1.2 Background Information
To ensure reliable and secure services, network security has grown increasingly important in the Internet. Deep Packet Inspection (“DPI”) has been widely used in Network Intrusion Detection and Prevention Systems (“NIDPSs”) to detect viruses or worms. (See, e.g., Sourcefire 3d. [Online]. Available: http://www.sourcefire.com, Fortinet. [Online]. Available: http://www.xilinx.com.) DPI examines every byte of each incoming packet and matches the packet contents against a set of predefined malicious patterns. Implementing DPI at 40 Gbit/s or even 100 Gbit/s in a manner that is cost effective and that scales to a few tens of thousands or millions of keys remains very challenging. To achieve this objective, it is presently necessary to store all of the patterns on the chip so as to take advantage of parallel and pipelined operations.
Minimal Perfect Hash Functions (“MPHFs”) have been used, during the query, to access the pattern (also referred to as a signature) in the hash table for comparison with the incoming packet. An MPHF guarantees that only one signature is stored at each hashed location, so that only one exact-match operation is needed (in other words, there is no hash collision). In addition, it achieves the minimum hash table size by making the table size equal to the number of keys. (See, e.g., P. E. Black, “Minimal Perfect Hashing,” in Dictionary of Algorithms and Data Structures. U.S. National Institute of Standards and Technology, July 2006. [Online]. Available: http://www.nist.gov/dads/HTML/minimalPerfectHash.html.)
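The two MPHF properties described above (no collisions, and a table size equal to the number of keys) can be illustrated with a minimal sketch. This is not the TriBiCa construction. It assumes a brute-force search over a seed for a salted SHA-256 hash, which is practical only for very small key sets, and the names `slot`, `find_mphf_seed`, `build_table`, `is_signature` and the sample signatures are hypothetical.

```python
import hashlib


def slot(seed: int, key: bytes, n: int) -> int:
    """Map a key to one of n table slots using a seed-salted SHA-256 digest."""
    digest = hashlib.sha256(seed.to_bytes(4, "little") + key).digest()
    return int.from_bytes(digest[:8], "little") % n


def find_mphf_seed(keys, max_seed=100_000):
    """Brute-force a seed for which every key lands in a distinct slot.

    Illustration only: the search cost grows very quickly with the number
    of keys, so this is not a practical construction method.
    """
    n = len(keys)
    for seed in range(max_seed):
        if len({slot(seed, k, n) for k in keys}) == n:
            return seed
    raise ValueError("no seed found within max_seed attempts")


def build_table(keys):
    """Build a table with exactly len(keys) slots, one signature per slot."""
    seed = find_mphf_seed(keys)
    table = [None] * len(keys)
    for k in keys:
        table[slot(seed, k, len(keys))] = k
    return seed, table


def is_signature(seed, table, candidate: bytes) -> bool:
    """A query needs one hash computation and one exact-match comparison."""
    return table[slot(seed, candidate, len(table))] == candidate


# Hypothetical signature set, used only for illustration.
SIGNATURES = [b"attack", b"worm01", b"exploit", b"virus", b"rootkit"]
SEED, TABLE = build_table(SIGNATURES)
```

Because a non-member string also hashes to some occupied slot, the final exact-match comparison is what rejects it. The hash alone only selects which stored signature to compare against.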
In the paper N. S. Artan and H. J. Chao, “TriBiCa: Trie Bitmap Content Analyzer for High-Speed Network Intrusion Detection,” in 26th Annual IEEE Conference on Computer Communications (INFOCOM), 2007, pp. 125-133, an on-chip trie-based framework called TriBiCa (Trie Bitmap Content Analyzer) is proposed to implement the MPHF on a field-programmable gate array (FPGA) chip at a speed of 10 Gbit/s. (See, e.g., N. S. Artan, R. Ghosh, Y. Guo, and H. J. Chao, “A 10-Gbps High-Speed Single-Chip Network Intrusion Detection and Prevention System,” in 50th Annual IEEE Global Communications Conference, GLOBECOM2007, Washington, D.C., November 2007.)
§1.2.1 Previous Approaches and Perceived Limitations of Such Approaches
For DPI in an NIDPS, the data structure used to store the intrusion signature database should balance the requirements of high speed, low cost and easy updating. DPI approaches in software NIDPSs such as Snort (See [Online]. Available: http://www.snort.org.) and Bro (See V. Paxson, “Bro: A System for Detecting Network Intruders in Real-Time,” Computer Networks, vol. 31, pp. 2435-2463, 1999.) are very flexible and support detection of sophisticated intrusions. However, they do not scale to high speeds since they run on general-purpose hardware, which is intrinsically slow and has limited parallelism. Hence, hardware approaches are preferred for certain applications.
DPI approaches on hardware can broadly be classified into two architectures based on their signature storage media: (1) off-chip memory (See, e.g., F. Yu, T. Lakshman, and R. Katz, “Gigabit Rate Pattern-Matching using TCAM,” in Int. Conf. on Network Protocols (ICNP), Berlin, Germany, October 2004 and H. Song and J. Lockwood, “Multi-pattern Signature Matching for Hardware Network Intrusion Detection Systems,” in 48th Annual IEEE Global Communications Conference, GLOBECOM2005, St. Louis, Mo., November-December 2005.) and (2) on-chip memory and/or logic blocks (See, e.g., C. Clark and D. Schimmel, “Scalable Pattern Matching for High-Speed Networks,” in IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), Napa, Calif., 2004, pp. 249-257, Y. H. Cho and W. H. Mangione-Smith, “Fast Reconfiguring Deep Packet Filter for 1+ Gigabit Network,” in FCCM, 2005, pp. 215-224, Z. K. Baker and V. K. Prasanna, “High-Throughput Linked-Pattern Matching for Intrusion Detection Systems,” in Proc. of the First Annual ACM Symposium on Architectures for Networking and Communications Systems, Princeton, N.J., 2005, pp. 193-202, J. Moscola, J. Lockwood, R. P. Loui, and M. Pachos, “Implementation of a Content-Scanning Module for an Internet Firewall,” in FCCM, 2003, pp. 31-38, I. Sourdis, D. Pnevmatikatos, S. Wong, and S. Vassiliadis, “A Reconfigurable Perfect-Hashing Scheme for Packet Inspection,” in Proc. 15th International Conference on Field Programmable Logic and Applications (FPL 2005), August 2005, pp. 644-647, L. Tan and T. Sherwood, “Architectures for Bit-Split String Scanning in Intrusion Detection,” IEEE Micro, vol. 26, no. 1, pp. 110-117, January-February 2006, G. Papadopoulos and D. N. Pnevmatikatos, “Hashing+Memory=Low Cost, Exact Pattern Matching,” in Proc. 15th International Conference on Field Programmable Logic and Applications (FPL), August 2005, pp. 39-44, and Y. Lu, B. Prabhakar, and F. Bonomi, “Perfect Hashing for Network Applications,” in IEEE Symposium on Information Theory, Seattle, Wash., 2006, pp. 2774-2778). Architectures using off-chip memory for signature storage are fundamentally limited by the off-chip memory throughput and the additional cost of memory chips. As a result of these limitations of off-chip storage, on-chip storage has gained attention.
For the partitioning operation, BARTS (See, e.g., J. van Lunteren, “Searching Very Large Routing Tables in Wide Embedded Memory,” Global Telecommunications Conference, 2001. GLOBECOM '01. IEEE, vol. 3, 2001.), a memory-efficient route lookup scheme, chooses a particular bit as a representative bit. However, BARTS selects those bits from the input keys, so uniformity of the input is critical for BARTS to succeed. Thus, it would be useful to provide a partitioning scheme that avoids the uniformity requirement. The Perfect Hash Function proposed by Sourdis et al. (See, e.g., I. Sourdis, D. Pnevmatikatos, S. Wong, and S. Vassiliadis, “A Reconfigurable Perfect-Hashing Scheme for Packet Inspection,” in Proc. 15th International Conference on Field Programmable Logic and Applications (FPL 2005), August 2005, pp. 644-647, and I. Sourdis, D. N. Pnevmatikatos, and S. Vassiliadis, “Scalable multigigabit pattern matching for packet inspection,” IEEE Trans. VLSI Syst., vol. 16, no. 2, pp. 156-166, 2008.) also uses inputs as hash keys, and the hash function is hard-coded in the logic. The paper Y. Lu, B. Prabhakar, and F. Bonomi, “Perfect Hashing for Network Applications,” in IEEE Symposium on Information Theory, Seattle, Wash., 2006, pp. 2774-2778, achieved an MPHF using Bloom Filters with 8.6n bits. However, complex computations are required to locate entries in the hash table for queries.
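The dependence of key-bit partitioning on input uniformity, noted above in connection with BARTS, can be illustrated with a short sketch. This is not the BARTS algorithm itself. It assumes only the general idea of splitting a key set into two groups by the value of one selected key bit, and the helper names `bit_at`, `split_by_bit`, `best_split_bit` and the sample key sets are hypothetical.

```python
def bit_at(key: bytes, index: int) -> int:
    """Return bit `index` of key, counting from the most significant bit of byte 0."""
    return (key[index // 8] >> (7 - index % 8)) & 1


def split_by_bit(keys, index):
    """Partition keys into (zeros, ones) according to the selected bit."""
    zeros = [k for k in keys if bit_at(k, index) == 0]
    ones = [k for k in keys if bit_at(k, index) == 1]
    return zeros, ones


def best_split_bit(keys):
    """Return (bit index, imbalance) for the most evenly splitting key bit.

    Imbalance is the absolute difference between the two group sizes;
    0 means a perfectly even split. Ties go to the lowest bit index.
    """
    width = min(len(k) for k in keys) * 8

    def imbalance(i):
        zeros, ones = split_by_bit(keys, i)
        return abs(len(zeros) - len(ones))

    best = min(range(width), key=imbalance)
    return best, imbalance(best)


# Hypothetical key sets: the first has well-spread bit patterns, the second
# shares most of its bits across keys, so no single bit splits it evenly.
UNIFORM_KEYS = [b"\x00", b"\x55", b"\xaa", b"\xff"]
SKEWED_KEYS = [b"\x10", b"\x11", b"\x12", b"\x14"]

uniform_result = best_split_bit(UNIFORM_KEYS)
skewed_result = best_split_bit(SKEWED_KEYS)
```

For the well-spread set an evenly splitting bit exists, whereas for the skewed set even the best available bit leaves the two groups unequal. This is the kind of behavior that motivates a partitioning scheme that does not draw its bits directly from the input keys.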
A goal of the present application is to provide a low-cost and space-efficient MPHF that is simple to construct and suitable for high-speed hardware implementation.