A wide variety of methods for data compression are known in the art. Many Web servers, for example, use the GZIP algorithm to compress the Hypertext Transfer Protocol (HTTP) symbol streams that they transmit. GZIP uses the Deflate compressed data format, which is defined in Request for Comments (RFC) 1951 of the Internet Engineering Task Force (IETF), by Deutsch, entitled “DEFLATE Compressed Data Format Specification version 1.3” (1996), which is incorporated herein by reference. GZIP first compresses the symbol stream using the LZ77 algorithm, as defined by Ziv and Lempel in “A Universal Algorithm for Sequential Data Compression,” IEEE Transactions on Information Theory (1977), pages 337-343, which is incorporated herein by reference. LZ77 operates generally by replacing recurring strings of symbols with pointers, each consisting of a distance and a length, to earlier occurrences of the same strings; in GZIP these pointers may reach back at most 32 KB. In the next stage of GZIP, the output of the LZ77 compression operation is further compressed by Huffman encoding, as is known in the art. At the destination, the compressed HTTP stream is decompressed by Huffman decoding followed by LZ77 decompression.
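The LZ77 stage described above can be illustrated with a minimal sketch, assuming a naive greedy matcher. The function names and the token representation (a literal byte as an `int`, a pointer as a `(distance, length)` tuple) are hypothetical and chosen for illustration. The window size and match-length limits are the Deflate values from RFC 1951. Real GZIP implementations use hash chains and lazy matching rather than this brute-force search, and they Huffman-code the tokens, which is omitted here.

```python
from typing import List, Tuple, Union

WINDOW_SIZE = 32 * 1024  # Deflate's maximum back-reference distance (32 KB)
MIN_MATCH = 3            # Deflate's shortest back-reference length
MAX_MATCH = 258          # Deflate's longest back-reference length

# Hypothetical token type: a literal byte (int) or a (distance, length) pointer.
Token = Union[int, Tuple[int, int]]


def lz77_compress(data: bytes) -> List[Token]:
    """Greedy LZ77 encoder with a brute-force window search (illustration only)."""
    tokens: List[Token] = []
    i, n = 0, len(data)
    while i < n:
        best_len, best_dist = 0, 0
        for j in range(max(0, i - WINDOW_SIZE), i):
            length = 0
            # A match may extend past position i (overlapping copy), as LZ77 allows.
            while (length < MAX_MATCH and i + length < n
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= MIN_MATCH:
            tokens.append((best_dist, best_len))  # pointer to earlier data
            i += best_len
        else:
            tokens.append(data[i])  # no useful match: emit a literal
            i += 1
    return tokens


def lz77_decompress(tokens: List[Token]) -> bytes:
    """Rebuild the original bytes by expanding each pointer against prior output."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            # Copy one byte at a time so overlapping pointers (dist < length) work.
            for _ in range(length):
                out.append(out[-dist])
        else:
            out.append(tok)
    return bytes(out)


if __name__ == "__main__":
    toks = lz77_compress(b"abcabcabcabc")
    print(toks)  # [97, 98, 99, (3, 9)]
    print(lz77_decompress(toks))  # b'abcabcabcabc'
```

In this example the last nine bytes become a single pointer `(3, 9)` whose length exceeds its distance. This is why the decoder copies byte by byte rather than slicing.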
Pattern matching algorithms are widely used in a variety of network communication applications. For example, Intrusion Detection Systems (IDS) use pattern matching in deep packet inspection (DPI). The packet content is typically checked against multiple patterns simultaneously for purposes such as detecting known signatures of malicious content.
The most common approach currently used for this type of multi-pattern matching is the Aho-Corasick algorithm, which was first described by Aho and Corasick in “Efficient String Matching: An Aid to Bibliographic Search,” Communications of the ACM 18(6), pages 333-340 (1975), which is incorporated herein by reference. (The term “multi-pattern matching,” as used in the context of the present patent application and in the claims, refers to scanning a sequence of symbols for multiple patterns simultaneously in a single process.) Efficient methods for multi-pattern matching in compressed data are described, for example, in U.S. Patent Application Publication 2011/0185077, whose disclosure is incorporated herein by reference.
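The following minimal sketch shows how Aho-Corasick matches several patterns in a single pass over the input. It uses the classic goto, failure, and output functions. The class name `AhoCorasick` and its methods are hypothetical. The sketch works on Python strings for readability, whereas DPI engines typically scan raw bytes and compile the automaton into compact transition tables.

```python
from collections import deque
from typing import Dict, List, Tuple


class AhoCorasick:
    """Hypothetical minimal Aho-Corasick automaton for multi-pattern matching."""

    def __init__(self, patterns: List[str]) -> None:
        self.goto: List[Dict[str, int]] = [{}]  # state 0 is the root
        self.fail: List[int] = [0]
        self.output: List[List[str]] = [[]]
        for p in patterns:
            self._add(p)
        self._build_failure_links()

    def _add(self, pattern: str) -> None:
        # Insert the pattern into the trie (the goto function).
        state = 0
        for ch in pattern:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[state][ch] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.output.append([])
            state = nxt
        self.output[state].append(pattern)

    def _build_failure_links(self) -> None:
        # Breadth-first traversal: depth-1 states fail to the root.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, child in self.goto[state].items():
                queue.append(child)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Patterns that are suffixes of this state's string also match here.
                self.output[child] += self.output[self.fail[child]]

    def search(self, text: str) -> List[Tuple[int, str]]:
        """Return (start_index, pattern) for every occurrence, in one scan."""
        matches: List[Tuple[int, str]] = []
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.output[state]:
                matches.append((i - len(pattern) + 1, pattern))
        return matches


if __name__ == "__main__":
    ac = AhoCorasick(["he", "she", "his", "hers"])
    print(ac.search("ushers"))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Each input symbol is examined once, and failure transitions avoid rescanning. As a result, the scan time grows with the text length plus the number of matches, regardless of how many patterns are loaded.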