1. Technical Field
The present disclosure relates generally to the acceleration of data decompression, and more particularly to the acceleration of dictionary-based data decompression in a networked environment.
2. Discussion of Related Art
Websites are increasingly complex and rely on dynamic content and dynamic scripting languages to enhance the user experience. Such complex websites use a large amount of the available bandwidth and benefit from the use of data compression technologies. As an example, HTTP/1.1 incorporates HTTP message body compression by means of content encoding.
One conventional compression/decompression algorithm, Deflate, implements the Lempel-Ziv (LZ) compression algorithm. The LZ compression algorithm identifies substrings in input streams that have occurred in the past and replaces each substring with a reference distance and a length. The resulting compressed stream includes a sequence of literal requests and copy requests, which are referred to as tokens. The tokens may be further compressed using Huffman encoding. Deflate is used in conventional compression programs such as gzip and zip. The reference distances may be limited to a predetermined size (e.g., 32K bytes) for efficiency reasons.
The decompression is achieved in reverse order by performing Huffman decoding followed by Lempel-Ziv decompression. In LZ decompression, an input pointer processes each record of the compressed stream. For a literal request, the literal is copied from the input buffer to an output buffer (e.g., a history buffer). For a copy request, a string is copied from the existing history (e.g., a data-dictionary) to the end of the output buffer.
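The literal and copy handling described above can be illustrated with a minimal sketch. The token shapes ("lit", value) and ("copy", distance, length), as well as the function name lz_decompress, are assumptions made for illustration and are not taken from the Deflate format or from any particular implementation.

```python
from typing import Iterable, Tuple, Union

# Hypothetical token shapes, chosen only for this illustration:
#   ("lit", byte_value)         -> literal request
#   ("copy", distance, length)  -> copy request
Token = Union[Tuple[str, int], Tuple[str, int, int]]


def lz_decompress(tokens: Iterable[Token], history: bytes = b"") -> bytes:
    """Apply literal and copy requests on top of an existing history."""
    out = bytearray(history)      # output buffer seeded with the prior history
    start = len(out)              # where the newly produced bytes begin
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])    # literal: copy the byte to the end of the output
        elif tok[0] == "copy":
            _, distance, length = tok
            if distance <= 0 or distance > len(out):
                raise ValueError("reference distance points outside the history")
            src = len(out) - distance
            # Copy byte by byte so that overlapping copies
            # (length greater than distance) repeat recent output.
            for i in range(length):
                out.append(out[src + i])
        else:
            raise ValueError("unknown token kind: %r" % (tok[0],))
    return bytes(out[start:])     # only the bytes produced by this call
```

For example, two literals "a" and "b" followed by a copy request with distance 2 and length 4 expand to "ababab". Passing a non-empty history lets a copy request reach back into data produced by an earlier call.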
FIG. 1 illustrates a conventional implementation of the Deflate decompression algorithm. Variable length tokens are extracted from a compressed input stream and placed in an input buffer 101. The variable length tokens can then be decoded using a variable length token decoder 103 and a Huffman code table 106 to generate fixed length tokens for storage in a fixed length token buffer 104. The fixed length tokens can then be presented to an LZ decompression unit 105 that either copies a literal from the token or uses a reference distance of the token to copy data from the recent history of the output buffer 102.
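The decoding of variable length tokens into fixed length tokens can be sketched with a toy prefix code. The code table, the bit-string input, and the function name decode_variable_length are hypothetical; actual Deflate streams use canonical Huffman codes with separate literal/length and distance alphabets and extra bits, which this sketch does not model.

```python
from typing import Dict, List, Tuple


def decode_variable_length(bits: str, code_table: Dict[str, Tuple]) -> List[Tuple]:
    """Decode a string of '0'/'1' characters into fixed length tokens.

    code_table maps each prefix-free code word to a token.
    """
    tokens = []
    code = ""
    for bit in bits:
        code += bit
        if code in code_table:        # a complete code word has been read
            tokens.append(code_table[code])
            code = ""
    if code:
        raise ValueError("trailing bits do not form a complete code word")
    return tokens


# Toy prefix-free table (an assumption for illustration only).
TOY_TABLE = {
    "0": ("lit", 97),       # literal 'a'
    "10": ("lit", 98),      # literal 'b'
    "11": ("copy", 2, 4),   # copy 4 bytes from distance 2
}
```

With this table, the bit string "01011" decodes to a literal "a", a literal "b", and one copy request, which would then be handed to the LZ decompression stage.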
Decompression algorithms like LZ decompression are being increasingly implemented in hardware, such as in individual chip cores, as decompression engines. However, since the amount of available chip area is limited, typically only a small number of decompression engines can be accommodated.
In stateless decompression, the inputs of different decompression requests can be independently processed. In conventional stateless decompression, all segments of a compressed data stream generated by a single compression operation must be received and presented at once to an acceleration engine. The number of open connections that might carry compressed state can be in the thousands for enterprise servers. For systems like intrusion prevention systems, the number of routed open connections can be on the order of millions. The requirement to have the entire compressed data stream available can create significant memory pressure on the system, which could be exploited in a denial-of-service attack. Alternatively, a decompression engine can be dedicated to a particular stream from the first to the last packet of a compressed stream, so that packets can be decompressed one after the other. However, due to network traffic delays, packets belonging to a compressed stream might arrive in a sequence of bursts due to the network protocol (e.g., TCP/IP), and the arrival of a complete stream typically spans multiple seconds or longer. Hence, the number of concurrent connections that can be handled at a time is limited to the number of decompression engines. In addition, this method can be exploited by attacks that do not send all of the packets.
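The buffering requirement of stateless decompression can be illustrated in software with the one-shot zlib.decompress call from the Python standard library, used here only as a stand-in for a hardware acceleration engine. The function name stateless_decompress and the segment list are assumptions for illustration.

```python
import zlib
from typing import List


def stateless_decompress(segments: List[bytes]) -> bytes:
    """One-shot decompression: every segment of the stream must be present."""
    # All segments are joined first, so the whole compressed stream
    # has to be held in memory before any output is produced.
    return zlib.decompress(b"".join(segments))
```

If the final segment has not yet arrived, zlib.decompress raises zlib.error because the stream is incomplete, so the segments received so far must be kept in memory until the stream is complete.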
In some intrusion prevention systems, decompressed content must be inspected on a per-packet level to detect intrusions as early as possible so the packets can be either rejected or forwarded based on that analysis. Given the large number of connections that might require simultaneous decompression (e.g., 1 million or more), coupled with the necessity to decompress data streams on a per-packet basis, the system responsible for the decompression needs to be able to perform efficient stateful decompression. In stateful decompression, different decompression operations are allowed to share the same state (e.g., the same data-dictionary). For example, in a deep packet inspection system, while the compression at the sender side is done at a flow (e.g., a stream) level, the inspection is done at a packet level, where packets of many different flows arrive interleaved. However, the high concurrency of network traffic extends to the decompression accelerator and forces the decompression accelerator to maintain the state of each compressed flow.
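A software analogue of per-flow stateful decompression can be sketched with zlib.decompressobj from the Python standard library, which retains its history between calls. The class name FlowDecompressor, the flow identifiers, and the on_packet method are hypothetical and stand in for whatever flow tracking an actual system would use; the sketch does not model hardware engines or the movement of state between memory spaces.

```python
import zlib


class FlowDecompressor:
    """Keeps one decompression state per flow so packets can arrive interleaved."""

    def __init__(self):
        # Maps a flow identifier to its decompression object; each object
        # holds that flow's state (including its history window).
        self._state = {}

    def on_packet(self, flow_id: str, payload: bytes) -> bytes:
        """Decompress one packet's payload using the state of its flow."""
        decomp = self._state.get(flow_id)
        if decomp is None:
            decomp = zlib.decompressobj()   # first packet of a new flow
            self._state[flow_id] = decomp
        # Output may be empty if the payload does not yet complete a symbol.
        return decomp.decompress(payload)

    def close(self, flow_id: str) -> None:
        """Discard the state of a finished or abandoned flow."""
        self._state.pop(flow_id, None)
```

Each call returns whatever plaintext the packet makes available, so inspection can proceed per packet, at the cost of holding one state entry for every open flow.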
Thus, there is a need for methods and systems for decompressing a stream of compressed data packets that can minimize the overhead of moving the decompression state (e.g., all or part of a data-dictionary) between local and remote memory spaces.