In dictionary-based data compression, a block of binary data is compressed using one or more references to entries of a compression dictionary.
A conventional approach to finding such references is to search for series of bits or bytes that occur both in the dictionary entries and in subparts of the block of data. In the compressed data, the matching subparts are then replaced with references to the corresponding entries.
This is the case for compression techniques such as the DEFLATE algorithm and the Lempel-Ziv-Markov chain algorithm (LZMA), which are based on a sliding window, as well as the Lempel-Ziv-Welch (LZW) algorithm and the bzip2 algorithm.
The sliding window generally defines the number N of most recently processed bytes (for compression) that constitute the compression dictionary in which back-references are searched for.
To enable reciprocal decompression, the compression dictionary is shared between the compressing device or unit and the decompressing device or unit, generally because it is made of the last N bytes of uncompressed data decoded.
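The sliding-window principle can be illustrated with a minimal sketch. The function names `find_longest_match`, `tokenize` and `detokenize`, the brute-force search and the minimum match length of 3 are assumptions made for illustration; this is not the search strategy of DEFLATE or LZMA, only a toy showing how the last N bytes act as the dictionary and how the decoder rebuilds the same dictionary from its own output.

```python
def find_longest_match(data: bytes, pos: int, window: int, min_len: int = 3):
    """Return (distance, length) of the longest match for data[pos:] that
    starts within the previous `window` bytes, or None if it is too short."""
    best_dist, best_len = 0, 0
    for cand in range(max(0, pos - window), pos):
        length = 0
        # A match may run past `pos` and overlap the data being encoded.
        while pos + length < len(data) and data[cand + length] == data[pos + length]:
            length += 1
        if length > best_len:
            best_dist, best_len = pos - cand, length
    return (best_dist, best_len) if best_len >= min_len else None


def tokenize(data: bytes, window: int):
    """Greedy split into literal byte values and (distance, length) back-references."""
    tokens, pos = [], 0
    while pos < len(data):
        match = find_longest_match(data, pos, window)
        if match:
            tokens.append(match)
            pos += match[1]
        else:
            tokens.append(data[pos])
            pos += 1
    return tokens


def detokenize(tokens):
    """Rebuild the data; the decoder's own output serves as the dictionary."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, tuple):
            distance, length = token
            for _ in range(length):
                out.append(out[-distance])
        else:
            out.append(token)
    return bytes(out)


print(tokenize(b"abcabcabc", window=32))  # literals for "abc", then one back-reference
print(tokenize(b"abcabcabc", window=2))   # window too small: literals only
```

With a window of 2 bytes the repetition at distance 3 falls outside the dictionary, so no back-reference can be emitted.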
However, other situations may occur, for example when such a compression dictionary is transmitted to the decompressing device.
The DEFLATE algorithm is used in HTTP compression, for instance in the SPDY protocol or the SDCH approach.
The Shared Dictionary Compression over HTTP (SDCH) method is a technique developed by Google™ and implemented within Google Chrome™ to improve web data compression. This technique uses a reference dictionary shared between the server and the web browser client.
In practice, the server generates a static reference dictionary that can be used efficiently for a set of digital resources. This compression dictionary is typically a text file that concatenates strings frequently occurring within the set of resources.
The server exchanges this reference dictionary with the client on the first use of this static dictionary.
The server and client exchange documents that are compressed using the VCDiff method based on the shared reference dictionary, as shown in FIG. 1.
A block of data to compress is compared with reference data in the shared reference dictionary. As many strings in the block of data to compress as possible are replaced by references to corresponding reference entries in the reference dictionary.
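The benefit of a dictionary shared in advance can be sketched with the preset-dictionary feature of Python's `zlib` module (the `zdict` argument). This is only an analogue: SDCH encodes against the dictionary with VCDiff, not with a DEFLATE preset dictionary, and the dictionary and page contents below are invented for illustration.

```python
import zlib

# Hypothetical shared dictionary: markup that occurs in many pages of a site.
shared_dictionary = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8">'
    b"<title></title></head><body></body></html>"
)
page = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8">'
    b"<title>Hello</title></head><body>Hi</body></html>"
)


def compress(data: bytes, zdict: bytes = None) -> bytes:
    """DEFLATE-compress `data`, optionally priming the window with `zdict`."""
    if zdict is None:
        compressor = zlib.compressobj(level=9)
    else:
        compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(data) + compressor.flush()


def decompress(blob: bytes, zdict: bytes = None) -> bytes:
    """Reverse of compress(); the same dictionary must be supplied."""
    if zdict is None:
        decompressor = zlib.decompressobj()
    else:
        decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(blob) + decompressor.flush()


without_dict = compress(page)
with_dict = compress(page, shared_dictionary)
print(len(page), len(without_dict), len(with_dict))
```

Both sides must hold exactly the same dictionary bytes; decompressing `with_dict` without the dictionary fails.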
VCDiff organizes the stream as follows: strings that do not match any reference data in the reference dictionary are put first as ADD instructions. References to entries in the reference dictionary are then encoded as COPY instructions. Once the stream has been produced, it is compressed with generic lossless compression techniques such as DEFLATE.
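The split into ADD and COPY instructions can be sketched as follows. The tuple representation, the function names `encode` and `decode`, the greedy search and the minimum copy length are assumptions for illustration; the sketch does not reproduce the binary VCDiff stream layout and keeps the instructions in document order.

```python
def encode(dictionary: bytes, target: bytes, min_len: int = 4):
    """Greedy toy delta encoder producing ("COPY", offset, length)
    and ("ADD", literal_bytes) instructions."""
    ops, pos, pending = [], 0, bytearray()
    while pos < len(target):
        length = min_len
        offset = -1
        if pos + length <= len(target):
            offset = dictionary.find(target[pos:pos + length])
        if offset < 0:
            # No dictionary match starts here: keep the byte as a literal.
            pending.append(target[pos])
            pos += 1
            continue
        # Extend the match for as long as the dictionary still contains it.
        while pos + length < len(target):
            longer = dictionary.find(target[pos:pos + length + 1])
            if longer < 0:
                break
            offset, length = longer, length + 1
        if pending:
            ops.append(("ADD", bytes(pending)))
            pending.clear()
        ops.append(("COPY", offset, length))
        pos += length
    if pending:
        ops.append(("ADD", bytes(pending)))
    return ops


def decode(dictionary: bytes, ops) -> bytes:
    """Apply the instructions against the same shared dictionary."""
    out = bytearray()
    for op in ops:
        if op[0] == "ADD":
            out += op[1]
        else:
            _, offset, length = op
            out += dictionary[offset:offset + length]
    return bytes(out)


dictionary = b"Content-Type: text/html; charset=utf-8"
target = b"Content-Type: text/plain; charset=utf-8"
ops = encode(dictionary, target)
print(ops)
```

Only the five bytes of "plain" travel as literal data; everything else is expressed as offsets into the dictionary.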
Since exhaustive searching in large reference dictionaries may be expensive, only long strings are actually searched for, typically using “fingerprint” approaches.
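A fingerprint-style lookup can be sketched as below. The fixed fingerprint length `k = 8`, the function names and the use of a Python `dict` (whose built-in hashing stands in for a rolling hash such as Rabin-Karp) are assumptions for illustration, not the scheme used by any particular VCDiff encoder.

```python
def build_index(dictionary: bytes, k: int = 8):
    """Map every k-byte substring of the dictionary to its first offset."""
    index = {}
    for offset in range(len(dictionary) - k + 1):
        index.setdefault(dictionary[offset:offset + k], offset)
    return index


def find_candidates(index, target: bytes, k: int = 8):
    """Return (target_position, dictionary_offset) pairs where a k-byte
    fingerprint of the target is present in the dictionary."""
    hits, pos = [], 0
    while pos + k <= len(target):
        offset = index.get(target[pos:pos + k])
        if offset is None:
            pos += 1
        else:
            hits.append((pos, offset))
            pos += k
    return hits


dictionary = b"Accept-Encoding: gzip, deflate"
target = b"X-Test: 1\r\nAccept-Encoding: gzip"
index = build_index(dictionary)
print(find_candidates(index, target))
```

Strings shorter than `k` bytes are never found, which is the trade-off described above: lookups are cheap, but only long matches are detected.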
For the purposes of illustration, reference is now made to the SPDY protocol, while the invention can apply to a wide variety of dictionary-based compression methods as suggested above.
In messages to be exchanged between communicating devices, there are often lists or groups of items of information that are compressed at one of the communicating devices and decompressed at the other communicating device. This is for example the case for HTTP where HTTP payload is compressed, as well as for SPDY protocol where HTTP headers are compressed.
HTTP is commonly used to request and send web pages, and is based on a client/server architecture, wherein the client sends requests, namely HTTP requests, to the server, and the server replies to the client's requests with responses, namely HTTP responses.
Requests and responses are messages that comprise various parts, among which are non-compressed HTTP headers and compressed HTTP payload. An HTTP header consists of a name along with a corresponding value.
In the first versions of HTTP, a TCP/IP connection was established for each HTTP request/response exchange.
SPDY was developed to address this limitation by improving HTTP in several ways.
Firstly, it enables several HTTP requests and responses to be sent over a single TCP/IP connection, thus defining a long-standing connection and a connection context made of the specificities and the history of the long-standing connection. In this way, all the components of a web page (HTML documents, images, JavaScript, etc.) may share the same TCP/IP connection, thus speeding up the web page loading.
Secondly, SPDY implements compression of the HTTP headers exchanged over the shared TCP/IP connection, using the DEFLATE algorithm. This binary compression reduces the network load.
As introduced above, the DEFLATE algorithm performs compression of a serialized binary representation of the HTTP headers, by searching for duplicate strings in the binary representation using a sliding window and replacing them with back references thereto. A serialized binary representation of the HTTP headers results from the serialization of the HTTP headers as a stream of bits (or bytes).
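Serialization followed by DEFLATE can be sketched as follows. The length-prefixed layout (32-bit big-endian counts and lengths) is an assumption loosely modeled on a name/value block and is not claimed to match the SPDY wire format; `serialize_headers` is a hypothetical helper.

```python
import struct
import zlib


def serialize_headers(headers):
    """Serialize (name, value) pairs as: pair count, then for each pair
    the name length, the name, the value length and the value."""
    parts = [struct.pack("!I", len(headers))]
    for name, value in headers:
        name_bytes = name.encode("ascii")
        value_bytes = value.encode("ascii")
        parts += [
            struct.pack("!I", len(name_bytes)), name_bytes,
            struct.pack("!I", len(value_bytes)), value_bytes,
        ]
    return b"".join(parts)


headers = [
    ("accept", "text/html"),
    ("accept-encoding", "gzip, deflate"),
    ("accept-language", "en-US"),
]
serialized = serialize_headers(headers)   # the serialized binary representation
compressed = zlib.compress(serialized)    # DEFLATE inside a zlib wrapper
print(len(serialized), len(compressed))
```

On such a small, isolated block the gain is modest; the redundancy that matters in practice lies between successive header blocks of the same connection.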
Thanks to the connection context, the DEFLATE algorithm can initialize the compression dictionary with the last 32 kilobytes (kB) of message headers already processed when processing and compressing a new block of serialized binary HTTP headers in the same long-standing connection.
The compressing device (the server) and the decompressing device (the client) must remain synchronized, sharing the same buffer or compression dictionary containing the previously exchanged headers.
In this way, the algorithm reuses the knowledge of already exchanged headers to improve the headers' compression thanks to the high redundancy of headers between HTTP messages.
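The effect of the connection context can be sketched with one long-lived `zlib` compressor and one long-lived decompressor, flushed with `Z_SYNC_FLUSH` after each header block. The header contents and the helper `header_block` are invented for illustration, and plain HTTP/1.1 text is used instead of a SPDY serialization to keep the sketch short.

```python
import zlib


def header_block(path: str) -> bytes:
    """Hypothetical request headers; only the path changes between requests."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        "Host: www.example.com\r\n"
        "User-Agent: ExampleBrowser/1.0\r\n"
        "Accept: text/html,application/xhtml+xml\r\n"
        "Accept-Encoding: gzip, deflate\r\n\r\n"
    ).encode("ascii")


# One compressor and one decompressor live as long as the connection,
# so each keeps up to 32 kB of previously exchanged headers as dictionary.
compressor = zlib.compressobj(wbits=15)
decompressor = zlib.decompressobj(wbits=15)

blocks = [header_block(p) for p in ("/index.html", "/style.css", "/logo.png")]
sizes, decoded = [], []
for block in blocks:
    # Z_SYNC_FLUSH emits everything needed to decode this block
    # without resetting the sliding window.
    chunk = compressor.compress(block) + compressor.flush(zlib.Z_SYNC_FLUSH)
    sizes.append(len(chunk))
    decoded.append(decompressor.decompress(chunk))

print([len(b) for b in blocks], sizes)
```

The second and third blocks compress to a small fraction of the first because almost all of their content is already in the shared window. If either side lost a chunk, the two windows would diverge and decoding would fail, which is why the devices must stay synchronized.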
Final steps of the DEFLATE algorithm replace symbols of the back references with Huffman codes.
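The principle of Huffman coding, in which frequent symbols receive shorter codes, can be sketched as below. This is a textbook construction under illustrative assumptions (the function name `huffman_code_lengths` and a plain string as symbol source); DEFLATE itself uses length-limited canonical Huffman codes over separate literal/length and distance alphabets, which this sketch does not reproduce.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built
    from the symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap items: (weight, tie-breaker, symbols in this subtree).
    heap = [(weight, i, [sym]) for i, (sym, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, group1 = heapq.heappop(heap)
        w2, _, group2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol below them.
        for sym in group1 + group2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, tie, group1 + group2))
        tie += 1
    return lengths


lengths = huffman_code_lengths("aaaabbc")
print(lengths)
```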
Compression gains obtained by SPDY are acceptable.
In the SPDY context, the same principle as applied to HTTP headers can be applied to web content exchanged as part of SPDY connections, i.e. on HTTP payload. In other words, each web digital resource, i.e. each web document, can be individually compressed using the DEFLATE algorithm.
Experiments were conducted by the inventors to measure the impact of the DEFLATE sliding window size on a set of web pages to exchange. This is illustrated through the plots of FIG. 2.
This Figure shows three plots of the size of compressed web pages for a set of 80 web pages. The three plots correspond to three sizes of the sliding window, respectively 8 kB = 2^13 bytes (plot w13), 16 kB = 2^14 bytes (plot w14) and 32 kB = 2^15 bytes (plot w15), where the plot w15 is the baseline with value 100 for comparison.
As expected, the compression ratio decreases when passing from a 32 kB sliding window to a 16 kB sliding window, and then further decreases from a 16 kB sliding window to an 8 kB sliding window.
However, the loss in compression is not very high, less than 5% in most cases.
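The kind of measurement described above can be reproduced in outline with the `wbits` parameter of Python's `zlib` module, which sets the base-2 logarithm of the sliding window size. The input below is synthetic and deliberately built so that all redundancy lies 12,000 bytes back; it is not the set of web pages used in the experiments, and it exaggerates the effect of the window size compared with real pages.

```python
import random
import zlib

# Synthetic input (an assumption for illustration): a 12,000-byte block of
# pseudo-random bytes repeated three times, so every repetition lies at a
# distance of 12,000 bytes, beyond an 8 kB window but within 16 kB and 32 kB.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(12000))
data = block * 3


def deflate_size(payload: bytes, wbits: int) -> int:
    """Size of the zlib stream produced with a 2**wbits-byte sliding window."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return len(compressor.compress(payload) + compressor.flush())


sizes = {wbits: deflate_size(data, wbits) for wbits in (13, 14, 15)}
# Express each size relative to the 32 kB window, taken as baseline 100.
relative = {wbits: 100 * sizes[wbits] / sizes[15] for wbits in sizes}
for wbits in (13, 14, 15):
    print(f"w{wbits}: {sizes[wbits]} bytes, relative size {relative[wbits]:.0f}")
```

Running the same function over real web documents, instead of the synthetic input, is how per-page curves such as w13, w14 and w15 could be obtained.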