Data to be encoded often include or are associated with lists or groups of items of information that are also to be compressed at an encoder and decompressed at a decoder. This is for example the case for HTTP (standing for Hypertext Transfer Protocol) where HTTP headers are added to requested data to provide additional information, for example in relation with the connection or protocol used or in relation with the requested data that are sent.
Another example is document or media metadata represented as JSON (standing for JavaScript Object Notation) key-value sets or XML data (standing for Extensible Markup Language).
HTTP is commonly used to request and send data such as web resources, web pages being particular web resources. HTTP is based on a client/server architecture, wherein a client entity initiates a connection with a server device, then sends requests for data, namely HTTP requests, to the server device. Thereafter, the server device replies to the client entity's requests with responses, namely HTTP responses that include the requested data and the so-called HTTP headers.
While in the initial deployment of HTTP, a TCP/IP bidirectional connection was established for each HTTP request/response exchange, the SPDY protocol enables several HTTP requests and responses to be sent over a single TCP/IP bidirectional connection between the client entity and the server device. Based on such a persistent bidirectional connection, all the components of a web page (HTML documents, images, JavaScript, etc.) may share the same TCP/IP connection, thus speeding up the web page loading.
The server device usually handles requests for data from a plurality of client entities and provides responses thereto. Some client entities may be embodied within the same device, for example as various client applications sending requests to the server device. Other client entities may each correspond to a separate client device.
The explanation and description below concentrate on HTTP headers in HTTP messages, whereas the invention may apply to any list or group of items of information that is encoded or compressed using an indexing mechanism.
The HTTP requests and HTTP responses are messages that comprise various parts of data, including header items and payload data. The HTTP headers of an HTTP message, and more generally the headers of a message, form a set of headers, i.e. items of information of a particular type. They are added at the beginning of the response message, before the payload data, to provide additional information useful for various purposes. For example, they may give protocol-based information vital to efficiently convey the response message over the connection implementing the protocol. They may also provide information about the payload data themselves, for example regarding the nature of the data, an image size, etc., this information being necessary for an addressee client entity or application to correctly handle the data.
An HTTP header generally consists of a name along with a corresponding value.
For instance, in the header “Host: en.wikipedia.org”, Host is the header name, and its value is “en.wikipedia.org”. This header is used to indicate the host of the requested web resource (for instance, the Wikipedia page describing HTTP, available at http://en.wikipedia.org/wiki/HTTP).
Conventional HTTP provides compression of the HTTP payload data before the HTTP message is transmitted, while the set of HTTP headers is not compressed but literally represented, i.e. it is encoded as text data. Literal representation consists in encoding a header by encoding its name and its value as strings.
However, the headers tend to be redundant in successive messages. This is the case for HTTP.
Textual encoding is not efficient in this situation, and improvements to HTTP have therefore emerged with a view to defining more compact encodings.
The HTTP/2.0 standard has been developed in this context and proposes a mechanism for encoding HTTP headers using item indexing based on an indexing table (or compression dictionary). HTTP/2.0 uses a dynamic compression scheme to optimize the size of the representation of HTTP message headers.
A header indexing table (or header table) is defined that comprises a list of entries with which respective coding indexes are associated, each entry being a (header name, header value) pair.
The header indexing table is filled with some headers, selected by the encoder, that are encoded using literal representation. It is said that the literally encoded header is indexed in the table, i.e. is added to the header indexing table.
Two different kinds of indexing are available: incremental indexing, where the literally encoded header is appended to the header table, thus having the next available index, and substitution indexing, where the literally encoded header replaces a header previously present at a given index in the header indexing table. Note that in the case of substitution indexing, the substituted index is encoded to fully define the substitution and make it possible for a decoder to perform the same substitution in its local corresponding header indexing table. Similarly, the literally encoded headers that are to be indexed in the header table are flagged when transmitted to the decoder, so that the latter is able to build the same header table as the encoder.
The above-defined literal representation of headers may optionally include the encoding of the header name by using the index associated with an entry already present in the header indexing table and having the same name.
Compression efficiency is obtained by the indexed representation of headers that occur for the second or more time. The indexed representation consists in encoding a header by encoding the index associated with the same header in the header indexing table.
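The indexing scheme described above can be illustrated with a minimal Python sketch. It is not the HTTP/2.0 wire format: the class `HeaderTable`, the function `encode_header` and the tuple-based output are hypothetical names chosen for illustration only, and the encoder here is assumed to index every literal header incrementally.

```python
from typing import List, Optional, Tuple

Header = Tuple[str, str]  # a (header name, header value) pair


class HeaderTable:
    """Toy header indexing table: the index of an entry is its list position."""

    def __init__(self) -> None:
        self.entries: List[Header] = []

    def find(self, header: Header) -> Optional[int]:
        """Return the index of an identical (name, value) entry, or None."""
        try:
            return self.entries.index(header)
        except ValueError:
            return None

    def add_incremental(self, header: Header) -> int:
        """Incremental indexing: append, taking the next available index."""
        self.entries.append(header)
        return len(self.entries) - 1

    def substitute(self, index: int, header: Header) -> None:
        """Substitution indexing: replace the entry at a given index."""
        self.entries[index] = header


def encode_header(table: HeaderTable, header: Header) -> tuple:
    """Encode a header as an index if known, else literally and index it."""
    index = table.find(header)
    if index is not None:
        return ("indexed", index)
    table.add_incremental(header)
    # The tag tells the decoder to add the header to its own table too.
    return ("literal+incremental", header[0], header[1])


table = HeaderTable()
first = encode_header(table, ("host", "en.wikipedia.org"))
second = encode_header(table, ("host", "en.wikipedia.org"))
print(first)   # literal representation on the first occurrence
print(second)  # indexed representation on the second occurrence
```

In this sketch the second occurrence of the header costs a single index instead of two strings, which is where the compression gain comes from.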
Additional mechanisms further improve compression.
For example, according to HTTP/2.0 standard, a set of headers is encoded by taking the previous set of headers as a reference: only the differences with the previous set of headers are encoded. This is to take advantage of high redundancy between consecutive sets of HTTP headers (i.e. between the headers of consecutive HTTP messages).
To illustrate this mechanism, it is assumed that there are few differences between a set N of headers already encoded and a next set N+1 of headers to be encoded. Instead of encoding set N+1 by encoding all its headers, set N+1 is encoded by encoding the sole differences with set N.
Note that not all the headers of set N are used as a reference list for encoding set N+1: only the indexed headers of set N are present in the reference list for set N+1. This is because a header of set N that is not indexed is presumably unlikely to occur again in the next set N+1 of headers (otherwise, it would have been indexed).
The mechanism of header encoding with reference is further illustrated with reference to FIG. 1 that shows a first set 0 of headers (reference 100) which is encoded prior to a second header set 1 (reference 110). In this simple and illustrative example, the two header sets are made of three headers, a “url” header, a “method” header and a “cookie” header.
It is considered that the three headers of header set 0 have been literally encoded and then indexed in the header indexing table 120. Hence, after encoding of set 0, the header indexing table 120 comprises those three headers, with indexes ranging from 0 to 2.
Based on the above mechanism, the encoding of header set 1 takes header set 0 as a reference list of headers. The differences between header set 0 and set 1 are determined: the sole difference in the example is the value of the “url” header from set 0. Therefore, two information items have to be encoded, as shown in table 130:
the “url” header from set 0, which is present in the reference list for set 1, has to be removed. According to HTTP/2.0 standard, this may be done by encoding its index “IH(0)” (second line of table 130). There is no need to indicate whether this index corresponds to an addition or deletion of an entry in the header indexing table. This is because, since the header associated to this index is already present in the reference list (previously encoded header set), its presence in the list of differences only indicates a deletion according to the above-mentioned standard (a pair (name, value) cannot be present more than once in a set of headers); and
the “url” header from set 1, which is not present in the reference list, has to be added. This is done by encoding the “url” header of set 1 (third line of table 130), for instance using a literal representation. The presence of this header necessarily corresponds to an addition, since said header is not present in the reference list.
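The encoding of a header set as differences from a reference list can be sketched as follows. This is a simplified illustration, not the HTTP/2.0 algorithm itself: the functions `encode_set_diff` and `decode_set_diff` are hypothetical, the header values are invented since FIG. 1 values are not reproduced here, and every literal header is assumed to be indexed incrementally.

```python
def encode_set_diff(reference, new_set, table):
    """Encode new_set as differences from reference (indexed headers only)."""
    ops = []
    for header in reference:
        if header not in new_set:
            # An index already in the reference list signals a deletion.
            ops.append(("index", table.index(header)))
    for header in new_set:
        if header not in reference:
            if header in table:
                # An index absent from the reference list signals an addition.
                ops.append(("index", table.index(header)))
            else:
                table.append(header)  # incremental indexing
                ops.append(("literal", header))
    return ops


def decode_set_diff(reference, ops, table):
    """Rebuild the header set from the reference list and the differences."""
    current = list(reference)
    for op in ops:
        if op[0] == "index":
            header = table[op[1]]
            if header in current:
                current.remove(header)  # present in reference: deletion
            else:
                current.append(header)  # absent from reference: addition
        else:
            header = op[1]
            table.append(header)        # mirror the encoder's indexing
            current.append(header)
    return current


# Hypothetical values for the three headers of set 0 and set 1.
set0 = [("url", "/index.html"), ("method", "GET"), ("cookie", "id=1")]
set1 = [("url", "/style.css"), ("method", "GET"), ("cookie", "id=1")]

encoder_table = list(set0)  # set 0 fully indexed, indexes 0 to 2
decoder_table = list(set0)  # the decoder holds the same table

ops = encode_set_diff(set0, set1, encoder_table)
decoded = decode_set_diff(set0, ops, decoder_table)
print(ops)
```

Only two items are produced for set 1: the index of the old “url” header and the literal new “url” header. The unchanged “method” and “cookie” headers cost nothing.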
Another mechanism is delta encoding, which encodes a header value by referring to a previously indexed header value (although the values are different) and encoding the difference between the two values. For instance, a common prefix between the header value and the previously indexed header value is determined, and then the length of the common prefix is encoded, followed by the characters that differ between the two values.
Taking the example of URLs, a header (“url”, “http://example.com/456”) could thus be encoded as a reference to a previous header (“url”, “http://example.com/123”) already present in the header indexing table. In this case, the following information is encoded:
the index of the previous header in the header indexing table;
a length of “19”, corresponding to the common prefix between the values “http://example.com/456” and “http://example.com/123”; and
the suffix to be added to said common prefix (“456”).
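The prefix-based delta encoding of the URL example above can be sketched in a few lines of Python. The function names `delta_encode` and `delta_decode` and the tuple output are assumptions made for illustration. They do not reflect an actual serialization format.

```python
def common_prefix_length(a: str, b: str) -> int:
    """Count the leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def delta_encode(value: str, ref_index: int, table):
    """Encode value relative to the value of the entry at ref_index."""
    ref_value = table[ref_index][1]
    n = common_prefix_length(ref_value, value)
    return (ref_index, n, value[n:])  # (index, prefix length, suffix)


def delta_decode(encoded, table) -> str:
    """Rebuild the value from the referenced entry and the suffix."""
    ref_index, n, suffix = encoded
    return table[ref_index][1][:n] + suffix


table = [("url", "http://example.com/123")]
encoded = delta_encode("http://example.com/456", 0, table)
print(encoded)                       # (0, 19, '456')
print(delta_decode(encoded, table))  # http://example.com/456
```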
This is to take advantage of high redundancy between consecutive sets of HTTP headers (i.e. between the headers of consecutive HTTP messages) to reduce the size of resulting encoded header data.
The header indexing mechanism explained above may also involve other representations (for instance, some values may be encoded as typed values, e.g. using binary encoding for integers and dates) and/or Huffman encoding or Deflate to improve the encoding of headers.
The header indexing scheme described above takes place during the HTTP request/response exchange between a server device and a client entity.
Due to the different nature of the requests and responses, the two directions of communication in the bidirectional connection do not usually convey headers having the same names. To take advantage of this difference with a view to optimizing compression, an HTTP node, either the server device or the client entity, manages two header indexing tables: the first one, referred to as a decoding header indexing table, for decoding headers from incoming messages, and the second one, referred to as an encoding header indexing table, for encoding headers of outgoing messages.
The header indexing tables are generated at their respective encoding sides. That means that there are a server-initiated indexing table and a client-initiated indexing table for each connection.
Publication GB 2,496,385 discloses a client entity and a server device that initialize respective initial compression dictionaries with information shared between them, for example headers they often use. This makes it possible to achieve good compression right from the first headers to be encoded and transmitted.
Regardless of publication GB 2,496,385, each header indexing table is filled progressively with headers that are indexed, when encoding the headers at the encoding side and correspondingly decoding the headers at the decoding side. As a result, the decoding indexing table at the client entity ends up similar to the server-initiated indexing table. Conversely, the decoding indexing table at the server device ends up similar to the client-initiated indexing table.
Since the server device responds to client entities' requests for data, the order of the responses is generally dependent on the order of the requests. This in turn has an effect on what is indexed in the server-initiated header indexing table and how it is indexed.
First, if comparing two server-initiated header indexing tables for two bidirectional connections established between the server device and two respective client entities, not all the headers from one server-initiated indexing table will be indexed in the other server-initiated indexing table.
For instance, in a case where a client entity only requests valid web resources, the server-initiated header indexing table will not contain any ‘status: 404’ header. However, if the other client entity requests a web resource that does not exist on the server device, the server-initiated header indexing table will probably contain a ‘status: 404’ header.
In addition, the order of the responses also has an impact on the indexes assigned to the entries in the server-initiated header indexing table, thus resulting in different server-initiated indexing tables for different connections. This is illustrated with reference to FIG. 2, wherein FIG. 2a represents headers in an HTTP request/response exchange A for the retrieval of a web page named “index.html”, and FIG. 2b represents headers in an HTTP request/response exchange B for the retrieval of a CSS file named “style.css”.
The server-initiated header indexing table 240 (shown in FIG. 2c) is generated by the server device on processing (encoding) response A 210 prior to response B 230. The server-initiated header indexing table 250 (shown in FIG. 2d) is generated by the server device on processing (encoding) response B prior to response A. In both cases, the ‘content-type’ and ‘status’ headers are indexed.
As can be seen from FIG. 2c and FIG. 2d, the server-initiated header indexing tables 240, 250 are different: the header entries are the same but the indexes associated with the two ‘content-type’ headers are not the same.
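The dependence of the indexes on the order of the responses can be reproduced with a short Python sketch. The header values below are assumptions, since the actual content of FIG. 2 is not reproduced here, and `build_table` is a hypothetical helper that indexes each new header incrementally.

```python
def build_table(responses):
    """Index each distinct header incrementally, in processing order."""
    table = []
    for headers in responses:
        for header in headers:
            if header not in table:
                table.append(header)
    return table


# Assumed indexed headers of responses A (index.html) and B (style.css).
response_a = [("status", "200"), ("content-type", "text/html")]
response_b = [("status", "200"), ("content-type", "text/css")]

table_ab = build_table([response_a, response_b])  # A encoded before B
table_ba = build_table([response_b, response_a])  # B encoded before A

for index, header in enumerate(table_ab):
    print("A then B:", index, header)
for index, header in enumerate(table_ba):
    print("B then A:", index, header)
```

Both tables hold the same three entries, but the two ‘content-type’ headers swap indexes, so an index encoded against one table cannot be decoded against the other.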
The above shows that the HTTP/2.0 standard provides a dynamic header compression that is adapted to each client/server connection and adapts itself based on the HTTP headers that are encoded. One component contributing to this dynamic behavior is the server-initiated header indexing table specific to each client/server connection.
The server-initiated header indexing tables for the bidirectional connections with client entities, and the processing based on these tables, impose processing and memory costs on client entities and server devices compliant with HTTP/2.0. In particular, for the server device, these costs increase linearly with the number of client entities.
This may cause issues for server or proxy devices that have a large number of concurrent connections with client entities, in particular for small embedded devices such as CoAP (standing for Constrained Application Protocol) targeted devices.
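The linear growth of the server-side state can be pictured with the following sketch. The classes `ConnectionState` and `Server` are hypothetical and only model the bookkeeping described above, namely one encoding table and one decoding table per connection.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Header = Tuple[str, str]


@dataclass
class ConnectionState:
    """Per-connection tables held by one HTTP node."""
    encoding_table: List[Header] = field(default_factory=list)
    decoding_table: List[Header] = field(default_factory=list)


class Server:
    """Toy server keeping separate table state for every client connection."""

    def __init__(self) -> None:
        self.connections: Dict[str, ConnectionState] = {}

    def open_connection(self, client_id: str) -> None:
        self.connections[client_id] = ConnectionState()

    def table_count(self) -> int:
        # Two tables per connection: the count grows linearly with clients.
        return 2 * len(self.connections)


server = Server()
for client_id in ("client-1", "client-2", "client-3"):
    server.open_connection(client_id)
print(server.table_count())  # 6
```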