The present invention relates to a method and apparatus for compressing Internet protocol messages, such as Hypertext Transfer Protocol (HTTP) messages.
HTTP is a communication protocol used to make Hypertext Markup Language (HTML) content and other resources available to users on the Web. An HTML file is stored in a directory that is accessible to a server. Such a server is typically a Web server that conforms to a protocol supported by Web browsers, such as HTTP.
Alternatively, HTML content may be stored at the headend of a subscriber communication network, such as a cable/satellite television network. There is an increasing trend toward providing HTML content to subscribers via such networks due to the networks' high-speed data rates, the potential commercial benefits of tying in the HTML content with traditional television programming services, the expected convergence of telephone, television and computer networks, and the expected rise of in-home computer networks. The HTML content may be selected and provided directly by the headend, or the headend may merely act as a conduit in a high-speed link between the subscriber and remote Web servers.
Servers that conform to other protocols, such as the File Transfer Protocol (FTP) or Gopher, may also be accessed by an HTTP browser by using a proxy server. A proxy server is a type of gateway that allows a browser using HTTP to communicate with a server that does not understand HTTP, but which uses, e.g., FTP, Gopher or other protocols. The proxy server accepts HTTP requests from the browser and translates them into a format that is suitable for the origin server, such as an FTP request. Similarly, the proxy server translates FTP replies from the server into HTTP replies so that the browser can understand them.
Generally, the FTP file itself is not translated. FTP is a high level protocol for transferring files (as is HTTP). The translation occurs at the protocol level. For example, a client browser may send the HTTP request “GET ftp://www.myserver.com/somefile.txt HTTP/1.1”. This would be translated at a proxy into an FTP “GET” request to be forwarded to the FTP origin server. The FTP response from the origin server back to the proxy (which has the requested file attached) is then translated at the proxy into an HTTP response that includes the attached file (e.g., as an object). The file being transferred is not translated or modified. However, in some cases, the browser may indicate that it can decode certain encoding or compression formats. Thus, the proxy may translate (encode or compress) the attached file before the file is transmitted to the client.
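By way of illustration only, the protocol-level translation described above might be sketched as follows. The function name `http_request_line_to_ftp` and the returned dictionary layout are hypothetical and are not taken from the disclosure. The sketch assumes that the retrieval operation is carried by the FTP `RETR` command, which FTP clients commonly expose to users as "get", and that port 21 is used when the URL names no port.

```python
from urllib.parse import urlsplit


def http_request_line_to_ftp(request_line: str) -> dict:
    """Map an HTTP request line that targets an ftp:// URL onto FTP details.

    Only the protocol framing is translated; the file itself is untouched.
    """
    method, target, _version = request_line.split(" ")
    if method != "GET":
        raise ValueError("only GET is handled in this sketch")
    url = urlsplit(target)
    if url.scheme != "ftp":
        raise ValueError("target is not an ftp:// URL")
    return {
        "host": url.hostname,
        "port": url.port or 21,  # assumed default FTP control port
        "command": "RETR " + url.path.lstrip("/"),
    }


ftp_request = http_request_line_to_ftp(
    "GET ftp://www.myserver.com/somefile.txt HTTP/1.1"
)
print(ftp_request)
```

A real proxy would also have to log in to the FTP server, open a data connection, and wrap the retrieved file in an HTTP response; those steps are omitted here.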
The proxy server can be a program running on the same machine as the browser, or a free-standing machine somewhere in a network that serves many browsers.
For example, the headend of a subscriber communication network may provide a proxy server function.
HTTP defines a set of rules that servers and browsers follow when communicating with each other. Typically, the process begins when a user clicks on an icon in an HTML page that is the anchor of a hyperlink, or the user types in a Uniform Resource Locator (URL). The URL contains a host name that is typically resolved into an IP address via a domain name system (DNS) lookup. A connection is then made to the host server using the IP address (and possibly a port number) returned by the DNS lookup. Next, the browser sends a request to retrieve an object from the server, or to post data to an object on the server. The server sends a response to the browser including a status code and the response data. The connection between the browser and server is then closed.
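The request and response messages exchanged in the sequence above can be sketched as follows, for illustration only. The helper names `build_get_request` and `parse_status_line` and the example URL are hypothetical. No network connection is made; the sketch only shows the ASCII text that a browser would send and the status line that it would read back.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> bytes:
    """Build the ASCII text of a minimal HTTP/1.1 GET request for a URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.hostname}",
        "Connection: close",
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> tuple:
    """Return (version, status code, reason phrase) from a raw response."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


request = build_get_request("http://www.example.com/index.html")
status = parse_status_line(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
print(request)
print(status)
```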
Generally, HTTP is implemented in a client program and a server program, which execute on different end systems and communicate with each other by exchanging HTTP messages. HTTP defines the structure of these messages and how the client and server exchange the messages.
However, due to the increasing popularity and expansion of the Internet, the amount of Internet traffic, including HTTP request and response messages, has also increased. Accordingly, the amount of processing power required by a user's terminal and browser, or other client or server, may not be sufficient to keep up with the flow of data. This can result in undesirable delays in obtaining requested data, such as HTML data, which is rendered on a user's screen, or other problems.
Moreover, an increasing amount of bandwidth for transmitting the HTTP messages is consumed, thereby reducing the available bandwidth for other uses, or taxing the capacity of the channel.
The HTTP message data may be transmitted via a Public Switched Telephone Network (PSTN), via a cable or satellite television network, via a local wireless network, or via a combination of the above, for example.
In particular, HTTP messages typically include strings of ASCII characters. However, with eight bits (one byte) of data required for each character (whether a letter, number, punctuation symbol, blank space, or carriage return), the amount of data in an HTTP message can be significant.
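As a simple illustration of the one-byte-per-character cost, the size of a short request can be counted directly. The message text below is a hypothetical example and is not taken from the disclosure.

```python
# A hypothetical three-line HTTP request; every character, including each
# space, carriage return and line feed, occupies one byte when sent as ASCII.
message = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Accept: text/html\r\n"
    "\r\n"
)

size_bytes = len(message.encode("ascii"))
size_bits = size_bytes * 8

print(size_bytes, "bytes,", size_bits, "bits")
```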
Accordingly, it would be desirable to provide a system for compressing HTTP or similar messages.
The system should reduce the amount of bandwidth required to communicate HTTP data to a browser, server or other processor.
The system should be suitable for use with existing networks over which Web data (e.g., HTML) is communicated.
The system should allow a browser that is implemented in a terminal (e.g., a set-top box/decoder) in a subscriber television network to directly process the compressed HTTP data without decompressing it.
The system should reduce the required processing power of a browser in a user terminal in a subscriber television network.
The system should provide a consistent and deterministic processing time for all compressed HTTP elements within a given message.
The system should be usable on a client/browser side or server side of a network.
The system should be usable on a proxy server that interfaces between a client/browser and a server, or other proxy servers.
The system should be compatible with networks that communicate Web data using a digital video communication protocol, such as MPEG-2.
The system should be compatible with networks that communicate Web data using the Transmission Control Protocol/Internet Protocol (TCP/IP).
The system should provide compression for current versions of HTTP, as well as derivations thereof and other analogous protocols, such as Gopher, FTP or Telnet.
The system should be compatible with other bit level compression techniques.
The present invention provides a system having the above and other advantages.
The present invention relates to a method and apparatus for compressing Internet protocol messages, or any digital protocol messages, such as HTTP messages.
Codewords are provided for HTTP data elements (e.g., character strings) to reduce the amount of data, such as in an HTTP request or response message. The codewords may have reserved bits to distinguish specific data elements or to provide other information about the message to aid in processing. The technique is compatible with other compression techniques to provide even greater compression.
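One possible codeword layout with reserved bits is sketched below for illustration. The eight-bit width, the use of the top two bits as a class field, and the class values are all assumptions made for this example; the disclosure does not fix a particular layout here.

```python
# Hypothetical 8-bit codeword: the two most significant bits are reserved to
# identify the class of data element, and the low six bits index the element.
RESERVED_MASK = 0b1100_0000
INDEX_MASK = 0b0011_1111

CLASS_METHOD = 0b00   # e.g., request methods
CLASS_HEADER = 0b01   # e.g., header field names
CLASS_VERSION = 0b10  # e.g., protocol version strings


def make_codeword(element_class: int, index: int) -> int:
    """Pack a class (0..3) and an index (0..63) into one byte."""
    if not 0 <= element_class <= 3:
        raise ValueError("element_class must fit in two bits")
    if not 0 <= index <= 63:
        raise ValueError("index must fit in six bits")
    return (element_class << 6) | index


def split_codeword(codeword: int) -> tuple:
    """Recover (class, index) from a packed codeword."""
    return (codeword & RESERVED_MASK) >> 6, codeword & INDEX_MASK


codeword = make_codeword(CLASS_HEADER, 5)
print(bin(codeword), split_codeword(codeword))
```

With such a layout, a receiver could test the reserved bits alone to decide how to route a codeword before looking up the specific element.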
The invention provides a significant reduction in the amount of data that must be communicated, e.g., during a Web browsing session at a subscriber terminal. Additionally, the invention allows the use of a network processor or browser, e.g., in a subscriber terminal, to process the compressed HTTP codewords directly without decompressing them. This can provide significant savings in processing time and complexity.
Additionally, each codeword can have the same length and therefore generally takes the same amount of time to process, so the processing time becomes more deterministic. Alternatively, variable length codewords can be provided, such as with an entropy coding scheme.
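The trade-off between fixed length and variable length codewords can be illustrated with the sketch below. Huffman coding is used here as one well-known entropy coding scheme; it is chosen for this example and is not stated to be the scheme of the disclosure. The element frequencies are invented for illustration.

```python
import heapq


def fixed_length_bits(num_elements: int) -> int:
    """Bits per codeword when every element gets a codeword of equal length."""
    return max(1, (num_elements - 1).bit_length())


def huffman_code_lengths(frequencies: dict) -> dict:
    """Return the Huffman codeword length, in bits, for each element."""
    # Heap entries are (weight, tie-breaker, elements in this subtree).
    heap = [(freq, i, [name]) for i, (name, freq) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    lengths = {name: 0 for name in frequencies}
    counter = len(heap)
    while len(heap) > 1:
        w1, _, names1 = heapq.heappop(heap)
        w2, _, names2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every element beneath them.
        for name in names1 + names2:
            lengths[name] += 1
        heapq.heappush(heap, (w1 + w2, counter, names1 + names2))
        counter += 1
    return lengths


# Hypothetical relative frequencies of four data elements.
frequencies = {"GET": 50, "Host:": 25, "Accept:": 15, "POST": 10}
lengths = huffman_code_lengths(frequencies)
total = sum(frequencies.values())
average_bits = sum(frequencies[n] * lengths[n] for n in frequencies) / total

print(fixed_length_bits(len(frequencies)), lengths, average_bits)
```

Fixed length codewords keep the per-codeword processing time uniform, while the variable length assignment spends fewer bits on frequent elements at the cost of a less predictable parse.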
A particular encoding method for processing an Internet protocol message, such as an HTTP message, includes the step of providing a plurality of codewords for coding a corresponding plurality of recognizable data elements of the protocol. Each of the recognizable data elements comprises a string having a plurality of characters (e.g., letters, numbers, and/or other symbols). The digital protocol message is parsed to locate data elements thereof corresponding to the recognizable data elements. A corresponding one of the codewords is output for each of the recognizable data elements located in the digital protocol message to provide the message in a compressed format.
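The encoding steps above can be sketched as follows, for illustration only. The codebook contents, the choice of one-byte codewords in the range 0x80 to 0xFF (unused by 7-bit ASCII), and the longest-match parsing rule are assumptions made for this example. Characters that match no recognizable data element are passed through as literal ASCII bytes.

```python
# Hypothetical codebook: recognizable data element -> one-byte codeword.
# Codewords use values 0x80..0xFF so they cannot collide with ASCII literals.
CODEBOOK = {
    "GET ": 0x80,
    "POST ": 0x81,
    " HTTP/1.1\r\n": 0x82,
    "Host: ": 0x83,
    "Accept: ": 0x84,
    "text/html": 0x85,
    "\r\n": 0x86,
}


def compress(message: str) -> bytes:
    """Parse the message and output a codeword for each element located."""
    elements = sorted(CODEBOOK, key=len, reverse=True)  # try longest first
    out = bytearray()
    i = 0
    while i < len(message):
        for element in elements:
            if message.startswith(element, i):
                out.append(CODEBOOK[element])
                i += len(element)
                break
        else:
            # No recognizable element here: emit the character as a literal.
            value = ord(message[i])
            if value > 0x7F:
                raise ValueError("non-ASCII character in message")
            out.append(value)
            i += 1
    return bytes(out)


sample = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Accept: text/html\r\n"
    "\r\n"
)
compressed = compress(sample)
print(len(sample), "->", len(compressed))
```

For this sample and this codebook, the 70-character request is reduced to 34 bytes; the ratio in practice would depend on how much of a message is covered by the codebook.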
A corresponding decoding method includes the step of providing a plurality of data elements of the protocol for decoding a corresponding plurality of the codewords. The compressed message is parsed to locate the codewords thereof. Next, the respective data elements are provided for each corresponding codeword located to provide the digital protocol message in an uncompressed format. The uncompressed data can then be processed by a conventional HTTP message handler.
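A corresponding decoder can be sketched in the same spirit. The decode table and the convention that byte values of 0x80 and above are codewords while lower values are literal ASCII characters are assumptions of this example.

```python
# Hypothetical decode table: one-byte codeword -> data element of the protocol.
DECODE_TABLE = {
    0x80: "GET ",
    0x81: "POST ",
    0x82: " HTTP/1.1\r\n",
    0x83: "Host: ",
    0x84: "Accept: ",
    0x85: "text/html",
    0x86: "\r\n",
}


def decompress(data: bytes) -> str:
    """Locate each codeword and substitute its data element."""
    parts = []
    for value in data:
        if value >= 0x80:
            if value not in DECODE_TABLE:
                raise ValueError(f"unknown codeword 0x{value:02X}")
            parts.append(DECODE_TABLE[value])
        else:
            parts.append(chr(value))  # literal ASCII character
    return "".join(parts)


compressed = (
    bytes([0x80]) + b"/index.html" + bytes([0x82, 0x83])
    + b"www.example.com" + bytes([0x86, 0x86])
)
restored = decompress(compressed)
print(repr(restored))
```

The restored text is an ordinary HTTP message and could be passed unchanged to a conventional message handler.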
Optionally, the compressed HTTP messages can be processed directly, without decompression. In particular, such a method includes the step of parsing the compressed message to locate the codewords thereof. Next, the located codewords are provided to a compressed protocol data message handler for processing thereat in accordance with the protocol without recovering the corresponding data elements. Such a message handler can be designed using known hardware and/or software techniques to directly recognize and process the compressed HTTP data (e.g., codewords).
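A message handler that acts on codewords without recovering the corresponding strings might be sketched as below. The codeword values, the handler functions, and the returned action strings are hypothetical. The point illustrated is that dispatch is performed on the integer codeword itself, so no string such as "GET" is ever reconstructed or compared.

```python
# Hypothetical codeword assignments shared by the encoder and this handler.
METHOD_GET = 0x80
METHOD_POST = 0x81
VERSION_AND_EOL = 0x82  # stands for " HTTP/1.1" followed by CR LF


def handle_get(target: bytes) -> str:
    return "serve " + target.decode("ascii")


def handle_post(target: bytes) -> str:
    return "accept data for " + target.decode("ascii")


# Dispatch table keyed directly by codeword value.
HANDLERS = {METHOD_GET: handle_get, METHOD_POST: handle_post}


def handle_compressed_request_line(data: bytes) -> str:
    """Process a compressed request line without expanding its codewords."""
    handler = HANDLERS.get(data[0])
    if handler is None:
        raise ValueError("unsupported method codeword")
    end = data.index(VERSION_AND_EOL, 1)  # literal target ends at this codeword
    return handler(data[1:end])


line = bytes([METHOD_GET]) + b"/index.html" + bytes([VERSION_AND_EOL])
action = handle_compressed_request_line(line)
print(action)
```

Because each lookup is a single table access on a small integer, the work per codeword is roughly constant, in contrast to comparing variable length character strings.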
In addition, an optimal solution provides the capability to cache (e.g., temporarily store) the compressed data in a proxy server for content that is accessed frequently by subscriber terminals.
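A proxy-side cache of compressed data could be sketched as follows. The class name `CompressedCache`, the use of the URL as the cache key, and the least-recently-used eviction policy are assumptions of this example; the disclosure states only that frequently accessed content may be temporarily stored in compressed form.

```python
from collections import OrderedDict
from typing import Optional


class CompressedCache:
    """Keep already-compressed responses so they need not be re-encoded."""

    def __init__(self, capacity: int = 128):
        self.capacity = capacity
        self._entries = OrderedDict()  # url -> compressed bytes, oldest first

    def get(self, url: str) -> Optional[bytes]:
        if url not in self._entries:
            return None
        self._entries.move_to_end(url)  # mark as most recently used
        return self._entries[url]

    def put(self, url: str, compressed: bytes) -> None:
        self._entries[url] = compressed
        self._entries.move_to_end(url)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used


cache = CompressedCache(capacity=2)
cache.put("/a", b"\x80a")
cache.put("/b", b"\x80b")
cache.get("/a")            # "/a" becomes the most recently used entry
cache.put("/c", b"\x80c")  # evicts "/b"
print(cache.get("/a"), cache.get("/b"), cache.get("/c"))
```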
Corresponding apparatuses are also disclosed.