The Hypertext Transfer Protocol (“HTTP”) is the data communication protocol used to retrieve Web pages and other content from servers on the World Wide Web (“the Web”). Given the vast number of individuals who access the Web on a daily basis, HTTP is the most widely used data communication protocol today.
FIG. 1 illustrates an exemplary HTTP-based client-server interaction. The client 110 uses a Web browser 120 such as Microsoft's Internet Explorer or Firefox (an open source browser available from www.mozilla.org) to communicate with remote Web servers 130. In response to the selection of a hyperlink from within a previously-downloaded Web page or the manual entry of a new uniform resource locator (“URL”) identifying a Web page, the browser 120 transmits an HTTP request 101 over the network 140 to the Web server 130. Assuming that the user has not previously downloaded the requested Web page to the client 110, the Web server 130 transmits an HTTP response 102 to the client containing the new Web page.
As illustrated in FIG. 2, the HTTP response 102 typically includes a header portion 201 and a body portion 202. The body portion 202 is used to store the underlying content of the Web page (e.g., text, graphics, animation, etc.). By contrast, the header portion 201 contains metadata including, for example, a content length field indicating how many bytes the HTTP body takes up; a last modified date/time field indicating the last time the Web page was modified (in GMT format); and an “ETag” field indicating the current value of the “entity tag” for the requested Web page. The entity tag is a unique ID, analogous to a checksum, which identifies a particular version of a particular resource (such as a Web page) on the Web.
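By way of illustration only, the header/body structure described above can be sketched in Python as follows. The function name split_response, the sample page content, and the sample field values are hypothetical and chosen for the example; the sketch assumes a well-formed response whose header portion ends at the first blank line.

```python
def split_response(raw: bytes):
    """Split a raw HTTP response into status line, header fields, and body."""
    # The header portion ends at the first blank line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, since values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Hypothetical page content and header values, for illustration only.
page = b"<html><body>Hello</body></html>"
raw = b"\r\n".join([
    b"HTTP/1.1 200 OK",
    b"Content-Length: " + str(len(page)).encode("ascii"),
    b"Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT",
    b'ETag: "v1-abc"',
    b"",
    page,
])

status_line, headers, parsed_body = split_response(raw)
print(status_line)
print(headers["content-length"], headers["last-modified"], headers["etag"])
```

In this sketch the content length field matches the number of bytes in the body portion, and the ETag value is an opaque string that the client can later present to the server.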
If the user has previously downloaded a copy of the Web page, then the browser 120 transmits the header data 201 for the Web page with the HTTP request 101. The Web server 130 then compares the header data 201 against the header data for the current version of the Web page. For example, the Web server may compare the value of the ETag field in the header data sent from the client with the value of the ETag field for the current version of the Web page. If the current version is the same as the version transmitted by the client 110, then the Web server should (if it is configured properly) transmit an “HTTP 304” result code embedded within the header of the HTTP response 102. For an HTTP 304 result code, the HTTP response 102 does not include the body of the message. Upon receipt of the response, the browser identifies the HTTP 304 result code and displays the version of the Web page cached within the browser (which is the most recent version). Network bandwidth is thereby conserved.
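The server-side comparison described above may be sketched, purely as an example, in the following manner. The sketch assumes the client presents its cached entity tag in an If-None-Match request field, which is the conventional HTTP/1.1 mechanism for this purpose; the names Resource and respond are hypothetical, and a real server would also handle date-based validators and multiple entity tags.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Current version of a Web page as held by the server (hypothetical type)."""
    etag: str
    body: bytes


def respond(request_headers: dict, resource: Resource):
    """Return (status code, response headers, body) for a request."""
    client_etag = request_headers.get("if-none-match")
    if client_etag is not None and client_etag == resource.etag:
        # The client's cached copy is current: send headers only, no body.
        return 304, {"etag": resource.etag}, b""
    # No cached copy, or a stale one: send the full page.
    headers = {
        "etag": resource.etag,
        "content-length": str(len(resource.body)),
    }
    return 200, headers, resource.body


current = Resource(etag='"v2"', body=b"<html>new</html>")
print(respond({"if-none-match": '"v2"'}, current)[0])  # cached copy is current
print(respond({"if-none-match": '"v1"'}, current)[0])  # cached copy is stale
```

A properly configured server takes the first branch whenever the entity tags match, so only the header portion crosses the network.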
One problem with the foregoing configuration is that many current Web servers do not generate HTTP 304 result codes when they should and, instead, re-transmit the entire Web page to the client 110 even though the client 110 already has a local copy of the Web page. Given that a typical header may be in the range of, e.g., 250 bytes whereas a typical Web page body may be in the range of, e.g., 50 Kbytes, a significant amount of bandwidth is wasted. While this may not be a significant problem for relatively high powered clients coupled to broadband Internet connections (e.g., DSL or corporate T1-based local area networks), it can be a problem for users with relatively low bandwidth connections such as dial-up users and for users of wireless data processing/telephony devices.
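The scale of the waste can be estimated with a short calculation using the exemplary sizes given above. The link speed of 56,000 bits per second is an assumption standing in for a nominal dial-up connection, 50 Kbytes is taken to mean 50 × 1024 bytes, and protocol overhead below HTTP is ignored; none of these figures is a measurement.

```python
HEADER_BYTES = 250                # exemplary header size from the text
BODY_BYTES = 50 * 1024            # exemplary body size, assuming 1 Kbyte = 1024 bytes
LINK_BITS_PER_SECOND = 56_000     # assumed nominal dial-up rate


def transfer_seconds(num_bytes, bits_per_second=LINK_BITS_PER_SECOND):
    """Idealized time to move num_bytes over the link, ignoring overhead."""
    return num_bytes * 8 / bits_per_second


full_response = HEADER_BYTES + BODY_BYTES   # server re-sends the whole page
not_modified = HEADER_BYTES                 # server sends an HTTP 304 instead
wasted_bytes = full_response - not_modified

print(f"wasted bytes per request: {wasted_bytes}")
print(f"full response: {transfer_seconds(full_response):.2f} s")
print(f"304 response:  {transfer_seconds(not_modified):.3f} s")
```

Under these assumptions, each unnecessary re-transmission costs the full body size in bandwidth and several seconds of transfer time on the slow link, compared with a small fraction of a second for a header-only response.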
Accordingly, what is needed is a more efficient mechanism for downloading Web pages and other types of content on a data network.