1.1 Field of the Invention
The present invention relates generally to the distribution of World Wide Web content over a geosynchronous satellite communications network, and in particular, to satellite communications networks having an outbound high-speed, continuous channel carrying packetized data and either a satellite inbound channel or a terrestrial inbound channel, such as a dialup connection to the Internet.
1.2 Description of Related Art
1.2.1 Caching HTTP Proxy Servers
The most popular method for distributing multimedia information is the Internet's World Wide Web. The World Wide Web can be considered to be a set of network accessible information resources. In the World Wide Web, many Web Servers and Web Browsers are connected to the Internet via the TCP/IP protocols, and the Web Browsers request web pages, graphics and other multimedia content via the Hypertext Transfer Protocol (HTTP).
The World Wide Web is founded on three basic ideas:
1. A global naming scheme for resources, that is, Uniform Resource Locators (URLs).
2. Protocols for accessing named resources, the most common of which is the Hypertext Transfer Protocol (HTTP).
3. Hypertext, that is, the ability to embed links to other resources, which is typically done according to the Hypertext Markup Language (HTML).
Web pages are formatted according to the Hypertext Markup Language (HTML) standard, which provides for the display of high-quality text (including control over the location, size, color and font for the text), the display of graphics within the page, and the "linking" from one page to another, possibly stored on a different web server. Each HTML document, graphic image, video clip or other individual piece of content is identified, that is, addressed, by an Internet address, referred to as a Uniform Resource Locator (URL). In the context of this invention, a "URL" may refer either to the address of an individual piece of web content (HTML document, image, sound clip, video clip, etc.) or to the individual piece of content addressed by the URL. When a distinction is required, the term "URL address" refers to the URL itself, while the terms "URL content" and "URL object" refer to the content addressed by the URL.
A web browser may be configured to access URLs either directly from a web server or from an HTTP proxy server. An HTTP proxy server acts as an intermediary between one or more browsers and many web servers. A web browser requests a URL from the proxy server, which in turn "gets" the URL from the addressed web server. An HTTP proxy may itself be configured to access URLs either directly from a web server or from another HTTP proxy server. When a proxy server sends a request to another proxy server, the proxy server processing the request is referred to as being upstream (that is, closer to the web server). When a proxy server receives a request from another proxy server, the requesting proxy server is referred to as being downstream (that is, farther from the web server).
FIG. 1 illustrates a system in which one of a plurality of browsers accesses a web server via upstream and downstream proxy servers with an HTTP GET command. In particular, a plurality of PCs 12, each including a browser 14, output a GET command to web server 16 in order to access the URL "A". Assuming PC 12 and browser 14 make the first request, the GET command is passed to downstream proxy server 18. Since this is the first request for URL "A", the downstream proxy server 18 does not have URL "A" in its cache 20. As a result, the downstream proxy server 18 issues its own GET URL "A" command to upstream proxy server 22. Since this is also the first request to upstream proxy server 22 for URL "A", the upstream proxy server 22 does not have URL "A" in its cache 24 either. Therefore, the upstream proxy server 22 issues a GET URL "A" command directly to the web server 16. The web server 16 services this request and provides the upstream proxy server 22 with the desired information, which is then stored in the cache 24. The upstream proxy server 22 passes the desired information to the downstream proxy server 18, which also stores the desired information in its cache 20. Finally, the downstream proxy server 18 passes the desired information to the originating requestor's browser 14 at PC 12, which also stores the desired information in its cache 21.
Subsequently, PC 12′, via its browser 14′, also desires the information at URL "A". PC 12′ issues a GET URL "A" command to downstream proxy server 18. At this time, downstream proxy server 18 has the desired information in its cache 20 and provides the information directly to PC 12′ without requesting it from either the upstream proxy server 22 or the web server 16. Similarly, if PC 12″, via its browser 14″, also desires the information at URL "A", PC 12″ issues a GET URL "A" command to downstream proxy server 18′. However, since downstream proxy server 18′ does not have the information for URL "A" stored in its cache 20′, the downstream proxy server 18′ must access the upstream proxy server 22 and its cache 24 in order to supply the desired information to PC 12″. The upstream proxy server 22, in turn, does not have to access the web server 16, because the desired information is already stored in its cache 24.
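The request flow of FIG. 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming a plain in-memory dictionary for each cache and a stub function in place of web server 16; the class, function and variable names are hypothetical, and browser cache 21 is left out. It replays the three requests described above and counts how often each hop had to forward a GET.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class CachingProxy:
    """A caching HTTP proxy that resolves misses through the next hop toward the web server."""
    name: str
    # Next hop: another CachingProxy (upstream) or a callable standing in for the web server.
    upstream: Union["CachingProxy", Callable[[str], str]]
    cache: Dict[str, str] = field(default_factory=dict)
    upstream_requests: int = 0  # number of GETs this proxy had to forward

    def get(self, url: str) -> str:
        if url in self.cache:
            return self.cache[url]          # cache hit: answered locally
        self.upstream_requests += 1         # cache miss: forward the GET
        if isinstance(self.upstream, CachingProxy):
            content = self.upstream.get(url)
        else:
            content = self.upstream(url)
        self.cache[url] = content           # keep a copy for later requesters
        return content


web_server_log: List[str] = []


def web_server_16(url: str) -> str:
    """Stub origin server that records every request it receives."""
    web_server_log.append(url)
    return f"<content of {url}>"


upstream_22 = CachingProxy("upstream proxy 22", web_server_16)
downstream_18 = CachingProxy("downstream proxy 18", upstream_22)
downstream_18b = CachingProxy("downstream proxy 18'", upstream_22)

downstream_18.get("A")    # PC 12: misses in caches 20 and 24, fetched from web server 16
downstream_18.get("A")    # PC 12': hit in cache 20
downstream_18b.get("A")   # PC 12'': misses in cache 20', hit in cache 24

print(len(web_server_log), downstream_18.upstream_requests,
      downstream_18b.upstream_requests, upstream_22.upstream_requests)
# prints: 1 1 1 1
```

The web server is contacted once for three browser requests; every later request is absorbed by whichever cache in the hierarchy already holds URL "A".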
As described above, a caching HTTP proxy server, such as downstream proxy servers 18, 18′ and upstream proxy server 22, stores (caches) some URLs. Normally, a caching proxy server stores the most frequently accessed URLs. When a web server delivers a URL, it may deliver along with the URL an indication of whether the URL should not be cached and an indication of when the URL was last modified. As described in conjunction with FIG. 1, the URLs stored by a caching proxy server are typically URLs obtained on behalf of a browser or downstream proxy server. A caching HTTP proxy server satisfies a request for a URL, when possible, by returning a stored URL. The HTTP protocol also supports a GET IF MODIFIED SINCE request, to which a web server (or a proxy server) responds either with a status code indicating that the URL has not changed or with the URL content if the URL has changed since the requested date and time.
FIG. 2 illustrates a browser executing a GET IF MODIFIED SINCE command against web server 16. As illustrated in FIG. 2, the PC 12, including browser 14, has already requested URL "A" once and has URL "A" stored in its cache 21. PC 12 now wants to know whether the information stored at URL "A" has been updated since the time it was last requested. As a result, the browser 14 issues a GET A IF MODIFIED SINCE request specifying the last time "A" was obtained. Assuming that URL "A" was obtained at 11:30 a.m. on Jul. 15, 1999, browser 14 issues a GET A IF MODIFIED SINCE Jul. 15, 1999 at 11:30 a.m. request. This request goes to downstream proxy server 18. If downstream proxy server 18 has received an updated version of URL "A" since Jul. 15, 1999 at 11:30 a.m., downstream proxy server 18 will supply the new URL "A" information to the browser 14. If not, the downstream proxy server 18 will issue a GET IF MODIFIED SINCE command to upstream proxy server 22. If upstream proxy server 22 has received an updated URL "A" since Jul. 15, 1999 at 11:30 a.m., upstream proxy server 22 will pass the new URL "A" to the downstream proxy server 18. If not, the upstream proxy server 22 will issue a GET A IF MODIFIED SINCE command to the web server 16. If URL "A" has not changed since Jul. 15, 1999 at 11:30 a.m., web server 16 will issue a NO CHANGE response to the upstream proxy server 22. In this way, bandwidth and processing time are saved: if URL "A" has not been modified since the last request, the entire contents of URL "A" need not be transferred between web browser 14, downstream proxy server 18, upstream proxy server 22 and web server 16; only an indication that there has been no change need be exchanged.
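At the web server, the GET IF MODIFIED SINCE decision reduces to a timestamp comparison. The sketch below assumes a single-URL stand-in for web server 16; the class and field names are hypothetical. In HTTP/1.1 the request date travels in an `If-Modified-Since` header and the NO CHANGE response is status 304 (Not Modified), so the sketch returns 304 in that case and 200 with the full body otherwise.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Response:
    status: int            # 200 = full content, 304 = not modified (the NO CHANGE case)
    body: Optional[str]    # None when only the status is returned


class OriginServer:
    """Hypothetical stand-in for web server 16 holding a single URL."""

    def __init__(self, body: str, last_modified: datetime) -> None:
        self.body = body
        self.last_modified = last_modified

    def get_if_modified_since(self, since: Optional[datetime]) -> Response:
        # Unconditional GET, or content changed after the requester's copy: send it all.
        if since is None or self.last_modified > since:
            return Response(200, self.body)
        # Otherwise only a short status crosses the network.
        return Response(304, None)


fetched_at = datetime(1999, 7, 15, 11, 30)  # when browser 14 last obtained URL "A"
server = OriginServer("<html>A, version 1</html>", datetime(1999, 7, 14, 9, 0))

first = server.get_if_modified_since(fetched_at)
print(first.status, first.body)      # prints: 304 None

server.body = "<html>A, version 2</html>"
server.last_modified = datetime(1999, 7, 16, 8, 0)
second = server.get_if_modified_since(fetched_at)
print(second.status, second.body)    # prints: 200 <html>A, version 2</html>
```

A caching proxy answering on the server's behalf would apply the same comparison against the last-modified time of its stored copy.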
Caching proxy servers offer both reduced network utilization and reduced response time when they are able to satisfy requests with cached URLs. Much research has been done attempting to arrive at a near-optimal caching policy, that is, determining when a caching proxy server should store URLs, delete URLs and satisfy requests from the cache both with and without doing a GET IF MODIFIED SINCE request against the web server. Caching proxy servers are available commercially from several companies including Microsoft, Netscape, Network Appliance and Cache Flow.
1.2.2 Satellite Multicast Networks
Typical geosynchronous satellites relay a signal from a single uplink earth station to any number of receivers under the "footprint" of the satellite. FIG. 3 illustrates a typical satellite system 40. The satellite system 40 includes an uplink earth station 50, a satellite 52, and receiving terminals 54, 54′, 54″, 54‴. The satellite system 40 covers a footprint 56, which, in the example of FIG. 3, is the continental United States. The footprint 56 typically covers an entire country or continent. Multicast data is data addressed to multiple receiving terminals 54. When the signal carries digital, packetized data, a geosynchronous satellite 52 is an excellent mechanism for carrying multicast data, because a multicast packet need only be transmitted once to be received by any number of terminals 54. Such a signal, by carrying both unicast and multicast packets, can support both normal point-to-point and multicast applications. Satellite multicast data systems are typically engineered with Forward Error Correcting (FEC) coding such that the system is quasi-error free; that is, under normal weather conditions, packets are hardly ever dropped.
The Internet Protocol (IP) is the most commonly used mechanism for carrying multicast data. Satellite networks capable of carrying IP Multicast data include Hughes Network Systems' Personal Earth Station VSAT system and Hughes Network Systems' DirecPC™ system, as well as systems by other companies such as Gilat, Loral Cyberstar and Media4.
VSAT systems, such as the Personal Earth Station by Hughes Network Systems, use a satellite return channel to support two-way communication when needed. For World Wide Web access, a terminal using a VSAT system sends HTTP requests to the Internet by means of the VSAT's inbound channel and receives HTTP responses via the outbound satellite channel. Other systems, such as DirecPC™ Turbo Internet, use a dialup modem (as well as other non-satellite media) to send HTTP requests into the Internet and receive responses either via the outbound satellite channel or via the dialup modem connection. Satellite networks often have a longer latency than many terrestrial networks. For example, the round trip delay on a VSAT is typically 1.5 seconds, while the round trip delay of dialup Internet access is typically only 0.4 seconds. This difference in latency is multiplied in typical web browsing, because multiple round trips are required for each web page. This places web browsing via satellite at a distinct disadvantage relative to many terrestrial networks. The present invention substantially reduces this disadvantage and, as such, greatly increases the value of web browsing via satellite.
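To make the multiplication concrete, assume (hypothetically) that a page needs k = 4 sequential round trips and ignore transmission and server processing time. The page time then scales with the round trip delay, and the 1.1 s per-trip gap between the two media grows fourfold:

```latex
T_{\text{page}} \approx k \cdot \mathrm{RTT}
\qquad\Longrightarrow\qquad
\begin{aligned}
T_{\text{VSAT}}   &\approx 4 \times 1.5\ \text{s} = 6.0\ \text{s},\\
T_{\text{dialup}} &\approx 4 \times 0.4\ \text{s} = 1.6\ \text{s},\\
T_{\text{VSAT}} - T_{\text{dialup}} &\approx 4 \times 1.1\ \text{s} = 4.4\ \text{s}.
\end{aligned}
```

Any technique that removes round trips from the satellite path therefore saves roughly 1.5 s per trip eliminated.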
The present invention is directed to a communication network having an outbound high-speed channel carrying packetized data and either a satellite inbound channel or a terrestrial inbound channel, such as a dialup connection to the Internet. The communication network includes at least one upstream proxy server and at least two reporting downstream proxy servers, where the at least one upstream proxy server is capable of multicasting URLs to the at least two reporting downstream proxy servers. The at least two reporting downstream proxy servers interact with the at least one upstream proxy server to resolve cache misses, and the at least one upstream proxy server returns at least one resolution to the cache misses via multicast. The proxy servers included in the communication system may include reporting proxy servers, non-reporting proxy servers, and best effort proxy servers. A reporting downstream proxy server interacts with an upstream proxy server to satisfy a cache miss. A non-reporting downstream proxy server interacts with a web server to satisfy a cache miss. A best effort downstream proxy server requests a cache-miss URL from both the upstream proxy server and the web server.
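The three downstream behaviors can be contrasted in a minimal sketch, assuming in-memory caches, a stub function for the web server, and that every downstream proxy listens to the upstream proxy's multicast channel (the `listeners` list below stands in for a single satellite multicast transmission). All names are hypothetical. The best-effort proxy issues both requests concurrently and keeps whichever answer finishes first, which is one plausible reading of "requests a cache-miss URL from both".

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from enum import Enum
from typing import Callable, Dict, List


class Mode(Enum):
    REPORTING = "reporting"          # resolve misses through the upstream proxy
    NON_REPORTING = "non-reporting"  # resolve misses directly with the web server
    BEST_EFFORT = "best-effort"      # ask both and use whichever answers first


class DownstreamProxy:
    def __init__(self, mode: Mode, ask_upstream: Callable[[str], str],
                 ask_web_server: Callable[[str], str]) -> None:
        self.mode = mode
        self.ask_upstream = ask_upstream
        self.ask_web_server = ask_web_server
        self.cache: Dict[str, str] = {}

    def on_multicast(self, url: str, content: str) -> None:
        # Assumption: every downstream proxy stores what it hears on the multicast channel.
        self.cache[url] = content

    def get(self, url: str) -> str:
        if url in self.cache:
            return self.cache[url]
        if self.mode is Mode.REPORTING:
            content = self.ask_upstream(url)
        elif self.mode is Mode.NON_REPORTING:
            content = self.ask_web_server(url)
        else:
            with ThreadPoolExecutor(max_workers=2) as pool:
                futures = [pool.submit(self.ask_upstream, url),
                           pool.submit(self.ask_web_server, url)]
                done, _ = wait(futures, return_when=FIRST_COMPLETED)
                content = next(iter(done)).result()
        self.cache[url] = content
        return content


class UpstreamProxy:
    def __init__(self, fetch: Callable[[str], str]) -> None:
        self.fetch = fetch
        self.listeners: List[DownstreamProxy] = []

    def resolve(self, url: str) -> str:
        content = self.fetch(url)
        for proxy in self.listeners:   # models one multicast heard by every listener
            proxy.on_multicast(url, content)
        return content


web_log: List[str] = []


def web_server(url: str) -> str:
    web_log.append(url)
    return f"<content of {url}>"


upstream = UpstreamProxy(web_server)
reporting_a = DownstreamProxy(Mode.REPORTING, upstream.resolve, web_server)
reporting_b = DownstreamProxy(Mode.REPORTING, upstream.resolve, web_server)
non_reporting = DownstreamProxy(Mode.NON_REPORTING, upstream.resolve, web_server)
best_effort = DownstreamProxy(Mode.BEST_EFFORT, upstream.resolve, web_server)
upstream.listeners = [reporting_a, reporting_b, non_reporting, best_effort]

reporting_a.get("A")    # resolved upstream; the multicast reply also fills reporting_b
reporting_b.get("A")    # hit: no new request
non_reporting.get("B")  # goes straight to the web server, so no multicast occurs
best_effort.get("C")    # both paths are tried
print(web_log.count("A"), "B" in reporting_a.cache, web_log.count("C"))
# prints: 1 False 2
```

The sketch shows the trade-off: a reporting miss benefits every listening proxy, while a non-reporting miss benefits only the requester, and best effort pays for two requests to lower its own latency.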
In one embodiment, the downstream proxy server filters multicast transmissions of URLs and stores a subset of the URLs for subsequent transmission, where relative popularity is used to determine whether to store a multicast URL. In one embodiment, the upstream proxy server is capable of multicasting URLs to at least two reporting downstream proxy servers; the upstream proxy server interacts with the at least two reporting downstream proxy servers to resolve cache misses and returns at least one resolution to the cache misses via multicast.
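One possible reading of the popularity filter, as a sketch: the downstream proxy counts local requests per URL and, once its store is full, admits a multicast URL only if it is more popular than the least popular URL already stored. The counting rule, the capacity limit and all names are assumptions for illustration, not the claimed method.

```python
from collections import Counter
from typing import Dict


class MulticastFilter:
    """Keeps only multicast URLs that are popular relative to what is already stored."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.requests: Counter = Counter()   # local request counts (hypothetical popularity measure)
        self.store: Dict[str, str] = {}

    def note_request(self, url: str) -> None:
        self.requests[url] += 1

    def on_multicast(self, url: str, content: str) -> bool:
        """Return True if the multicast URL was stored."""
        if url in self.store or len(self.store) < self.capacity:
            self.store[url] = content
            return True
        # Store is full: displace the least popular stored URL only if this one is more popular.
        victim = min(self.store, key=lambda u: self.requests[u])
        if self.requests[url] > self.requests[victim]:
            del self.store[victim]
            self.store[url] = content
            return True
        return False


f = MulticastFilter(capacity=2)
for url, hits in {"A": 3, "B": 1, "C": 2, "D": 0}.items():
    for _ in range(hits):
        f.note_request(url)

results = [f.on_multicast(u, f"<{u}>") for u in "ABCD"]
print(results, sorted(f.store))   # prints: [True, True, True, False] ['A', 'C']
```

"C" displaces the less popular "B", while "D", which no local user has requested, is discarded even though it arrived on the same multicast channel.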
In another embodiment, the downstream reporting proxy server includes a database and a processor for receiving entries sent by an upstream proxy server and for filtering unpopular entries, keeping popular entries in the database, deleting previously stored entries from the database, expiring previously stored entries from the database, or reporting new entries to the upstream proxy server.
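The database operations listed above can be sketched as follows, assuming that entries carry an absolute expiry time, that "popular" means requested at least `min_popularity` times locally, and that "reporting new entries" means queueing locally missed URLs for the upstream proxy. These are interpretive assumptions, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    url: str
    content: str
    expires_at: float   # seconds on a hypothetical clock shared with the upstream proxy


class EntryDatabase:
    def __init__(self, min_popularity: int) -> None:
        self.min_popularity = min_popularity   # hypothetical "popular enough" threshold
        self.entries: Dict[str, Entry] = {}
        self.popularity: Dict[str, int] = {}
        self._reports: List[str] = []

    def receive(self, entry: Entry) -> bool:
        """Entry sent by the upstream proxy: keep it if popular, filter it otherwise."""
        if self.popularity.get(entry.url, 0) < self.min_popularity:
            return False
        self.entries[entry.url] = entry
        return True

    def delete(self, url: str) -> None:
        self.entries.pop(url, None)

    def expire(self, now: float) -> List[str]:
        """Drop every entry whose expiry time has passed and return the dropped URLs."""
        stale = [u for u, e in self.entries.items() if e.expires_at <= now]
        for u in stale:
            del self.entries[u]
        return stale

    def lookup(self, url: str, now: float) -> Optional[str]:
        self.popularity[url] = self.popularity.get(url, 0) + 1
        entry = self.entries.get(url)
        if entry is not None and entry.expires_at > now:
            return entry.content
        self._reports.append(url)   # miss: queue a report for the upstream proxy
        return None

    def take_reports(self) -> List[str]:
        reports, self._reports = self._reports, []
        return reports


db = EntryDatabase(min_popularity=1)
print(db.lookup("A", now=0.0))                  # prints: None
print(db.receive(Entry("A", "<A>", 10.0)))      # prints: True
print(db.receive(Entry("B", "<B>", 10.0)))      # prints: False
print(db.lookup("A", now=5.0))                  # prints: <A>
print(db.expire(now=10.0), db.take_reports())   # prints: ['A'] ['A']
```

The first lookup of "A" misses and is queued as a report; because "A" has now been requested, the later multicast entry for it is kept, whereas the entry for the never-requested "B" is filtered out.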
As described above, the communication system lowers user response time, lowers network utilization, and reduces the resources required by an HTTP proxy server.
In other embodiments, the present invention is directed to a proxy protocol that performs transaction multiplexing, which prevents a single stalled request from stalling other requests; homogenized content compression, which intelligently compresses HTTP request and response headers; and request batching, so that nearly simultaneously received requests are sent in a single TCP segment in order to reduce the number of required inbound packets.
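Transaction multiplexing and request batching can be sketched under a hypothetical framing: each request or response carries a 32-bit transaction id and a 16-bit length, requests arriving together are packed into buffers no larger than one TCP maximum segment size (1460 bytes is assumed here), and responses are matched back by id so that a stalled transaction does not hold up the others. The frame layout and function names are assumptions, not the protocol's actual wire format, and header compression is left out.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical frame header: 32-bit transaction id + 16-bit payload length, network byte order.
HEADER = struct.Struct("!IH")


def frame(txn_id: int, payload: bytes) -> bytes:
    return HEADER.pack(txn_id, len(payload)) + payload


def batch(requests: List[Tuple[int, bytes]], mss: int = 1460) -> List[bytes]:
    """Pack framed requests, in arrival order, into as few MSS-sized buffers as possible.

    A single frame larger than the MSS gets a buffer of its own (TCP would then split it).
    """
    buffers: List[bytes] = []
    current = b""
    for txn_id, payload in requests:
        f = frame(txn_id, payload)
        if current and len(current) + len(f) > mss:
            buffers.append(current)
            current = b""
        current += f
    if current:
        buffers.append(current)
    return buffers


def unframe(buffer: bytes) -> Dict[int, bytes]:
    """Split a buffer back into {transaction id: payload}."""
    out: Dict[int, bytes] = {}
    offset = 0
    while offset < len(buffer):
        txn_id, length = HEADER.unpack_from(buffer, offset)
        offset += HEADER.size
        out[txn_id] = buffer[offset:offset + length]
        offset += length
    return out


requests = [
    (1, b"GET http://a.example/ HTTP/1.1\r\nHost: a.example\r\n\r\n"),
    (2, b"GET http://b.example/logo.gif HTTP/1.1\r\nHost: b.example\r\n\r\n"),
    (3, b"GET http://c.example/ HTTP/1.1\r\nHost: c.example\r\n\r\n"),
]
segments = batch(requests)
print(len(segments))   # prints: 1

# Responses are framed the same way and may arrive in any order: transaction 1 is
# stalled, yet 3 and 2 can be delivered to their browsers as soon as they arrive.
inbound = frame(3, b"HTTP/1.1 200 OK\r\n\r\nC") + frame(2, b"HTTP/1.1 200 OK\r\n\r\nB")
print(sorted(unframe(inbound)))   # prints: [2, 3]
```

Three requests that would otherwise need three inbound segments share one, and because each response is self-identifying, the proxy can forward responses as they complete rather than in request order.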
FIG. 1 illustrates a conventional system, including browsers, web servers, upstream and downstream proxy servers, and the execution of a GET command.
FIG. 2 illustrates a conventional system, including browsers, web servers, upstream and downstream proxy servers, and the execution of a GET IF MODIFIED SINCE command.
FIG. 3 illustrates a conventional satellite system.
FIG. 4 illustrates a communication system in one embodiment of the present invention.
FIG. 5 illustrates a communication system in another embodiment of the present invention.
FIG. 5a illustrates an upstream proxy server in one embodiment of the present invention.
FIG. 5b illustrates a downstream proxy server in one embodiment of the present invention.
FIG. 5c illustrates the cache lookup processing performed by a reporting downstream proxy server in one embodiment of the present invention.
FIG. 5d illustrates the cache lookup processing performed by a non-reporting downstream proxy server in one embodiment of the present invention.
FIG. 5e illustrates the cache lookup processing performed by a best-effort downstream proxy server in one embodiment of the present invention.
FIG. 6 illustrates the TCP/IP packets which traverse the communication link for a single HTTP transaction without the benefit of the present invention.
FIG. 7 illustrates the TCP/IP packets which traverse the network medium for a single HTTP transaction with the benefit of one embodiment of the present invention.
FIG. 8 illustrates an HTTP request in one embodiment of the present invention.
FIG. 9 illustrates an HTTP response in one embodiment of the present invention.