1. Field of the Invention
The present invention relates to a communication system, and is more particularly related to retrieving web content using proxy servers.
2. Discussion of the Background
As businesses and society in general become increasingly reliant on communication networks to conduct a variety of activities, ranging from business transactions to personal entertainment, these communication networks continue to experience greater and greater delay, stemming in part from traffic congestion and network latency. For example, the maturity of electronic commerce and the acceptance of the Internet, in particular the World Wide Web ("Web"), as a daily tool pose an enormous challenge to communication engineers to develop techniques that reduce network latency and user response times. With the advances in processing power of desktop computers, the average user has grown accustomed to sophisticated applications (e.g., streaming video, radio broadcasts, video games, etc.), which place tremendous strain on network resources. The Web, as well as other Internet services, relies on protocols and networking architectures that offer great flexibility and robustness; however, such infrastructure may be inefficient in transporting Web traffic, which can result in long user response times, particularly if the traffic has to traverse an intermediary network with a relatively large latency (e.g., a satellite network).
FIG. 6 is a diagram of a conventional communication system for providing retrieval of web content by a personal computer (PC). PC 601 is loaded with a web browser 603 to access the web pages that are resident on web server 605; collectively, the web pages and web server 605 denote a "web site." PC 601 connects to a wide area network (WAN) 607, which is linked to the Internet 609. The above arrangement is typical of a business environment, whereby the PC 601 is networked to the Internet 609. A residential user, in contrast, normally has a dial-up connection (not shown) to the Internet 609 for access to the Web. The phenomenal growth of the Web is attributable to the ease and standardized manner of "creating" a web page, which can possess textual, audio, and video content.
Web pages are formatted according to the Hypertext Markup Language (HTML) standard, which provides for the display of high-quality text (including control over the location, size, color and font for the text), the display of graphics within the page, and the "linking" from one page to another, possibly stored on a different web server. Each HTML document, graphic image, video clip or other individual piece of content is identified, that is, addressed, by an Internet address, referred to as a Uniform Resource Locator (URL). As used herein, a "URL" may refer to an address of an individual piece of web content (HTML document, image, sound-clip, video-clip, etc.) or the individual piece of content addressed by the URL. When a distinction is required, the term "URL address" refers to the URL itself, while the terms "web content", "URL content" or "URL object" refer to the content addressed by the URL.
In a typical transaction, the user enters or specifies a URL to the web browser 603, which in turn requests that URL from the web server 605. The web server 605 returns an HTML page, which contains numerous embedded objects (i.e., web content), to the web browser 603. Upon receiving the HTML page, the web browser 603 parses the page to retrieve each embedded object. The retrieval process often requires the establishment of separate communication sessions (e.g., TCP (Transmission Control Protocol) sessions) to the web server 605. That is, after an embedded object is received, the TCP session is torn down and another TCP session is established for the next object. Given the richness of the content of web pages, it is not uncommon for a web page to possess over 30 embedded objects. This arrangement consumes network resources and, more significantly, introduces delay for the user.
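The parsing step performed by the web browser 603 can be sketched with Python's standard `html.parser` module. This is a minimal illustration, not the parser of any particular browser; the set of tags treated as embedded objects (`EMBED_TAGS`) and the names `EmbeddedObjectParser` and `embedded_objects` are hypothetical choices made for the example. Each URL it returns corresponds to one further retrieval, which in the conventional arrangement means one further TCP session.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical choice of tags whose "src" attribute names an embedded object.
EMBED_TAGS = {"img", "script", "frame", "iframe", "embed"}


class EmbeddedObjectParser(HTMLParser):
    """Collects absolute URLs of objects embedded in an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # URL of the page, used to resolve relative links
        self.objects = []         # embedded-object URLs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in EMBED_TAGS:
            return  # ordinary hyperlinks (<a href=...>) are not embedded objects
        for name, value in attrs:
            if name == "src" and value:
                url = urljoin(self.base_url, value)
                if url not in self.objects:  # fetch each object only once
                    self.objects.append(url)


def embedded_objects(html, base_url):
    """Return the embedded-object URLs a browser would go on to request."""
    parser = EmbeddedObjectParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.objects


if __name__ == "__main__":
    page = ('<html><body><img src="logo.gif"><script src="/js/app.js"></script>'
            '<a href="next.html">next</a><img src="logo.gif"></body></html>')
    for url in embedded_objects(page, "http://www.example.com/index.html"):
        print(url)
```

For the sample page, the sketch yields `http://www.example.com/logo.gif` and `http://www.example.com/js/app.js`; the hyperlink to `next.html` is not treated as an embedded object, and the repeated image is requested only once.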
Delay is further increased if the WAN 607 is a satellite network, as satellite networks conventionally exhibit longer latency than terrestrial networks. In addition, because HTTP utilizes a separate TCP connection for each transaction, the large number of transactions amplifies the network latency. Further, because HTML composes a page from frames and embedded images that each require a separate HTTP transaction, one for every frame and every embedded URL, the delay is compounded.
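The amplification described above can be made concrete with a rough back-of-the-envelope model. The sketch below is a simplifying assumption, not a measurement: it supposes that the page and each embedded object are fetched one after another, each over a fresh TCP connection costing one round trip for the handshake and one for the HTTP request and response, and it ignores transmission time, server processing, and parallel connections. The function name `estimated_page_time` is hypothetical.

```python
def estimated_page_time(num_objects, rtt_s, rtts_per_object=2):
    """Rough lower bound (seconds) on page retrieval time over a link with the given RTT.

    Assumes strictly sequential fetches, each on its own TCP connection:
    one RTT for the TCP handshake plus one RTT for the HTTP request/response.
    """
    total_fetches = 1 + num_objects  # the HTML page itself plus each embedded object
    return total_fetches * rtts_per_object * rtt_s


if __name__ == "__main__":
    # Illustrative RTTs only: ~50 ms terrestrial versus ~600 ms, roughly the
    # order of magnitude of a geostationary satellite hop.
    for label, rtt in (("terrestrial", 0.05), ("satellite", 0.6)):
        print(f"{label}: {estimated_page_time(30, rtt):.1f} s")
```

With 30 embedded objects, the model gives about 3.1 s over the assumed terrestrial path and about 37.2 s over the assumed satellite path: the per-transaction cost is multiplied by the number of objects, so a larger round-trip time is felt once per transaction rather than once per page.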
Based on the foregoing, there is a clear need for improved approaches for retrieval of web content within a communication system.
There is a need to utilize standard protocols to reduce development costs and promote rapid industry acceptance.
There is also a need for a web content retrieval mechanism that makes networks with relatively large latency viable and/or competitive for Internet access.
Therefore, an approach for retrieving web content that reduces user response times is highly desirable.
According to one aspect of the invention, a communication system for retrieving web content comprises a downstream proxy server that is configured to receive a URL request message from a web browser. The URL request message specifies a URL content that has an embedded object. An upstream proxy server is configured to communicate with the downstream proxy server and to receive the URL request message from the downstream proxy server. The upstream proxy server selectively forwards the URL request message to a web server and receives the URL content from the web server, wherein the upstream proxy server forwards the URL content to the downstream proxy server and parses the URL content to obtain the embedded object prior to the web browser having to issue an embedded object request message. The above arrangement advantageously reduces user response time associated with web browsing.
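The division of labor between the two proxy servers can be sketched as a small in-memory simulation. This is a hedged illustration under stated assumptions, not the claimed implementation: a Python dictionary stands in for the web server, the class names `UpstreamProxy` and `DownstreamProxy` and the counter `link_round_trips` are hypothetical, and, for brevity, the upstream proxy returns the URL content and its prefetched embedded objects in one bundle rather than forwarding the page first and pushing the objects afterward.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _EmbedParser(HTMLParser):
    """Collects absolute "src" URLs of img/script/frame/iframe/embed tags."""

    TAGS = {"img", "script", "frame", "iframe", "embed"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag in self.TAGS:
            for name, value in attrs:
                if name == "src" and value:
                    self.urls.append(urljoin(self.base_url, value))


class UpstreamProxy:
    """Hypothetical upstream proxy: fetches URL content and prefetches embedded objects."""

    def __init__(self, origin):
        self.origin = origin  # dict standing in for the web server: URL -> content

    def handle(self, url):
        content = self.origin[url]
        bundle = {url: content}
        if url.endswith(".html"):  # simplifying assumption: only .html is parsed
            parser = _EmbedParser(url)
            parser.feed(content)
            parser.close()
            for obj_url in parser.urls:  # fetch objects before the browser asks
                if obj_url in self.origin:
                    bundle[obj_url] = self.origin[obj_url]
        return bundle


class DownstreamProxy:
    """Hypothetical downstream proxy: serves the browser, caching pushed objects."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.cache = {}
        self.link_round_trips = 0  # exchanges across the (e.g., satellite) link

    def get(self, url):
        if url not in self.cache:
            self.link_round_trips += 1
            self.cache.update(self.upstream.handle(url))
        return self.cache[url]


if __name__ == "__main__":
    origin = {
        "http://www.example.com/index.html": '<img src="a.gif"><img src="b.gif">',
        "http://www.example.com/a.gif": "GIF-A",
        "http://www.example.com/b.gif": "GIF-B",
    }
    down = DownstreamProxy(UpstreamProxy(origin))
    down.get("http://www.example.com/index.html")  # browser requests the page
    down.get("http://www.example.com/a.gif")       # embedded object requests are
    down.get("http://www.example.com/b.gif")       # answered from the local cache
    print(down.link_round_trips)
```

In this run the browser issues three requests, but only the first crosses the link between the proxies, so `link_round_trips` ends at 1; without prefetching, each embedded object would add another exchange over the high-latency link.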