The Internet allows for vast amounts of information to be communicated over any number of interconnected networks, computers, and network devices. Typically, information or content is located at websites on one or more servers, and a user can retrieve the content using a web browser operating on a client station. For example, the user can enter a website address into the web browser or access a web link, which sends requests to the server to access and provide the content on the respective website. This type of communication is commonly referred to as “web browsing.”
Web browsing is enjoyed by millions of users on the Internet. However, accessing content on a network that is constrained by bandwidth and latency can make web browsing less enjoyable. Bandwidth affects the time for transmitting content over a network link. Latency affects the aggregate time for sending a request from a client station to a server and receiving a response from the server.
Many networks suffer from bandwidth and latency problems that degrade the enjoyment of web browsing for users. Wireless wide area networks (WANs), such as GPRS or CDMA 1×RTT wireless networks, and traditional plain old telephone service (POTS) dialup networks are just a few examples of networks that can exhibit such problems. These networks may take 50 to 100 seconds to download content from a web page due to bandwidth and latency constraints, whereas a high-speed local area network (LAN) may be less prone to such constraints and can download the same content in 5 to 10 seconds. Waiting a long time to view content for a web page is annoying to users and inefficiently utilizes the network.
Utilizing a network efficiently is also a particular concern for network providers, who must share limited resources among many users. For example, wireless WAN providers share very expensive and limited spectrum among all of their data and voice subscribers. Thus, efficient use of this spectrum is imperative. Furthermore, in a wireless WAN environment, data transmission is more susceptible to interference and noise than in a wired environment. Interference and noise delay the data transmission process and, more importantly, cause variability and unpredictability in the delay. A web site that downloads its objects in 50 seconds the first time may download the same objects in 100 seconds the next time. Thus, in order to address these concerns, network providers must efficiently use existing network infrastructure to provide the most enjoyment to a user when downloading content.
Furthermore, the manner in which information is transferred on a network plays an important role in the network's efficiency. Referring to the World Wide Web (WWW), the Hypertext Transfer Protocol (HTTP) sets forth the rules for transferring content such as files or objects on the web. This protocol uses requests and responses for transferring content. For example, a user agent (e.g., a web browser or client) sends a request to the content server for a particular file or object of a web page, and the server of the web page looks up the object in a database and sends back the object as part of a response to the user agent. This process continues until every object in the web page has been downloaded to the user agent.
As web pages have become more complex, a common website may contain hundreds of objects on its web pages. Such objects may include text, graphics, images, sound, and so on. The web pages may also have objects located across multiple servers. That is, one server may provide dynamic content (e.g., content that remembers the last books ordered by a user) for a web page, whereas other servers may provide static but rotating content such as an advertisement, and still others provide the static content of the site. As such, before a user can view a web page, hundreds of objects may require downloading from multiple servers. Each server, however, may take a different amount of time to service a request for an object, contributing further to latency. Thus, the latency for each server may vary by orders of magnitude, e.g., one server may respond in milliseconds whereas another server may respond in seconds.
Latency constraints, however, should not be confused with bandwidth constraints. FIG. 1 illustrates the retrieval sequence for objects on a bandwidth constrained network using HTTP over TCP/IP. In this illustration, each request for an object requires a connection to be established between a client and a server with an exchange of “Syn” and “Ack” messages necessary for TCP/IP. Due to the relatively small latency of the network and the responsiveness of the server (primarily the small latency of the network), the Ack message is sent back to the client quickly. However, because the network is bandwidth constrained, a response back to the client takes a relatively long time. This is exacerbated if the requested object is large and must be broken into many packets, as shown in FIG. 1. As a result, the overall download time for each request/response is dominated by the time it takes to download all the packets of the individual objects on a network link. Such download time can be calculated by adding the sizes of the individual objects and dividing the aggregate size by the link bandwidth.
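By way of illustration only, the bandwidth constrained calculation described above may be sketched in Python as follows. The function name, the object sizes, and the link bandwidth are hypothetical values chosen for the example and do not come from FIG. 1; latency and protocol overhead are deliberately ignored.

```python
def bandwidth_bound_download_time(object_sizes_bytes, link_bandwidth_bps):
    """Estimate download time in seconds when link bandwidth dominates.

    Adds the sizes of the individual objects and divides the aggregate
    size by the link bandwidth. Latency and protocol overhead are ignored.
    """
    total_bits = sum(object_sizes_bytes) * 8
    return total_bits / link_bandwidth_bps


# Hypothetical page: three objects of 10 kB, 20 kB and 30 kB
# retrieved over an assumed 40 kbit/s link.
sizes = [10_000, 20_000, 30_000]
estimate = bandwidth_bound_download_time(sizes, 40_000)
print(f"estimated download time: {estimate:.1f} s")
```

Under these assumed values the aggregate size is 480,000 bits, so the estimate is 12 seconds regardless of how many objects make up the page.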
FIG. 2 illustrates the retrieval sequence for objects on a latency constrained network using HTTP over TCP/IP. In this illustration, the network is not limited by bandwidth, but instead by latency, i.e., the time it takes to send a packet from the client to the server through the network. In particular, when a user agent requests small objects on a network affected by high latency, the overall download time is dominated by the time it takes a request to travel to the server, the time the server takes to process the request, and the time it takes for a response to travel back to the user agent. This download time can be calculated by adding the round trip time (RTT), i.e., the time for the request to travel to the server and the response to travel back to the client, to the response time of the server and multiplying that sum by the number of objects on the web page.
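By way of illustration only, the latency constrained calculation described above may be sketched in Python as follows. The function name and the numeric values are hypothetical; the sketch assumes that objects are requested one at a time and that transmission time and connection setup are negligible.

```python
def latency_bound_download_time(num_objects, rtt_s, server_time_s):
    """Estimate download time in seconds when latency dominates.

    Each object costs one round trip plus the server's response time,
    and objects are assumed to be fetched strictly one after another.
    """
    return num_objects * (rtt_s + server_time_s)


# Hypothetical page: 50 small objects, an assumed 1 s round trip time,
# and an assumed 0.25 s server response time per object.
estimate = latency_bound_download_time(50, 1.0, 0.25)
print(f"estimated download time: {estimate:.1f} s")
```

Under these assumed values the estimate is 62.5 seconds, and it grows with the number of objects rather than with their size.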
FIG. 3 illustrates a graph that shows the instantaneous bandwidth versus the download time for an exemplary website (e.g., http://www.cnn.com). This illustration shows how inefficiently a network is utilized when affected by latency constraints. In this example, the network has an ideal maximum bandwidth of 42.9 Kbps. However, only a small portion of the download time is actually spent at the ideal bandwidth. Thus, the network is latency constrained rather than bandwidth constrained.
These problems are well known in the networking community. To increase efficiency, the early web browsers, which implemented the inefficient HTTP 1.0 protocol, opened multiple TCP connections to web servers and simultaneously sent requests on each connection. Each connection then shared the available bandwidth which helped to increase overall bandwidth utilization. However, if the network was latency constrained, improved bandwidth utilization would not provide shorter download times.
Using the HTTP 1.0 protocol in this way has a number of disadvantages. One disadvantage is that it can adversely affect the capacity of servers. For example, if a server serves 100 simultaneous connections, and each user opens 10 connections, the server can only support 10 simultaneous users. However, if one connection is allocated per user, the server could support 100 simultaneous users. Thus, to ensure service to more users, many servers limit the number of connections per user.
Another disadvantage of the HTTP 1.0 protocol is that it can exacerbate the effects of latency constraints. For instance, setting up and tearing down a connection requires several exchanges of messages (e.g., Syn, Syn+Ack, Ack, Fin, Ack, Fin, and Ack, which are data packet messages under TCP/IP). If a web browser opens 50 connections and the round trip time for such messages is 1 second, then, at roughly two round trips per connection (one for setup and one for teardown), 100 seconds are spent on connection maintenance. For this reason, many web browsers limit the number of connections that can be established, e.g., some web browsers only allow 2 to 6 connections.
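By way of illustration only, the connection maintenance arithmetic above may be sketched in Python as follows. The function name is hypothetical, and the default of two round trips per connection (one for setup and one for teardown) is an assumption inferred from the 100 second figure, not a value stated for any particular TCP implementation.

```python
def connection_overhead_s(num_connections, rtt_s, round_trips_per_connection=2):
    """Estimate seconds spent opening and closing connections.

    Assumes each connection costs a fixed number of round trips
    (by default one for setup and one for teardown).
    """
    return num_connections * round_trips_per_connection * rtt_s


# The example from the text: 50 connections at a 1 second round trip time.
overhead = connection_overhead_s(50, 1.0)
print(f"connection maintenance overhead: {overhead:.0f} s")
```

The same sketch suggests why limiting a browser to a handful of connections helps: with 4 connections the assumed overhead falls to 8 seconds.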
The HTTP 1.1 protocol addressed some disadvantages of the HTTP 1.0 protocol. For instance, the HTTP 1.1 protocol standardized the maximum number of connections a web browser could open to four. For most LAN environments with relatively low latency, a web browser having four open connections provides sufficient performance. The HTTP 1.1 protocol also standardized a technique referred to as “persistent connections,” which is an extension to the HTTP 1.0 protocol. A persistent connection allows multiple requests to be sent on the same connection. For example, a web browser can open a connection, make a request, receive the response, and then make another request on the same connection without tearing it down and forming a new connection.
Although HTTP 1.1 introduced concepts to alleviate the problems with connection maintenance, it did not address the adverse effect of HTTP 1.1 and 1.0 on the content server's capacity. Additionally, persistent connections do not improve download time performance if web page objects are spread across multiple servers or if the user browses from one page to the next. Either of these cases would require closing the old connection.
The HTTP 1.1 protocol did alleviate problems with persistent connections regarding dynamic content, which was not addressed in the persistent connection extension to HTTP 1.0. That is, the HTTP 1.0 protocol extension allowed for a “keep alive” feature for a persistent connection that required the content server to specify the length of a response in order for the client to distinguish one response from the next. However, this would not work if the web server was providing dynamic content and could not determine the size of the dynamic content ahead of time. Therefore, the server needed to avoid using persistent connections and closed the connections after downloading dynamic content responses. To address this problem, the HTTP 1.1 protocol introduced “chunked” transfer encoding, which allowed the content server to simply specify the size of the next chunk of data and use a special delimiter when the dynamic content transfer was completed. This allowed user agents to keep their persistent connections open for dynamic content.
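By way of illustration only, the chunked transfer encoding described above may be sketched in Python as follows. The function names are hypothetical. The framing follows the HTTP 1.1 chunked coding, in which each chunk is preceded by its size in hexadecimal and a zero-size chunk marks the end of the body; chunk extensions and trailer fields are assumed to be absent and are not handled.

```python
def encode_chunked(chunks):
    """Frame an iterable of byte strings using chunked transfer coding."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # a zero-size chunk would prematurely end the body
        # Chunk size in hexadecimal, CRLF, chunk data, CRLF.
        out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # zero-size last chunk, with no trailer fields
    return bytes(out)


def decode_chunked(data):
    """Recover the body from a chunked message without extensions or trailers."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end], 16)
        pos = line_end + 2
        if size == 0:
            break  # the zero-size chunk delimits the end of the content
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)


# The server can emit each piece as soon as it is generated,
# without knowing the total length of the dynamic content in advance.
wire = encode_chunked([b"Hello, ", b"dynamic ", b"world!"])
print(wire)
print(decode_chunked(wire))
```

Because the end of the response is signaled in-band by the zero-size chunk, the connection need not be closed to mark the end of the dynamic content.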
With the advent of persistent connections, the use of an intermediary or proxy server located between client stations and content servers became popular in many networks. Typically, a proxy server was used in an enterprise environment for security reasons, but it could also be used to improve network performance. For example, web browsers operating on client stations could open a number of persistent connections to the proxy server. The proxy server could then open new persistent connections to the content servers. In this manner, web browsers reused their persistent connections to the proxy server for downloading the objects of a web page, even if the objects resided on different content servers. Furthermore, the proxy server reused its persistent connections to the content servers for multiple web browsers. For popular web pages, the proxy server could maintain persistent connections without tearing them down.
Thus, the proxy server improved performance for a latency constrained network by allowing a web browser to open persistent connections with the proxy server only once. This reduced the exchange of messages when downloading objects of a web page. Using a proxy server with existing HTTP protocols, however, suffers from the request-response nature of such prior protocols. For instance, even though the proxy server could maintain persistent connections, the proxy server could only have one outstanding request on each persistent connection. As a result, before another request could be issued, the response to the previous request had to be received first, which is illustrated in FIG. 4. Consequently, the HTTP 1.1 protocol introduced “pipelining” to alleviate this problem. Pipelining allows a user agent to send multiple requests on a given connection without the requirement of receiving responses for previous requests. This reduced the effects of latency on the download time of web pages with multiple objects, and thus reduced download times on latency constrained networks, which is illustrated in FIG. 5.
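By way of illustration only, the effect of pipelining on a latency constrained connection may be sketched in Python with a deliberately simplified timing model. The function names and numeric values are hypothetical; the model assumes negligible transmission time, a server that processes requests one after another, and that all pipelined requests are sent back to back.

```python
def serial_download_time(num_objects, rtt_s, server_time_s):
    """One outstanding request at a time: every object pays a full round trip."""
    return num_objects * (rtt_s + server_time_s)


def pipelined_download_time(num_objects, rtt_s, server_time_s):
    """All requests sent back to back: the round trip is paid roughly once."""
    if num_objects == 0:
        return 0.0
    return rtt_s + num_objects * server_time_s


# Hypothetical page: 20 objects, an assumed 1 s round trip time,
# and an assumed 0.25 s of server processing per object.
print(serial_download_time(20, 1.0, 0.25))
print(pipelined_download_time(20, 1.0, 0.25))
```

Under these assumed values the serial exchange takes 25 seconds and the pipelined exchange takes 6 seconds, since the round trip delay is no longer multiplied by the number of objects.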
The above prior techniques of using persistent connections, chunked encoding, proxy servers, and pipelining can improve performance. However, these prior techniques have a number of disadvantages. For example, the prior techniques do not account for the varying delays across different content servers or web servers. In addition, a prior proxy server receiving responses from content servers must deliver all responses to web browsers in the same order that the requests were received by the proxy server. Consequently, if a content server that receives a first request from a web browser is slow, a proxy server must hold up all other responses designated for the web browser until the slow content server responds. The slow server may not even respond. In this case, the proxy server must close the connection with the web browser and disregard any previously received responses from other content servers.
Another disadvantage of the prior techniques is that they cannot efficiently handle responses with large objects that may monopolize a pipeline on a connection. For instance, if a web browser requests many objects on one pipelined connection, and the first request actually corresponds to a very large object, all the smaller objects will be blocked at the proxy until the large object completes. If the web browser had known of this ahead of time, it would have requested the large object on another connection outside of the pipeline so that the smaller objects could proceed in parallel.
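By way of illustration only, the in-order delivery constraint described above may be sketched in Python as follows. The function name and the timing values are hypothetical; each entry represents the assumed time at which a content server's response becomes available at the proxy server, listed in the order the requests were received.

```python
from itertools import accumulate


def in_order_delivery_times(ready_times):
    """Earliest time each response can be forwarded under in-order delivery.

    A response cannot be forwarded before every earlier response has been
    forwarded, so each delivery time is the running maximum of ready times.
    """
    return list(accumulate(ready_times, max))


# Hypothetical case: the first content server takes 5 s to respond,
# while the remaining servers respond within a fraction of a second.
ready = [5.0, 0.1, 0.2, 0.3]
print(in_order_delivery_times(ready))
```

Under these assumed values every response is held until the 5 second mark, even though three of the four were available almost immediately.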
Thus, there is a need to overcome the above limitations of the prior techniques and provide a more efficient manner of handling requests and responses on a network.