This invention relates generally to client-server computer networks. More particularly, this invention relates to a general technique for improving the performance of a proxy server array during persistent network access.
The recent publicity and emphasis on the information superhighway has increased the awareness and acceptance of the Internet as a mass communications medium. Until recently, "cruising or surfing" the Internet was a disorienting, even a frustrating experience, something like trying to navigate without maps. The World Wide Web has made it easier to access the array of resources available on the Internet. Resources, such as web servers, ftp servers, and telnet servers, provide the user with the ability to find the data content or information he wants simply and easily.
The volume of World Wide Web traffic on the Internet is staggering, but a significant fraction of this traffic is redundant. That is, a large number of users request the same data from the same resource at around the same time. As a result, a significant percentage of a corporation's network infrastructure is carrying and servicing the repeated requests for the same data content, day after day.
To manage this growing demand for access to the Internet and to reduce network communications costs, several system-level applications have been developed to extend caching to the client/server network environment. Two current approaches are proxy server arrays and network caches.
Proxy server arrays are network server-based applications that are placed between a client application, such as a web browser, and a resource, such as a Web server. Initially proxy servers were designed to deal with problems caused by firewall issues in corporate web access. Eventually, proxy servers were also recognized as being an ideal environment to cache web data and to improve system performance, as well as to reduce the load on the network and on the servers.
In most World Wide Web based client/server applications, the proxy server receives a request to access a specific resource from a client system. The proxy server examines the request to determine if it can service the request itself. If the particular web page is stored in its cache, the proxy server will retrieve the web page and forward it to the client that made the request. If not, the proxy server sends a request to the desired resource site specified by the Uniform Resource Locator (URL). The URL acts as the address of the resource and as such is unique throughout the Internet. The proxy server retrieves the web page from the resource specified by the URL address and transfers the web page to the client. In addition, the proxy server stores the web page in its cache for future use by other client systems.
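The request flow described above can be illustrated with a minimal sketch. The class name `CachingProxy`, the `fetch_from_origin` callable, and the `origin_fetches` counter are hypothetical names introduced only for illustration; an actual proxy server would also handle cache expiry, errors, and concurrent clients, none of which is shown here.

```python
from typing import Callable, Dict


class CachingProxy:
    """Illustrative caching proxy: serve from cache, else fetch and store."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # fetch_from_origin is an assumed stand-in for the network request
        # that the proxy sends to the resource named by the URL.
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}
        self.origin_fetches = 0  # counts requests that reached the origin

    def handle_request(self, url: str) -> bytes:
        # Cache hit: the proxy services the request itself.
        if url in self._cache:
            return self._cache[url]
        # Cache miss: retrieve the page from the resource identified by the URL.
        page = self._fetch(url)
        self.origin_fetches += 1
        # Store the page for future use by other clients.
        self._cache[url] = page
        return page
```

In this sketch, a second request for the same URL is answered from the cache and does not reach the origin server.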
It is becoming common for networks (such as intranets or the internet) to include a plurality of proxy servers accessible by one or more clients. Requests from clients for various pages stored on servers in the network are routed through the plurality of proxy servers, which cache pages whenever possible.
Certain systems allow the client to specify to which proxy server it will send its request. Thus, a certain client can send requests to more than one proxy server. While caching performed by these proxy servers improves the overall performance of a network, having more than one proxy server store the same web pages in its cache is inefficient.
The World Wide Web operates using the http protocol. An initial version of http (http 1.0) required a separate TCP connection for each transfer of information between a client and a proxy or a server. Subsequently, newer versions of the http protocol have reduced the need to establish a separate connection for each request. For example, version 1.1 of the http protocol includes "persistent http," where multiple http transfers can use the same connection, i.e., the same proxy server, without having to establish a new connection each time. Persistent http connections are described in, for example, Request For Comments (RFC) 2068, "Hypertext Transfer Protocol -- HTTP/1.1," January 1997, available from the Internet Engineering Task Force (IETF).
A problem arises, however, if a client can send its requests to more than one proxy server, since clients generally do not know whether a proxy server has a persistent connection to the web server or not. Furthermore, if one or more clients can access one or more proxy servers, the clients do not know which of the proxy servers have already received requests for a particular service from another client (and may therefore have a persistent connection open).
In some conventional systems, the Internet Cache Protocol (ICP) is used to determine and select the most applicable location from which to retrieve a requested web page. In ICP, one proxy server establishes a "working" relationship with other proxy servers. Proxy servers designated as parents are on one level while child proxy servers are on lower level(s). The terms "neighbor" and "peer" refer to either a parent or a sibling that is a single "cache reference" away.
In general, in ICP the flow of a client request is up through the hierarchy of proxy servers. If a proxy server does not have a client's requested web page, it requests that a special proxy server, called an arbitrator, query the other proxy servers to see if they have the desired web page. If any of these proxy servers has the requested web page in its cache, then the inquiring proxy server enters a demand for the web page. The cached web page is either forwarded directly to the client or to the original proxy server for transfer to the client. If none of the proxy servers has the web page in its cache, the proxy server must forward the request either to a parent or back to the origin proxy server for service. Thus, if a successful request or "hit" occurs, the proxy server may fetch the web page from a peer proxy server or receive it from a parent, but if the request is unsuccessful or "missed," it must be passed to a parent server for service. The role of a parent is to complete the transaction and service the request. If necessary, a parent proxy server will open a resource directly to service a client's request.
There are several problems that arise with the ICP approach. For example, the arbitrating proxy server may be overrun with requests or the network path between proxy servers may be congested. In addition, the additional hierarchy introduces extra delays for the clients requesting uncached data.
Other conventional client/server systems, such as the CARP (Cache Array Routing Protocol), available from Microsoft, Inc. of Redmond, Wash., access a variety of proxy servers to retrieve pages stored on a single server. CARP, for example, uses a deterministic hashing function in the client to allocate page accessing and caching among a variety of proxy servers. By accessing a variety of proxy servers for various pages stored on a server, the CARP system aims to achieve load balancing between the proxy servers. Unfortunately, such a system has the disadvantage that it requires a hashing function that distributes the page accesses equally among the proxy servers. If the hashing function is poorly chosen, or if the URL names lead to an unbalanced distribution of the URLs to the various proxies, load balancing between proxy servers suffers. Thus, CARP's use of a deterministic hashing function to distribute requests to proxies does not always achieve a good distribution of requests among proxies. Lack of good distribution leads to inefficient usage of proxies. More importantly, with CARP, two URLs for the same Web server are likely to be sent to two different proxy servers, thereby undoing the benefits of persistent connections between a particular proxy and a server.
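The effect of hashing on the full URL can be illustrated with a simplified stand-in. The sketch below is not CARP's actual hash function; the function name `pick_proxy_by_full_url`, the use of SHA-256, and the modulo reduction are assumptions chosen only to show a deterministic mapping from a complete URL to one member of a proxy array.

```python
import hashlib
from typing import Sequence


def pick_proxy_by_full_url(url: str, proxies: Sequence[str]) -> str:
    """Deterministically map a complete URL to one proxy (illustrative only)."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    # Assumption: SHA-256 stands in for whatever hash a real system uses.
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    # Reduce the first 8 bytes of the digest to an index into the array.
    index = int.from_bytes(digest[:8], "big") % len(proxies)
    return proxies[index]
```

Because the page portion of the URL contributes to the hash, two pages hosted on the same Web server are hashed independently and may be assigned to different proxies, which is the behavior the preceding paragraph identifies as defeating persistent connections.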
A described embodiment of the present invention provides a method and apparatus that ensures that requests for pages for a particular domain name are routed to the same proxy server by all of a plurality of clients. If, for example, a proxy server has a persistent connection to a server for a domain, all incoming requests for that domain will be sent to that proxy server and will thus be able to take advantage of the persistent connection between the proxy server and the server hosting the Web site. The clients use a proxy table stored in the client to determine to which of a plurality of proxy servers they should send requests for particular pages.
Each client contains a proxy table that is periodically updated by one or more of the proxy servers. In the described embodiment, each proxy table in a client has an associated "time to live." When a proxy table's time to live has expired, the client obtains a new proxy table. In one embodiment, a proxy table in a client contains an entry corresponding to each proxy server. Each proxy table entry identifies the name of the corresponding proxy server, the network address of the proxy server (for example, the IP address), a port number of the proxy server, and a URL from which to get a new copy of the proxy table itself.
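One possible in-memory representation of such a proxy table is sketched below. The field names, the `ProxyTable` and `ProxyEntry` class names, and the use of a monotonic clock measured in seconds are assumptions made for illustration; the description above specifies only what each entry identifies and that the table has a time to live.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ProxyEntry:
    """One entry per proxy server, holding the four items named in the text."""
    name: str        # name of the proxy server
    address: str     # network address, for example an IP address
    port: int        # port number of the proxy server
    table_url: str   # URL from which to get a new copy of the proxy table


@dataclass
class ProxyTable:
    entries: List[ProxyEntry]
    ttl_seconds: float  # assumed unit for the table's time to live
    loaded_at: float = field(default_factory=time.monotonic)

    def is_expired(self, now: Optional[float] = None) -> bool:
        # The table is stale once its time to live has elapsed; the client
        # would then obtain a new table (the refresh itself is not shown).
        current = time.monotonic() if now is None else now
        return current - self.loaded_at >= self.ttl_seconds
```

A client following this sketch would check `is_expired()` before using the table and, when it returns true, fetch a replacement from the `table_url` of one of the entries.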
In a described embodiment, when a client needs to access a resource through a proxy server, the client truncates the address (e.g., the URL) of the resource. Thus, for example, all addresses in a particular domain name are truncated to the same value. The truncated address is then used to hash into the proxy table in the client and to identify a proxy server. The client sends its request to the identified proxy server. Thus, all requests for a particular domain hash to the same proxy table entry and, hence, to the same proxy server. If the proxy server has opened a persistent connection to the server for the requested domain, the proxy server will be able to take advantage of the persistent connection.
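The truncate-then-hash selection can be sketched as follows. The function names `truncate_address` and `select_proxy_index` are hypothetical, and two choices are assumptions not taken from the description: truncation is taken to mean keeping only the host portion of the URL, and SHA-256 reduced modulo the table size stands in for the unspecified hash function.

```python
import hashlib
from urllib.parse import urlsplit


def truncate_address(url: str) -> str:
    """Drop the page-specific portion of a URL, keeping only the host name."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError("URL has no host component: " + repr(url))
    return host


def select_proxy_index(url: str, table_size: int) -> int:
    """Hash the truncated address to an index into the proxy table."""
    if table_size <= 0:
        raise ValueError("proxy table must contain at least one entry")
    domain = truncate_address(url)
    # Assumption: any deterministic hash would serve; SHA-256 is used here
    # because Python's built-in hash() of strings varies between processes.
    digest = hashlib.sha256(domain.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % table_size
```

Because only the truncated address is hashed, `select_proxy_index("http://www.example.com/a.html", n)` and `select_proxy_index("http://www.example.com/b/c.html", n)` return the same index for any table size `n`, so both requests go to the same proxy server.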
In accordance with the purpose of the invention, as embodied and broadly described herein, the invention relates to a method of accessing information in a client/server network, comprising the steps, performed by a client in the client/server network, of: receiving a proxy table having at least one entry for each of a plurality of proxy servers in the client/server network, the entry containing information about how to connect to a respective one of the proxy servers; receiving an address of a page to access, the page being stored on a server in the client/server network; truncating the address to remove from the address a portion of the address attributable to the page, yielding a truncated address identifying the domain of the page; hashing the truncated address to yield an index value of the proxy table; and accessing a proxy server that is pointed to by the index value in the proxy table to retrieve the page.
Advantages of the invention will be set forth, in part, in the description that follows and, in part, will be understood by those skilled in the art from the description herein. The advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims and equivalents.