The present invention relates generally to providing load balancing across a collection (or cluster) of servers such as proxy servers and Web servers in the Internet environment. A more particular aspect of the present invention relates to a method of updating routing information using meta data piggybacked with the response to client requests. Yet another aspect is related to a load balancing method which also optimizes caching efficiency.
While dictionary meanings are also implied by certain terms used here, the following glossary of some terms may be useful.
Internet
The network of networks and gateways that use the TCP/IP suite of protocols.
Client
A client is a computer which issues commands to the server which performs the task associated with the command.
Server
Any computer that performs a task at the command of another computer is a server. A Web server typically supports one or more clients.
World Wide Web (WWW or Web)
The Internet's application that lets people seeking information on the Internet switch from server to server and database to database by clicking on highlighted words or phrases of interest (hyperlinks). An Internet WWW server supports clients and provides information. The Web can be considered as the Internet with all of the resources addressed as URLs and which uses HTML to display the information corresponding to URLs and provide a point-and-click interface to other URLs.
Universal Resource Locator (URL)
A way to uniquely identify or address information on the Internet. Can be considered to be a Web document version of an e-mail address or a fully-qualified network file name. They can be accessed with a Hyperlink. An example of a URL is "http://www.philipyu.com:80/table.html". Here, the URL has four components. Starting from the left, the first specifies the protocol to use, separated from the rest of the locator by a ":". Next is the hostname or IP address of the target host; this is delimited by the "//" on the left and on the right by a "/" or optionally a ":". The port number is optional, and is delimited on the left from the hostname by a ":" and on the right by a "/". The fourth component is the actual file name or program name. In this example, the ".html" extension means that this is an HTML file.
HyperText Markup Language (HTML)
HTML is a language which can be used, among other things, by Web servers to create and connect documents that are viewed by Web clients. HTML uses Hypertext documents.
Hypertext Transfer Protocol (HTTP)
HTTP is an example of a stateless protocol, which means that every request from a client to a server is treated independently. The server has no record of previous connections. At the beginning of a URL, "http:" indicates the file should be retrieved using HTTP.
Internet Browser or Web Browser
A graphical interface tool that runs Internet protocols such as HTTP, and displays results on the user's screen. The browser can act as an Internet tour guide, complete with pictorial desktops, directories and search tools used when a user "surfs" the Internet. In this application the Web browser is a client service which communicates with the World Wide Web.
Client Cache
Client caches are typically used as primary caches for objects accessed by the client. In a WWW environment, client caches are typically implemented by web browsers and may cache objects accessed during a current invocation, i.e., a nonpersistent cache, or may cache objects across invocations.
Caching Proxies
Specialized servers in a network which act as agents on the behalf of the client to locate a cached copy of an object. Caching proxies typically serve as secondary or higher level caches, because they are invoked as a result of cache-misses from client caches.
HTTP Daemon (HTTPD)
A server having Hypertext Markup Language and Common Gateway Interface capability. The HTTPD is typically supported by an access agent which provides the hardware connections to machines on the intranet and access to the Internet, such as TCP/IP couplings.
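As an illustration of the four URL components described in the glossary entry for URL above, the following minimal sketch splits the example URL using Python's standard `urllib.parse` module. The helper name `url_components` and the dictionary keys are assumptions made for this illustration only.

```python
from urllib.parse import urlsplit


def url_components(url):
    """Split a URL into the four parts named in the glossary entry."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # text before the first ":"
        "host": parts.hostname,     # between "//" and the next ":" or "/"
        "port": parts.port,         # optional; None when omitted
        "path": parts.path,         # file name or program name
    }


example = url_components("http://www.philipyu.com:80/table.html")
no_port = url_components("http://www.philipyu.com/table.html")
```

When the optional port is omitted, as in the second call, the `port` entry is `None` and the remaining components are unchanged.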
The traffic on the World Wide Web is increasing exponentially. Proxy servers, especially at a gateway to a large organization or region, can comprise a collection of computing nodes. Similarly, at popular (hot) Web sites, a collection (or cluster) of computing nodes is used to support the access demand.
To achieve good performance in a server cluster, the load should be balanced among the collection of nodes. This should be tempered by the need to optimize the cache hit ratio in a given server in the cluster by localizing identical object requests.
Previous work on load balancing in a multi-processor or multiple node environment, such as the IBM S/390 Sysplex, primarily focused on scheduling algorithms which select one of multiple generic resources for each incoming task or user session. The scheduler controls the scheduling of every incoming task or session and there is no caching of the resource selection.
One known method for balancing the load among geographically distributed replicated sites is known as the Round-Robin Domain Name Server (RR-DNS) approach. In the paper by Katz, E., Butler, M., and McGrath, R., entitled "A Scaleable HTTP Server: The NCSA Prototype", Computer Networks and ISDN Systems, Vol. 27, 1994, pp. 68-74, the RR-DNS method is used to balance the load across a set of web server nodes. Here, the set of distributed sites is represented by one URL (e.g., www.hotsite.com); a cluster sub-domain for this distributed site is defined with its sub-domain name server. The sub-domain name server maps the name resolution requests to different IP addresses (in the distributed cluster) in a round-robin fashion. Thus, subsets of the clients will be assigned to each of the replicated sites. In order to reduce network traffic, a mapping request is not issued for each service request. Instead, the result of the mapping request is saved for a "time-to-live" (TTL) interval. Subsequent requests issued during the TTL interval retain the previous mapping and hence will be routed to the same server node.
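The interaction between round-robin resolution and TTL caching can be sketched as follows. This is a simplified model and not the NCSA implementation: the class names `RoundRobinDNS` and `CachingGateway`, the IP addresses, and the 60-second TTL are all assumptions chosen for illustration, and time is passed in explicitly so that the behavior is deterministic.

```python
import itertools


class RoundRobinDNS:
    """Hypothetical sub-domain name server that rotates through addresses."""

    def __init__(self, addresses):
        self._cycle = itertools.cycle(addresses)

    def resolve(self):
        return next(self._cycle)


class CachingGateway:
    """Hypothetical gateway that keeps a resolved address for one TTL interval."""

    def __init__(self, dns, ttl):
        self.dns = dns
        self.ttl = ttl
        self._cached = None
        self._expires = 0.0

    def lookup(self, now):
        # Issue a new mapping request only when the cached mapping has expired.
        if self._cached is None or now >= self._expires:
            self._cached = self.dns.resolve()
            self._expires = now + self.ttl
        return self._cached


dns = RoundRobinDNS(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
gateway = CachingGateway(dns, ttl=60)
routed = [gateway.lookup(t) for t in (0, 10, 59, 60, 61)]
```

In this sketch the first three lookups fall inside the same TTL interval and are routed to the same address; only the lookup at time 60 triggers a new mapping request and moves on to the next address.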
A problem with the RR-DNS method is that a load imbalance among the distributed sites may result (see e.g., Dias, D. M., Kish, W., Mukherjee, R., and Tewari, R., in "A Scaleable and Highly Available Web Server", Proc. 41st IEEE Computer Society Intl. Conf. (COMPCON) 1996, Technologies for the Information Superhighway, pp. 85-92, February 1996). The load imbalance can be caused by caching of the association between a name and IP address at various gateways, fire-walls, and domain name-servers in the network. Thus, for the TTL period all new client requests routed through these gateways, fire-walls, and domain name-servers will be assigned to the single site stored in the cache. Those skilled in the art will realize that a simple reduction in the TTL value will not solve the problem. In fact, low TTL values are frequently not accepted by many name servers. More importantly, a simple reduction of TTL value may not reduce a load skew caused by unevenly distributed client request rates.
One method of load balancing within a local cluster of nodes is to use a so-called TCP router as described in: "A Virtual Multi-Processor Implemented by an Encapsulated Cluster of Loosely Coupled Computers," by Attanasio, Clement R. and Smith, Stephen E., IBM Research Report RC 18442, 1992; and U.S. Pat. No. 5,371,852, entitled "Method and Apparatus for Making a Cluster of Computers Appear as a Single Host," issued Dec. 6, 1994 which is hereby incorporated by reference in its entirety. Here, only the address of the TCP router is given out to clients; the TCP router distributes incoming requests among the nodes in the cluster, either in a round-robin manner, or based on the load on the nodes. It should be noted that this TCP router method is limited to a local cluster of nodes.
More recently, in the paper by Colajanni, M., Yu, P., and Dias, D., entitled "Scheduling Algorithms for Distributed Web Servers," IBM Research Report, RC 20680, January 1997, which is hereby incorporated by reference in its entirety, a multi-tier round robin method is described which divides the gateways into multiple tiers based on their request rates. Requests from each tier are scheduled separately using a round robin algorithm. This method can also handle a homogeneous distributed server architecture.
In all of the above approaches, the goal is to balance the load among a collection of servers. The dynamic routing decision does not take into account the identity of the object being requested. In other words, multiple requests for the same object may be routed to different servers to balance the load. This will result in a poor cache hit ratio which is especially severe for proxy servers since the potential number of distinct Web pages referenced can be very large. Although in a Web server cluster, a static partition can be made to the Web pages wherein each partition is assigned a different (virtual) host name or IP, a static partitioning approach lacks the flexibility to cope with dynamic load changes and moreover, is not scaleable.
Thus, there is a need for an improved load balancing method and apparatus in a server cluster which not only balances the load across the cluster but also optimizes the cache hit ratio in a given server in the cluster by localizing identical object requests. The present invention addresses such a need.
There is also a need for an improved routing method which assigns each server to handle a subset of the object space dynamically according to workload conditions and routes object requests to the server assigned to the subspace associated with the object. The present invention also addresses such a need.
In accordance with the aforementioned needs, the present invention is directed to an improved method and apparatus for dynamically routing object requests among a collection of servers that takes into account either: the caching efficiency of the servers and load balance; or just the load balance.
The present invention also has features which can dynamically update server routing information by "piggybacking" meta information with the response to the routing requests. The present invention has other features which can improve the cache hit ratio at a server by mapping a server based on the identifier (e.g., URL) of the object requested and dynamically updating this mapping if workload conditions change. In an Internet environment, the collection of servers can include, but is not limited to, a proxy server cluster or a Web server cluster.
A method having features of the present invention for dynamically routing object requests among a collection of server nodes, includes the steps of: piggybacking meta information with a requested object; and dynamically updating routing information for a server assignment according to the meta information.
A method having features of the present invention for dynamically routing object requests among a collection of server nodes while optimizing cache hits, further includes the steps of: mapping an object identifier to a class; and assigning a server based on the class and a class-to-server assignment table.
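The two steps above can be sketched as follows. The summary does not specify how an identifier is mapped to a class, so the MD5-based hash, the count of 16 classes, and the server names used here are assumptions made purely for illustration.

```python
import hashlib

NUM_CLASSES = 16  # assumed number of object classes


def object_class(url, num_classes=NUM_CLASSES):
    """Map an object identifier (here a URL) to a class number.

    A stable hash is used so that every requester computes the same class
    for the same URL; the choice of MD5 is an assumption of this sketch.
    """
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_classes


def assign_server(url, class_to_server):
    """Look up the server for a URL in a class-to-server assignment table."""
    return class_to_server[object_class(url, len(class_to_server))]


# Hypothetical assignment table: 16 classes spread over 4 servers.
table = [f"server-{c % 4}" for c in range(NUM_CLASSES)]
first = assign_server("http://www.hotsite.com/index.html", table)
second = assign_server("http://www.hotsite.com/index.html", table)
```

Because the class depends only on the identifier, repeated requests for the same object resolve to the same server for as long as the table is unchanged, which is what localizes identical requests in one server's cache.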
The present invention has still other features which can inform the requester node on an "on-demand" basis of a dynamic change in a class-to-server assignment. The class-to-server assignment can change dynamically as the workload varies. To avoid costly broadcasting of the changes to all potential requesters, or forcing requesters to first obtain a mapping each time a request is sent, the server can advantageously continue to serve an object request even if it is not the one assigned to process that class. However, the server can indicate in a header of the returned object (or response), the information on the new class-to-server assignment.
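A minimal sketch of this on-demand update follows. The header name `X-Class-Assignment`, its `class=server` format, and the function names are hypothetical stand-ins chosen for this illustration; they do not represent any actual wire format.

```python
HEADER = "X-Class-Assignment"  # hypothetical header name


def serve(server_name, obj_class, current_table):
    """Server side: always serve the request, and piggyback the current
    owner of the class when this server is no longer the assigned one."""
    headers = {}
    owner = current_table[obj_class]
    if owner != server_name:
        headers[HEADER] = f"{obj_class}={owner}"
    return {"body": f"object of class {obj_class}", "headers": headers}


def handle_response(response, local_table):
    """Requester side: apply any piggybacked assignment to the local table."""
    update = response["headers"].get(HEADER)
    if update is not None:
        cls, server = update.split("=", 1)
        local_table[int(cls)] = server


current = {0: "B", 1: "B"}          # up-to-date assignment held by the servers
requester_table = {0: "A", 1: "B"}  # requester still has an obsolete entry

target = requester_table[0]                      # stale choice: server "A"
response = serve(target, 0, current)             # "A" serves the object anyway
handle_response(response, requester_table)       # requester learns the new owner
```

After the exchange the requester's entry for class 0 points at server "B", so subsequent requests for that class go to the newly assigned server without any broadcast or extra lookup.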
Furthermore, the present invention's features for piggybacking meta information with requested objects can also be applied to a conventional DNS routing in the Internet to improve load balancing in a server cluster. This should be distinguished from the concept of using an object's URL (or object class) to make a server assignment (to improve the cache hits). DNS routing has a valid interval (TTL) for address mapping. The present invention has features which allow server assignments to be generated at an interval smaller than the TTL and thus better reflect true load conditions. Changes in server assignment can be piggybacked with the returned object, avoiding added traffic, so that future requests can be sent to the new server.
The present invention has still other features which can dynamically and incrementally change the class-to-server assignment based on the workload demand to balance the load.
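One possible form of such an incremental change is sketched below. The summary does not give the reassignment policy, so everything here is an assumption for illustration: the function name `rebalance_step`, the 1.25 tolerance factor, and the rule of moving a single class from the busiest server to the least loaded one.

```python
def rebalance_step(class_to_server, class_load, servers, tolerance=1.25):
    """Move at most one class from the busiest to the least loaded server.

    Returns (class, from_server, to_server), or None when no move is made.
    """
    load = {s: 0.0 for s in servers}
    for cls, server in class_to_server.items():
        load[server] += class_load.get(cls, 0.0)

    mean = sum(load.values()) / len(servers)
    busiest = max(servers, key=lambda s: load[s])
    idlest = min(servers, key=lambda s: load[s])
    if mean == 0 or load[busiest] <= tolerance * mean:
        return None  # load is already within the assumed tolerance

    # Only consider classes whose move would narrow the gap between the two.
    gap = load[busiest] - load[idlest]
    candidates = [
        c for c, s in class_to_server.items()
        if s == busiest and 0 < class_load.get(c, 0.0) < gap
    ]
    if not candidates:
        return None

    # Pick the class whose load is closest to half of the gap.
    moved = min(candidates, key=lambda c: abs(class_load[c] - gap / 2))
    class_to_server[moved] = idlest
    return moved, busiest, idlest


assignment = {0: "A", 1: "A", 2: "A", 3: "B"}
observed_load = {0: 50, 1: 30, 2: 20, 3: 10}
move = rebalance_step(assignment, observed_load, ["A", "B"])
second_move = rebalance_step(assignment, observed_load, ["A", "B"])
```

Changing one class per step keeps most of the assignment table intact, so most cached objects stay on the server that already holds them while the load skew is reduced.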
According to yet other features of the present invention, in an Internet environment, the PICS protocol may be used to communicate various types of information. PICS can be used by the server to piggyback the meta information on a new class-to-server mapping when a request is directed to a server based on an obsolete class-to-server mapping entry. PICS can also be used by the requester to query the coordinator for the current class-to-server mapping.
Those skilled in the art will appreciate that the present invention can be applied to general distributed environments as well as the World Wide Web.