This invention relates generally to providing load balancing across distributed computing systems. More particularly it relates to a routing method for use in distributed systems including a set of server computing nodes, all or a subset of which can handle a client request, but where there is a preferred node or a set of nodes that are best suited to handle a particular client request.
While dictionary definitions apply to the terms herein, the following definitions of some terms are also provided to assist the reader:
An Encapsulated Cluster (EC) is characterized by a Connection-Router (CR) node and multiple server hosts providing a set of services (e.g. Web service, NFS, etc.). An example of a system which provides encapsulated clustering is described in U.S. Pat. No. 5,371,852, entitled "METHOD AND APPARATUS FOR MAKING A CLUSTER OF COMPUTERS APPEAR AS A SINGLE HOST ON A COMPUTER NETWORK".
A virtual encapsulated cluster system describes an improvement to the aforementioned U.S. Pat. No. 5,371,852. Like the system of U.S. Pat. No. 5,371,852, a Virtual Encapsulated Cluster routes TCP information that crosses the boundary of a computer cluster. The information is in the form of port type messages. Incoming messages are routed and the servers respond so that each cluster appears as a single computer image to the external host. In a virtual encapsulated cluster a cluster of servers with a single TCP-router node is divided into a number of virtual clusters. Each virtual encapsulated cluster appears as a single host to hosts on the network which are outside the cluster. The messages are routed to members of each virtual encapsulated cluster in a way that keeps the load balanced among the set of cluster nodes.
A recoverable virtual encapsulated cluster is a virtual encapsulated cluster which has two TCP-router nodes, a primary and a backup. The cluster is augmented with a recovery manager which causes the backup TCP-router to become active if the primary fails. In addition, methods are added so that the connection state at the time of failure can be reconstructed by (or alternatively known at) the backup router so that zero or the minimum number of client connections will be lost due to failure of the TCP-router node. Methods are also added so that the configuration/management information of the virtual encapsulated cluster is replicated (or constructed) at the backup. Finally, the start up protocol of the TCP-router node is changed so that recovery of the primary router will not cause a failure in a backup which has taken over for it. This is described in the aforementioned co-pending patent application entitled "Weighted TCP Routing to Service Nodes in a Virtual Encapsulated Cluster," by Attanasio et al.
The traffic on the World Wide Web is increasing exponentially, especially at popular (hot) sites. In order to increase the processing capacity at such hot sites, a cluster of computing nodes, which we will refer to as a multi-node cluster, can be provided to handle the load. The multi-node cluster is encapsulated, that is, made to appear as one entity to clients, so that the added capacity provided by the multi-node cluster is transparent to clients. Client requests need to be distributed among nodes in the multi-node cluster.
One known method in the art that attempts to balance the load among nodes in a multi-node cluster is known as the Round-Robin Domain Name Server (RR-DNS) approach. The basic domain name server method is described in the paper by Mockapetris, P., entitled "Domain Names - Implementation and Specification", RFC 1035, USC Information Sciences Institute, November 1987. In the paper by Katz, E., Butler, M., and McGrath, R., entitled "A Scalable HTTP Server: The NCSA Prototype", Computer Networks and ISDN Systems, Vol. 27, 1994, pp. 155-164, round-robin DNS (RR-DNS) is used to balance the load across a set of web server nodes. In this approach, the set of nodes in the multi-node server is represented by one URL (e.g. www.hotsite.com); a cluster subdomain for this distributed site is defined with its subdomain name server. This subdomain name server maps client name resolution requests to different IP addresses in the distributed cluster. In this way, subsets of the clients will be pointed to each of the geographically distributed sites. Load balancing support using DNS is also described in the paper by Brisco, T., "DNS Support for Load Balancing", RFC 1794, Rutgers University, April 1995.
A key problem with this approach is that the RR-DNS leads to poor load balance among the distributed sites, as described in the paper, Dias, D. M., Kish, W., Mukherjee, R., and Tewari, R., "A Scalable and Highly Available Web Server", Proc. 41st IEEE Computer Society Intl. Conf. (COMPCON) 1996, Technologies for the Information Superhighway, pp. 85-92, February 1996. The problem is due to caching of the association between names and IP addresses at various name servers in the network. Thus, for example, for a period of time (time-to-live) all new clients behind an intermediate name server in the network will be pointed to just one of the sites. This leads to hot spots on nodes of the server cluster that move to different cluster nodes as the time-to-live periods expire.
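The caching effect described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: a single intermediate name server, one new client per time step, and hypothetical class names and IP addresses that do not come from the cited papers.

```python
from collections import Counter
from itertools import cycle


class RoundRobinDNS:
    """Subdomain name server that hands out cluster addresses in rotation."""

    def __init__(self, addresses):
        self._rotation = cycle(addresses)

    def resolve(self):
        return next(self._rotation)


class CachingResolver:
    """Intermediate name server that caches one answer for `ttl` time units."""

    def __init__(self, upstream, ttl):
        self.upstream = upstream
        self.ttl = ttl
        self._cached = None
        self._expires = 0

    def resolve(self, now):
        # Ask the round-robin server only when the cached answer has expired.
        if self._cached is None or now >= self._expires:
            self._cached = self.upstream.resolve()
            self._expires = now + self.ttl
        return self._cached


def simulate(ttl, requests=12):
    """Count how many new clients behind one resolver land on each address."""
    dns = RoundRobinDNS(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    resolver = CachingResolver(dns, ttl)
    return Counter(resolver.resolve(now) for now in range(requests))
```

With `ttl=1` the twelve clients spread evenly over the three addresses. With `ttl=6` the first six clients all reach one address and the next six all reach another, which is the moving hot spot described above.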
One known method to solve this problem within a cluster of nodes at a single site is to provide an encapsulated cluster using a so-called TCP router as described in: Attanasio, Clement R. and Smith, Stephen E., "A Virtual Multi-Processor Implemented by an Encapsulated Cluster of Loosely Coupled Computers", IBM Research Report RC 18442, 1992, and, U.S. Pat. No. 5,371,852, Dec. 6, 1994, by Attanasio et al., entitled "Method and Apparatus for Making a Cluster of Computers Appear as a Single Host" (Attanasio). Here, only the address of the TCP router is given out to clients; the TCP router distributes incoming requests among the nodes in the cluster, either in a round-robin manner, or based on the load on the nodes. In Attanasio, the TCP router can act as a proxy, where the requests are sent to a selected node, and the responses go back to the TCP router and then to the client. This proxy mode of operation can lead to the router becoming a bottleneck, and for this reason is not considered further herein. In another mode of operation, which we will refer to as the forwarding mode, client requests are sent to a selected node, and the responses are sent back to the client directly from the selected node, bypassing the router. In many environments, such as the World Wide Web (WWW), the response packets are typically much larger than the incoming packets from the client; bypassing the router on this response path is thus critical.
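The two node-selection policies mentioned above, round-robin and load-based, can be sketched as follows. This is an illustrative sketch only: the class and method names are hypothetical, load is approximated by a count of active connections, and the packet path itself (including the direct response path of the forwarding mode) is not modeled.

```python
from itertools import cycle


class ForwardingRouter:
    """Chooses a cluster node for each new client connection."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        # Assumption: load is measured as the number of active connections.
        self.load = {node: 0 for node in self.nodes}
        self._rotation = cycle(self.nodes)

    def pick_round_robin(self):
        return next(self._rotation)

    def pick_least_loaded(self):
        # Ties are broken in favor of the node listed first.
        return min(self.nodes, key=lambda node: self.load[node])

    def route(self, policy="round_robin"):
        if policy == "round_robin":
            node = self.pick_round_robin()
        else:
            node = self.pick_least_loaded()
        self.load[node] += 1
        return node

    def close(self, node):
        """Record that a connection on `node` has ended."""
        self.load[node] -= 1
```

Neither policy takes the identity of the client into account, which is why consecutive requests from one client can land on different nodes.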
The work described in the previous paragraph was expanded upon and improved in the co-pending patent application Ser. No. 08/701,939 "Weighted TCP Routing to Service Nodes in a Virtual Encapsulated Cluster" by C. Attanasio, G. Hunt, G. Goldszmidt, and S. Smith. This patent application describes how the same facility can be made recoverable. The TCP router is enhanced to handle virtual clusters, and multiple target addresses within a router, and the manager component is described which collects information and dynamically controls the weighted routing.
As described above, the TCP router would typically send different client TCP connection requests to different nodes within a cluster. There are several applications where specific multi-node servers would be preferred for certain client requests, based on either the static or dynamic state of the system. Thus a key problem with the TCP router approach is providing support for client requests with affinity requirements.
An important example of this is the support of the Secure Sockets Layer (SSL) protocol, which is a very popular protocol used for the exchange of secure information between clients and servers on the WWW, and for other environments. In SSL, a session key is generated by the client, and passed to the server after encrypting it using the server's public key. Session keys have a lifetime (e.g. 100 seconds). Subsequent SSL requests from the same client within the lifetime of the session key will reuse the key. With the base TCP router method, subsequent requests from the same client could be routed to another node, but would require re-negotiating a session key, which is an expensive operation. Often, a single web page may contain embedded images, which are typically requested from the server simultaneously, after the base HTML page is received by the web browser. If each embedded image is to be retrieved using SSL, and if the requests were routed to any node by the (base) TCP router, a new session key would again have to be re-negotiated for each embedded element of the page, which can be prohibitively expensive in terms of the resource usage and latency.
More generally, applications may have affinity to nodes based on the state at the server. The state at the server could be dependent on previous routing decisions, as in the case of SSL, or it could be due to information or computation at the server. For example, a cluster of servers could also have a partitioned database, and a client may have affinity with a node of the cluster, based on the database partition located at that node.
Thus there is a need to provide a method for affinity-based routing in an encapsulated cluster or virtual encapsulated cluster, wherein a TCP router sends client requests to nodes in the cluster, and wherein the responses go back directly to the client from the node selected by the TCP router to handle the client request, rather than the alternative where the response goes through the router.
Accordingly, it is an object of this invention to provide a method for providing an encapsulated cluster with affinity-based routing of client requests to nodes in the cluster.
It is yet another object to keep the method for affinity routing simple but effective, so that the overhead for affinity routing and load balancing is small compared to that for serving the client requests.
Another aspect of this invention provides a method for affinity-based routing in an encapsulated cluster wherein specific clients may have affinity with specific nodes in the cluster that may be based on the static state or dynamic state at the cluster node independent of where previous requests from this client were routed.
In a computer network including an encapsulated cluster of nodes, an affinity-based method for routing client requests to one of a plurality of server nodes in the cluster having features of the present invention includes the steps of: communicating from the client to a router node, a plurality of packets associated with a connection; and routing the packets to a preferred server having affinity with the client according to state information maintained at the router.
Another aspect of this invention provides an affinity-based routing in the encapsulated cluster that may depend on a dynamic state of a cluster node to which previous client requests were routed. In accordance with this aspect of the present invention, wherein the state information includes information on a previous connection to one of the server nodes, the routing step includes the further steps of: determining if one of the packets is associated with the previous connection; routing the request to the server node associated with the previous connection; and if the state information is not found, creating and storing at the router, state information associated with the connection.
According to yet another aspect of the present invention, these and further objectives and advantages are achieved by designating a node at each of the multi-node clusters as a TCP router, wherein clients are assigned to one of the multi-node clusters by giving them the address of the corresponding TCP router, and wherein the TCP router selects a node in the cluster to process the client request based on state maintained in the TCP router. The state in the TCP router may be set by the router (e.g. based on previous routing decisions) or may be set by one or more servers (e.g. based on the state of the servers).
A preferred embodiment of the present invention is described in the context of supporting SSL. Those skilled in the art will readily appreciate that it can be used for providing affinity routing in a more general context. Those skilled in the art will also recognize that this method can be easily extended to recoverable virtual encapsulated clusters.
A method in accordance with the preferred embodiment of the present invention extends the TCP router to maintain an affinity table of recent client TCP connections after the TCP connections have been closed (by a FIN command). Each entry in this affinity table contains the client (or proxy) IP address, an indication of the service that was requested, the server node that the client was previously routed to, and the time at which the initial connection was made (or the time at which the previous connection was closed). If another SSL connection request arrives at the TCP router from the same client (or proxy) IP address, within a pre-specified (or configured) affinity period for the corresponding entry in the affinity table, then the TCP router allocates that TCP session request to the same node as specified in the corresponding affinity table entry. (Note that an SSL connection request can be distinguished because it uses a pre-assigned and different port number.) In this manner, a client that makes an SSL request is routed with affinity to a particular node for a configurable affinity time period (also known as the affinity period). For SSL, the configurable affinity time period can be set to be the lifetime of the SSL session key.
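A minimal sketch of such an affinity table lookup is given below. It assumes that the service is identified by the destination port, that SSL uses port 443, that the timestamp is the time of the initial connection, and that a request without a valid entry is assigned round-robin. All names and values are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from itertools import cycle

SSL_PORT = 443  # assumption: the pre-assigned port that identifies SSL requests


@dataclass
class AffinityEntry:
    node: str         # server node the client was previously routed to
    timestamp: float  # time at which the initial connection was made


class AffinityRouter:
    def __init__(self, nodes, affinity_period=100.0):
        # For SSL, the period can be set to the session key lifetime.
        self.affinity_period = affinity_period
        self._rotation = cycle(nodes)
        self.table = {}  # (client_ip, port) -> AffinityEntry

    def route(self, client_ip, port, now):
        key = (client_ip, port)
        entry = self.table.get(key)
        # Reuse the previous node while the entry is within the affinity period.
        if entry is not None and now - entry.timestamp < self.affinity_period:
            return entry.node
        # No usable entry: fall back to base routing (round-robin here).
        node = next(self._rotation)
        if port == SSL_PORT:
            self.table[key] = AffinityEntry(node, now)
        return node
```

For example, with nodes A, B and C and a 100-second period, a client first routed to A at time 0 is routed to A again at time 50, but is treated as a new client at time 100.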
Entries in the affinity table become stale after the affinity period from the initial connection (or from the last connection close) has expired. These stale entries can be deleted either when encountered during a search of the table, or by a background garbage collector. For a bounded affinity table size, if the size of the table reaches the bound, entries can be eliminated based on stale connections first, time since last access, or other cast-out criteria.
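The clean-up options described above can be sketched as follows: stale entries are deleted when encountered during a lookup or by an explicit garbage-collection pass, and a full table casts out stale entries first and otherwise the least recently accessed entry. The class is a hypothetical illustration and assumes `max_entries` is at least 1.

```python
class BoundedAffinityTable:
    def __init__(self, affinity_period, max_entries):
        self.affinity_period = affinity_period
        self.max_entries = max_entries  # assumption: at least 1
        self._entries = {}  # key -> [node, created_at, last_access]

    def _is_stale(self, record, now):
        return now - record[1] >= self.affinity_period

    def lookup(self, key, now):
        record = self._entries.get(key)
        if record is None:
            return None
        if self._is_stale(record, now):
            del self._entries[key]  # deleted when encountered during a search
            return None
        record[2] = now
        return record[0]

    def collect_garbage(self, now):
        """Background-style pass; returns the number of entries removed."""
        stale = [k for k, r in self._entries.items() if self._is_stale(r, now)]
        for key in stale:
            del self._entries[key]
        return len(stale)

    def insert(self, key, node, now):
        if key not in self._entries and len(self._entries) >= self.max_entries:
            # Cast out stale entries first; otherwise the least recently used.
            if self.collect_garbage(now) == 0:
                victim = min(self._entries, key=lambda k: self._entries[k][2])
                del self._entries[victim]
        self._entries[key] = [node, now, now]

    def __len__(self):
        return len(self._entries)
```

Other cast-out criteria could be substituted for the least-recently-accessed rule without changing the rest of the table.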
It is possible that the node involved in the affinity routing may become overloaded, and it may then be desirable to allow routing to another node in the cluster. Based on the load on the preferred node due to affinity routing, the router may choose to route a request to another node in the cluster; for the SSL case, this would require negotiating a new session key. Thus the routing decision could be based on both the load on the affinity-based node and on the overhead involved in negotiating the new session key.
According to yet another aspect of the present invention, for environments wherein a parallel database is used at the cluster nodes, specific clients may have affinity with specific nodes in the cluster. For example, in the TPC-B benchmark, clients associated with a bank branch have affinity with the node that has the branch partition; in the TPC-C benchmark, clients associated with a warehouse have affinity with the node that has the partition for the corresponding warehouse. Such cases of affinity of clients to nodes may occur for other environments as well. Here, a method according to the present invention includes the steps of: the router initially routing a client request for which the router does not have any cached information to any node in the cluster, or based on server load; the server node (e.g., in a CGI script) then determining the best cluster node to process this client request based on the database partitioning, or some other criteria; and the server node then resetting the corresponding entry in the router affinity table to the correct node, so that subsequent requests from this client are routed to the node with which the client has affinity.
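Both ideas above, leaving an overloaded preferred node and letting a server reset the affinity entry, can be sketched together. This is only a sketch: the load measure (request count), the `renegotiation_penalty` threshold, and the modulo partitioning in `owning_node` are assumptions made for illustration and are not specified by the text.

```python
class LoadAwareAffinityRouter:
    def __init__(self, nodes, renegotiation_penalty=0):
        self.load = {node: 0 for node in nodes}
        self.affinity = {}  # client -> preferred node
        # Assumed cost of abandoning affinity, in the same units as load.
        self.renegotiation_penalty = renegotiation_penalty

    def least_loaded(self):
        return min(self.load, key=self.load.get)

    def route(self, client):
        fallback = self.least_loaded()
        preferred = self.affinity.get(client)
        if preferred is None:
            # No cached information: route based on server load.
            node = fallback
            self.affinity[client] = node
        elif self.load[preferred] - self.load[fallback] > self.renegotiation_penalty:
            # Preferred node is overloaded enough to justify the extra cost.
            node = fallback
            self.affinity[client] = node
        else:
            node = preferred
        self.load[node] += 1
        return node

    def set_affinity(self, client, node):
        """Called by a server node to reset the router's affinity entry."""
        if node not in self.load:
            raise ValueError(f"unknown node: {node}")
        self.affinity[client] = node


def owning_node(branch_id, nodes):
    """Hypothetical partitioning rule: branch i lives on node i mod N."""
    return nodes[branch_id % len(nodes)]
```

In use, the node that receives the first request would compute `owning_node(...)` for the client and call `set_affinity`, so that later requests go to the node holding the partition.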
In other environments, there is affinity between different ports, such that if a request to a specific port from a particular client was previously routed to a specific server node, then another request from the same client on a different but associated port needs to be routed to the same server node. For example, with the FTP protocol, there is such an affinity between ports 21 and 20 (the control and data ports, respectively); if a specific client with a request to port 21 was previously routed to a server node A, then an associated request from the same client to port 20, while the TCP connection to port 21 is still active, needs to also be routed to server node A. This is accomplished by noting that the two ports have associated affinity. The TCP router keeps connection records for active connections associated with the primary port. When a new connection arrives for the secondary port, in this case port 20, the TCP router checks the connection records for the primary port; if it finds one for the same client, it routes the new request to the indicated server.
For still other applications, for example DB2, the need for affinity is not dependent on a port or pre-specified time out. A sequence of requests from a particular client needs to be routed to the same server because of state at the server as previously discussed. According to yet another aspect of the present invention, the server may specify the start and end of the affinity requirement. Specifically, interfaces can be added to the router which allow a server in the cluster to connect to the router and specify the start and end of affinity for any one of its clients. When affinity is turned on, all requests for a single client will be routed to the indicated server until affinity is turned off.
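The port-association check and the server-controlled start and end of affinity can be sketched as follows, taking FTP port 21 as the primary (control) port and port 20 as the secondary (data) port. The names are hypothetical, the fallback is round-robin, and for simplicity the sketch keeps only one active connection record per client and port.

```python
from itertools import cycle

FTP_CONTROL, FTP_DATA = 21, 20
# Secondary port -> primary port whose active connection it must follow.
ASSOCIATED_PRIMARY = {FTP_DATA: FTP_CONTROL}


class PortAffinityRouter:
    def __init__(self, nodes):
        self._rotation = cycle(nodes)
        self.active = {}  # (client_ip, port) -> node, for active connections
        self.pinned = {}  # client_ip -> node, set explicitly by a server

    def open_connection(self, client_ip, port):
        # 1. Affinity turned on by a server takes precedence.
        node = self.pinned.get(client_ip)
        # 2. Otherwise follow an active connection on the associated port.
        if node is None:
            primary = ASSOCIATED_PRIMARY.get(port)
            if primary is not None:
                node = self.active.get((client_ip, primary))
        # 3. Otherwise fall back to base routing (round-robin here).
        if node is None:
            node = next(self._rotation)
        self.active[(client_ip, port)] = node
        return node

    def close_connection(self, client_ip, port):
        self.active.pop((client_ip, port), None)

    def start_affinity(self, client_ip, node):
        """Interface for a server to turn affinity on for one of its clients."""
        self.pinned[client_ip] = node

    def end_affinity(self, client_ip):
        """Interface for a server to turn affinity off again."""
        self.pinned.pop(client_ip, None)
```

Once the control connection is closed, a later connection to port 20 from the same client is routed like any other new request.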