1. Field of the Invention
This invention relates to the field of distributed computing systems and, more particularly, to load balancing and fail-over in clustered distributed computing systems.
2. Description of the Related Art
As workloads on modern computer systems become larger and more varied, more and more computational and data resources may be needed. For example, a request from a client to a web site may involve a load balancer, a web server, a database, and an application server. Alternatively, some large-scale scientific computations may require multiple computational nodes operating in synchronization as a kind of parallel computer.
Any such collection of computational resources and/or data resources tied together by a data network may be referred to as a distributed system. Some distributed systems may be sets of identical nodes each at a single location connected together by a local area network. Alternatively, the nodes may be geographically scattered and connected by the Internet, or a heterogeneous mix of computers, each acting as a different resource. Each node may have a distinct operating system and be running a different set of applications.
Nodes in a distributed system may also be arranged as clusters of nodes, with each cluster working as a single system to handle requests. Alternatively, clusters of nodes in a distributed system may act semi-independently in handling a plurality of workload requests. In such an implementation, each cluster may have one or more shared data sources accessible to all nodes in the cluster.
Workload may be assigned to distributed system components via a load balancer (or hierarchy of load balancers), which relays requests to individual nodes or clusters. For some requests it may be desirable for a client-specific session history to be maintained by the distributed system. In such an application, a client and a node in the distributed system will typically interact several times, with a response from the node necessitating a subsequent request from a client, which in turn leads to another response from the node, and so on. For example, e-commerce may require that a server be aware of what financial information the client has already provided. This history may be tracked by providing information such as a session tracking number or session identifier (ID) to the client, often in the form of a cookie. This information is returned to the distributed system along with all future transaction requests from the client that are part of the session, so that the distributed system may use the session tracking number to look up previous transaction history and manage multiple concurrent client session histories.
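By way of illustration only, the session tracking described above may be sketched as follows. The class name `SessionStore`, its `handle` method, and the use of a random hexadecimal string as the session ID are assumptions made for this sketch and do not describe any particular system.

```python
import uuid


class SessionStore:
    """Hypothetical in-memory store mapping session IDs to request histories."""

    def __init__(self):
        self._histories = {}

    def handle(self, request, session_id=None):
        # A request with no session ID, or an unknown one, starts a new session.
        if session_id is None or session_id not in self._histories:
            session_id = uuid.uuid4().hex
            self._histories[session_id] = []
        # Record the request so later requests in the session can see it.
        self._histories[session_id].append(request)
        # The ID is returned so the client can send it back (e.g. as a cookie).
        return session_id, list(self._histories[session_id])
```

In such a sketch, a client would present the returned session ID with each later request, and the store would use it to look up that client's history while keeping other concurrent sessions separate.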
One difficulty involved with managing session histories is that different nodes in different clusters may not have access to the same data sources, and thus to the same session histories. Alternatively, accessing data in other clusters or nodes may incur excess synchronization overhead or take much longer than accessing data local to a cluster or node. Because of this, load balancers may execute “sticky” load balancing, wherein a client request continuing a given session is sent to the same node that originated the session. Sticky load balancing generally involves a load balancer tracking the node currently handling a given session, often through a node identification number or node address associated with the session ID and/or bundled with the client requests.
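A minimal sketch of sticky load balancing, under stated assumptions, is given below. The class name `StickyLoadBalancer`, the round-robin choice of node for new sessions, and the string node names are all hypothetical; an actual load balancer may select nodes and carry the node identifier differently.

```python
import itertools


class StickyLoadBalancer:
    """Hypothetical balancer that pins each session ID to one node."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        # Assumption: new sessions are spread round-robin across nodes.
        self._next_node = itertools.cycle(self._nodes)
        # Tracks which node is currently handling each session.
        self._session_to_node = {}

    def route(self, session_id):
        node = self._session_to_node.get(session_id)
        if node is None:
            # First request of the session: pick a node and remember it.
            node = next(self._next_node)
            self._session_to_node[session_id] = node
        # Later requests in the same session go to the same node.
        return node
```

With two nodes, for example, a first session may be pinned to one node and a second session to the other, and every later request carrying the first session ID would be relayed to the first node.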
A further difficulty with sticky load balancing may occur when the node handling a client session fails. The load balancer may then send client requests for that session to another node in the system that does not have access to the client session history. This may lead to a timeout or communication error, since the new node would be unable to access the client's session history, which may in turn force the session to fail or be restarted.
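The failure case described above may be illustrated by the following sketch, which assumes that each node keeps session histories only in its own local memory. The names `Node`, `route`, and `demo`, and the use of `LookupError` to stand in for a timeout or communication error, are assumptions made for illustration.

```python
class Node:
    """Hypothetical node whose session histories are local to the node."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.sessions = {}  # session ID -> list of requests, not shared

    def start_session(self, session_id):
        self.sessions[session_id] = []

    def handle(self, session_id, request):
        # A node cannot continue a session whose history it does not hold.
        if session_id not in self.sessions:
            raise LookupError(
                f"{self.name} has no history for session {session_id}")
        self.sessions[session_id].append(request)
        return list(self.sessions[session_id])


def route(nodes, sticky_map, session_id):
    """Sticky routing that falls back to any live node after a failure."""
    node = sticky_map.get(session_id)
    if node is None or not node.alive:
        node = next(n for n in nodes if n.alive)
        sticky_map[session_id] = node
    return node


def demo():
    a, b = Node("node-a"), Node("node-b")
    nodes, sticky_map = [a, b], {}
    first = route(nodes, sticky_map, "s1")
    first.start_session("s1")
    first.handle("s1", "add item")
    a.alive = False  # the node handling the session fails
    fallback = route(nodes, sticky_map, "s1")
    try:
        fallback.handle("s1", "checkout")
    except LookupError as exc:
        # The fallback node cannot see the history held by the failed node.
        return fallback.name, str(exc)
    return fallback.name, None
```

In this sketch, the request is relayed to a surviving node after the failure, but that node raises an error because the session history resided only on the failed node.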