Increasingly, computer end users reach out over the Internet to gather information and news located at remote servers. Often, in order to meet user demand, the requested information resides on multiple servers working in concert to fulfill information requests. Allowing multiple users to access the same data servers and execute the same application requires sophisticated network management capable of ensuring that servers are reliable, highly available and scalable. One of the more challenging aspects of network management is balancing server load in order to handle overwhelming demand for access to Internet locales.
“Load balancing” is the term given to a technique for apportioning the work of serving a network task, function, application, etc. among two or more servers (also referred to as “hosts”). According to the technique, a number of servers are grouped in a “cluster” and client requests are distributed amongst the servers in the cluster so that no single server becomes overloaded. Load balancing is especially important for networks where it is difficult to predict the number of requests that will be issued to any given server, such as a high-traffic website host.
One common approach to load balancing is referred to as the “round-robin” approach. Under this method, application requests are evenly distributed amongst servers in a cluster such that each server gets an equal share of the load. The round-robin approach, however, has limitations such as not taking into consideration the different performance characteristics of individual servers in the cluster and not determining whether the designated server is actually available. Consequently, it is possible to overload a slower server in the cluster or send a request to a server that is not available.
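The limitations just described can be illustrated with a minimal Python sketch. The server names, relative capacities and availability flags below are hypothetical values chosen only for illustration; the point is that a plain rotation hands every server the same number of requests regardless of those properties.

```python
from itertools import cycle


def round_robin_assign(servers, requests):
    """Assign each request to the next server in a fixed rotation."""
    rotation = cycle(servers)
    return [(request, next(rotation)) for request in requests]


# Hypothetical cluster: name -> (relative capacity, currently available?)
cluster = {
    "server_1": (1.0, True),
    "server_2": (0.25, True),   # slower machine
    "server_3": (1.0, False),   # not available
}

assignments = round_robin_assign(list(cluster), range(6))

# Count how many requests each server received.
counts = {name: 0 for name in cluster}
for _, name in assignments:
    counts[name] += 1

# Every server gets an equal share, including the slow one and the one
# that is down, because the rotation never consults capacity or status.
print(counts)
```

In this sketch the slower server and the unavailable server each receive as many requests as the healthy full-speed server, which is the overload and misrouting risk noted above.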
Other approaches to load balancing require the use of dedicated hardware utilized solely for the purpose of load balancing. For example, dedicated computers executing only load-balancing applications are used to accept connections on behalf of all servers in a cluster, monitor the cluster and assign application requests to servers in the cluster on the basis of performance and availability. Another hardware example is the use of network switches to create a cluster of servers and to divide traffic amongst the available servers in the cluster. A dedicated hardware solution, however, is problematic because it presents a single point of failure for the system such that if the computer or switch fails, the cluster of servers also fails.
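The dispatcher role described above, assigning requests on the basis of performance and availability, might be sketched as follows. This is an illustrative assumption rather than the behavior of any particular product: the `ServerStatus` fields and the "lowest utilization" rule are hypothetical stand-ins for whatever metrics a real dispatcher monitors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServerStatus:
    name: str
    available: bool
    active_connections: int
    capacity: int  # hypothetical maximum concurrent connections


def pick_server(statuses: List[ServerStatus]) -> Optional[ServerStatus]:
    """Return the available server with the lowest utilization, or None."""
    candidates = [
        s for s in statuses
        if s.available and s.active_connections < s.capacity
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.active_connections / s.capacity)


statuses = [
    ServerStatus("server_1", True, 80, 100),
    ServerStatus("server_2", True, 10, 25),
    ServerStatus("server_3", False, 0, 100),  # down, never selected
]
chosen = pick_server(statuses)
print(chosen.name if chosen else "no server available")
```

Note that all of this decision logic lives in one place; if the machine running it fails, no requests are assigned at all.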
An alternative to dedicated hardware, which avoids both the added expense of that hardware and its single point of failure, is software-based load balancing. An example of a software-based solution is the MICROSOFT NETWORK LOAD BALANCING server, also referred to as the “NLB.” Microsoft's NLB executes as a network driver on all servers in the cluster. The NLB drivers executing concurrently on each server communicate with each other to monitor the availability of each server and to determine mutually which server in the cluster handles the application request.
An example of a typical implementation of load balancing in the prior art is illustrated in FIG. 1. Networked computer system 100 includes one or more external client computers 110 connected via data links 115 and Internet 120 to a cluster of external network interface servers 130. The cluster of external network interface servers 130 is connected to a series of published servers 150 via data links 135 and 155 and a router 140. With continued reference to FIG. 1, when the external client 110, having IP Address A, makes a connection to one of the internal published servers 150, a data request message 117 is routed to server cluster 130, having IP Address B. Upon receipt, server cluster 130 executes a server selection algorithm based upon the source and destination IP addresses and then one of the servers in the cluster 130 accepts data request message 117. Following message path 1 in the example of FIG. 1, data request message 117 arrives at Server M as a result of executing the selection algorithm using IP Address A and IP Address B.
Server M then makes a connection to the appropriate published server 150 by translating the IP address of public Server M to the private IP address of the published server. In this example, the IP address of Server M identified in data request message 137 translates to IP Address C. In this instance, data request message 137 follows message path 2 from Server M to Published Server N. When constructing a response message, Published Server N swaps the source and destination IP addresses in the response message. In the above example, the source IP address changes from IP Address A to IP Address C and the destination IP address changes from IP Address C to IP Address A. Thereafter, data response message 157 is routed back to server cluster 130, the predefined default gateway for published servers 150. Because the destination address of the response message is unknown to the published server, all response messages from published servers 150 are forwarded to the MAC (i.e., Media Access Control) address of the predefined default gateway, which in this example is the MAC address of server cluster 130.
Upon arrival, server cluster 130 again executes the server selection algorithm, this time based on the source and destination addresses of the response message. In this scenario, the response message may be sent to a server different from the server that processed the client data request 117 and initiated the connection with the published server. Following message path 3 in the example of FIG. 1, data response message 157 arrives at Server 2 as a result of executing the selection algorithm.
Under the above known load-balancing scheme, the server cluster determines which server processes the message by repeatedly executing the selection algorithm using the source and destination IP addresses. Thus, the return path through the external network interface is not ensured to be the same as the original path from the external client into the external network interface.
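The asymmetry described above can be reproduced with a small Python sketch. Everything concrete here is an assumption made for illustration: the selection function is a deliberately simple stand-in (it is not the algorithm used by NLB or by any actual product), the three-server cluster is hypothetical, and the addresses standing in for IP Address A, B and C are taken from ranges reserved for documentation and private use.

```python
import ipaddress

# Hypothetical cluster members; the real cluster size is not specified.
CLUSTER = ["Server 1", "Server 2", "Server 3"]


def select_server(src_ip: str, dst_ip: str) -> str:
    """Stand-in selection algorithm keyed on source and destination IP."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return CLUSTER[(src + dst) % len(CLUSTER)]


IP_A = "203.0.113.10"   # external client (IP Address A)
IP_B = "198.51.100.1"   # server cluster (IP Address B)
IP_C = "10.0.0.5"       # published server (IP Address C)

# Inbound request: source A, destination B.
request_server = select_server(IP_A, IP_B)

# Outbound response: the published server swaps the addresses of the
# translated request, so the cluster sees source C, destination A.
response_server = select_server(IP_C, IP_A)

print("request handled by: ", request_server)
print("response handled by:", response_server)
```

Because the response carries a different address pair than the original request, the same selection function generally yields a different cluster member, so the server that receives the response has no record of the connection it belongs to.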