Computer end users increasingly reach out over the Internet to gather information and news located at remote servers. Often, in order to meet user demand, the requested information resides on multiple servers working in concert to fulfill information requests. Allowing multiple users to access the same data servers and execute the same application requires sophisticated network management capable of ensuring that servers are reliable, highly available and scalable. One of the more challenging aspects of network management is balancing server load in order to handle overwhelming demand for access to Internet locales.
“Load balancing” is the term given to a technique for apportioning the work of serving a network task, function, application, etc. among two or more servers (also referred to as “hosts”). According to the technique, a number of servers are grouped in a “cluster” such that client requests are distributed amongst the servers in the cluster, ensuring that no one server becomes overloaded. Load balancing is especially important for networks where it is difficult to predict the number of requests that will be issued to any given server, such as a high-traffic Web site host.
One common approach to load balancing is referred to as the “round-robin” approach (e.g., as is used in round-robin domain name servers). Under this method, application requests are evenly distributed amongst servers in a cluster such that each server receives a share of the load. The round-robin approach, however, has limitations such as not taking into consideration the different performance characteristics of individual servers in the cluster and not determining whether the designated server is actually available. Consequently, it is possible to overload a slower server in the cluster or send a request to a server that is not available simply because the designated server is the next in line to receive a request.
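The limitation described above can be illustrated with a minimal sketch. The server names and the helper round_robin_dispatch are hypothetical and chosen only for illustration; the sketch assumes a fixed rotation with no knowledge of server speed or availability.

```python
from itertools import cycle


def round_robin_dispatch(servers, requests):
    """Assign each request to the next server in a fixed rotation.

    The rotation ignores both capacity and availability, which is the
    limitation of the round-robin approach.
    """
    rotation = cycle(servers)
    return [(request, next(rotation)) for request in requests]


# Hypothetical cluster: "slow" has less capacity and "down" is offline,
# but the dispatcher has no way of knowing either fact.
servers = ["fast", "slow", "down"]
assignments = round_robin_dispatch(servers, range(6))

# Requests that were handed to the unavailable server anyway.
lost = [request for request, server in assignments if server == "down"]
```

In this sketch two of the six requests are assigned to the unavailable server simply because it is next in line, and the slower server receives the same share as the faster one.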
Another approach to load balancing requires the use of dedicated hardware utilized solely for the purpose of load balancing. For example, dedicated computers executing only load-balancing applications are used to accept connections on behalf of all servers in a cluster, monitor the cluster and assign application requests to servers in the cluster on the basis of performance and availability. Another hardware example is the use of network switches to create a cluster of servers and to divide traffic amongst the available servers in the cluster. A dedicated hardware solution, however, is problematic because it presents a single point of failure for the system such that if the computer or switch fails, the cluster of servers also fails.
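As a rough sketch of how such a dedicated dispatcher might assign requests on the basis of performance and availability, consider the following. The Host record, its capacity rating and the selection rule (lowest load relative to capacity) are assumptions made for illustration and do not describe any particular product.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    capacity: int           # hypothetical relative performance rating
    active: int = 0         # requests currently assigned to this host
    available: bool = True  # result of the dispatcher's health monitoring


def pick_host(hosts):
    """Return the available host with the lowest load relative to capacity."""
    candidates = [h for h in hosts if h.available]
    if not candidates:
        raise RuntimeError("no available host in the cluster")
    return min(candidates, key=lambda h: h.active / h.capacity)


def dispatch(hosts, n_requests):
    """Assign n_requests one at a time and return the per-host counts."""
    counts = {h.name: 0 for h in hosts}
    for _ in range(n_requests):
        host = pick_host(hosts)
        host.active += 1
        counts[host.name] += 1
    return counts


cluster = [
    Host("fast", capacity=3),
    Host("slow", capacity=1),
    Host("down", capacity=3, available=False),
]
counts = dispatch(cluster, 8)
```

Every assignment in this sketch passes through the single dispatch function, which mirrors the single point of failure noted above: if the dispatcher stops, no host receives work.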
An alternative to dedicated hardware, and a solution to its overhead expense and single point of failure, is software-based load balancing. An example of a software-based solution is the MICROSOFT NETWORK LOAD BALANCING server, also referred to as the “NLB.” Microsoft's NLB is a symmetric, fully distributed algorithm that executes concurrently on each server in the cluster. The servers communicate with each other to monitor one another's availability and to determine mutually which server in the cluster handles the application request.
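The general idea of a fully distributed decision can be sketched as follows. This is not Microsoft's NLB algorithm; it is a simplified illustration that assumes every host already shares the same view of which hosts are alive, and the names owner, Host and accepts are hypothetical.

```python
import hashlib


def owner(client_ip, live_hosts):
    """Deterministically map a client address to exactly one live host."""
    ordered = sorted(live_hosts)
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ordered)
    return ordered[index]


class Host:
    def __init__(self, name):
        self.name = name

    def accepts(self, client_ip, live_hosts):
        """Evaluated locally on every host; only the owner returns True."""
        return owner(client_ip, live_hosts) == self.name


hosts = [Host("h1"), Host("h2"), Host("h3")]
live = {"h1", "h2", "h3"}

# Every host sees the same request and decides independently.
decisions = [h.accepts("198.51.100.7", live) for h in hosts]
```

Because each host runs the same deterministic function over the same inputs, exactly one host accepts a given request without a central dispatcher. When the shared membership view changes (for example, a host fails), the same function redistributes ownership among the remaining hosts.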
An example of an implementation of load balancing in the prior art is illustrated in FIG. 1 wherein load balancing is performed by network interface servers for two sets of requests (e.g., requests from clients on the Internet submitting requests to a set of published servers and the published servers returning responses). In this scenario, the motivation is to ensure that the client requests and server responses are handled by the same network interface server. As depicted in FIG. 1, networked computer system 100 includes one or more external client computers 110 connected via data links 115 and Internet 120 to a cluster of external network interface servers 130. The cluster of external network interface servers 130 is connected to a series of published servers 150 via data links 135 and 155 and a router 140. With continued reference to FIG. 1, when the external client 110, having IP Address A, makes a connection to one of the internal published servers 150, a data request message 117 is routed to server cluster 130, having IP Address B. Upon receipt, server cluster 130 executes a server selection algorithm based upon the source and destination IP addresses and then one of the servers in the cluster 130 accepts data request message 117. Following message path 1 in the example of FIG. 1, data request message 117 arrives at Server M as a result of executing the selection algorithm using IP Address A and IP Address B.
Server M then makes a connection to the appropriate published server 150 by translating the IP address of public server cluster 130 (i.e., IP Address B) to the private IP address of the published server. In this example, the cluster address identified as the destination in data request message 137 translates to IP Address C. In this instance, data request message 137 follows message path 2 from Server M to Published Server N. When constructing a response message, Published Server N swaps the source and destination IP addresses in the response message. In the above example, the source IP address changes from IP Address A to IP Address C and the destination IP address changes from IP Address C to IP Address A. Thereafter, data response message 157 is routed back to server cluster 130, the predefined default gateway for published servers 150. Because the destination address of the response message is unknown to the published server, all response messages from published servers 150 are forwarded to an internal IP address for server cluster 130 used by data links 135.
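The address handling along message paths 1 and 2 and in the response can be traced with a short sketch. The concrete address values, the Packet record and the helper names are assumptions for illustration only; the source identifies the addresses only as IP Address A, B and C.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    dst: str


# Illustrative values standing in for the addresses named in FIG. 1.
ADDR_A = "198.51.100.10"  # external client 110 (IP Address A)
ADDR_B = "203.0.113.1"    # public address of server cluster 130 (IP Address B)
ADDR_C = "10.0.0.20"      # private address of Published Server N (IP Address C)

# Hypothetical translation table held by the network interface server.
NAT_TABLE = {ADDR_B: ADDR_C}


def translate_inbound(packet):
    """Replace the public cluster address with the published server's address."""
    return packet._replace(dst=NAT_TABLE[packet.dst])


def build_response(request):
    """Swap source and destination, as the published server does."""
    return Packet(src=request.dst, dst=request.src)


request_117 = Packet(src=ADDR_A, dst=ADDR_B)    # client to cluster
request_137 = translate_inbound(request_117)    # Server M to Published Server N
response_157 = build_response(request_137)      # Published Server N to cluster
```

After translation the request carries the pair (A, C), so the response carries the pair (C, A). The cluster therefore sees a different address pair on the response than it saw on the original request, which carried (A, B).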
Upon arrival of data response message 157, server cluster 130 executes a server selection algorithm based on the source and destination addresses. In this scenario, the response message may be sent to a server different from the server that processed the client data request 117 and initiated the connection with the published server. Following message path 3 in the example of FIG. 1, data response message 157 arrives at Server 2 as a result of executing the selection algorithm.
Under the above known load-balancing scheme, the server cluster determines which server processes the message by repeatedly executing the selection algorithm using the source and destination IP addresses. Thus, the return path through the external network interface is not ensured to be the same as the original path from the external client into the external network interface. Because the paths are not necessarily the same, the techniques presently available provide an insufficient means to load balance networked systems. They do not solve the problem of routing response messages back to the same server, which is necessary for Microsoft Internet Security and Acceleration (ISA) Server and other applications that maintain state information for the client on Server M.
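The mismatch between the request path and the return path can be demonstrated with a minimal sketch. The selection function below (a hash of the source and destination address pair, reduced modulo the number of servers) is an assumption chosen for illustration; the source does not specify the selection algorithm. The address values are likewise illustrative.

```python
import hashlib
import ipaddress


def select_server(src_ip, dst_ip, n_servers):
    """Pick a server index from the (source, destination) address pair."""
    key = ipaddress.ip_address(src_ip).packed + ipaddress.ip_address(dst_ip).packed
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_servers


ADDR_B = "203.0.113.1"  # public cluster address (IP Address B), illustrative
ADDR_C = "10.0.0.20"    # published server address (IP Address C), illustrative
N_SERVERS = 4           # hypothetical cluster size


def count_mismatches(n_clients):
    """Count clients whose response lands on a different server than the request."""
    mismatches = 0
    for i in range(1, n_clients + 1):
        addr_a = f"198.51.100.{i}"  # a distinct external client (IP Address A)
        inbound = select_server(addr_a, ADDR_B, N_SERVERS)   # request pair (A, B)
        outbound = select_server(ADDR_C, addr_a, N_SERVERS)  # response pair (C, A)
        if inbound != outbound:
            mismatches += 1
    return mismatches


mismatched_clients = count_mismatches(200)
```

Because the request and the response present different address pairs to the selection function, nothing ties the two results together. Under this sketch a response reaches the server that holds the client's state only by coincidence.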