Appendix A, which is part of the present disclosure, is a listing in pseudocode of software code for embodiments of the present invention, which are described more completely below.
1. Field of the Invention
The present invention relates generally to computer networking and, in particular, to a system to perform load balancing on multiple network servers.
2. Discussion of Related Art
Due to increasing traffic over computer networks such as the Internet, as well as corporate intranets, WANs and LANs, data providers must satisfy an increasing number of data requests. For example, a company that provides a search engine for the Internet may handle over a million hits (i.e. accesses to its web page) every day. A single server cannot handle such a large volume of data requests within an acceptable response time. Therefore, most high-volume information providers use multiple servers to satisfy the large number of data requests.
The prior art does not take into account packets per given time interval or other measures of packet traffic to and from each replicated server. One reason is that computing these measures continually for each server connected to a load balancer takes a great number of CPU cycles, and a load balancer is typically based on a general-purpose microprocessor driven by a software program. There is a need for a solution that overcomes this technical hurdle and incorporates packet traffic loads on the network interfaces belonging to the servers, as packet loss at the server network interface is a frequent reason for throughput and performance degradation.
FIG. 1 illustrates a typical arrangement of computers on a network 110. Network 110 represents any networking scheme such as the Internet, a local Ethernet, or a token ring network. Various clients such as a client 120 or a client 130 are coupled to network 110. Computer equipment from a data provider 160 is also coupled to network 110. For example, data provider 160 may use a bridge 161 to connect a local area network (LAN) 163 to network 110. Servers 162, 164, 166, and 168 are coupled to LAN 163.
Most data transfers are initiated by a client sending a data request. For example, client 120 may request a web page from data provider 160. Typically, on the Internet, client 120 requests information by sending a data request to a name such as “www.companyname.com” representing data provider 160. Through the use of the domain name server system of the Internet, the name is converted into an IP address for data provider 160. Client 120 then sends a data request to the IP address. The data request contains the IP address of client 120 so that data provider 160 can send the requested data back to client 120. Data provider 160 converts the IP address into the IP address of server 162, server 164, server 166, or server 168. The data request is then routed to the selected server. The selected server then sends the requested data to client 120. For other networks, the specific mechanism for routing a data request by client 120 to data provider 160 can vary. However, in most cases data requests from client 120 contain the network address of client 120 so that data provider 160 can send the requested data to client 120.
Since each of the multiple servers contains the same information, each data request can be handled by any one of the servers. Furthermore, the use of multiple servers can be transparent to client 120. Thus the actual mechanism used to route the data request can be determined by data provider 160 without affecting client 120. To maximize the benefits of having multiple servers, data provider 160 should spread the data requests across the servers so that the loads on the servers are roughly equal. Thus, most data providers use a load balancer to route the data requests to the servers. As shown in FIG. 2, conceptually a load balancer 210 receives multiple data requests 220 and routes individual data requests to server 162, 164, 166, or 168. In FIG. 2, four servers are shown for illustrative purposes only. The actual number of servers can vary. Multiple data requests 220 represent the stream of data requests from various clients on network 110.
Some conventional load balancing methods include: random assignment, modulo-S assignment where S represents the number of servers, and sequential assignment. In random assignment, load balancer 210 selects a server at random for each data request. In modulo-S assignment, each server is assigned a number from 0 to S-1 (where S is the number of servers). Load balancer 210 selects the server whose number equals the address of the client making the data request, taken modulo S. In sequential assignment, each server is selected in turn for each new data request. Thus, if eight data requests are received, server 162 processes data requests one and five; server 164 processes data requests two and six; server 166 processes data requests three and seven; and server 168 processes data requests four and eight.
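The three conventional methods described above can be illustrated with a minimal Python sketch. The function names, the use of the reference numerals 162 to 168 as server identifiers, and the treatment of a client address as an IPv4 address converted to an integer are assumptions made for illustration only; they are not taken from the disclosure.

```python
import ipaddress
import random
from itertools import cycle

# Hypothetical server identifiers, borrowed from the reference numerals of FIG. 1.
SERVERS = [162, 164, 166, 168]


def random_assignment(servers, rng=random):
    """Pick any server uniformly at random for a data request."""
    return rng.choice(servers)


def modulo_assignment(servers, client_ip):
    """Pick the server numbered (client address mod S), where S = len(servers)."""
    address = int(ipaddress.IPv4Address(client_ip))
    return servers[address % len(servers)]


def sequential_assignment(servers):
    """Return an iterator that yields each server in turn, repeating forever."""
    return cycle(servers)


if __name__ == "__main__":
    turn = sequential_assignment(SERVERS)
    # Eight requests: servers are used in order, then the order repeats.
    print([next(turn) for _ in range(8)])
    # The same client address always maps to the same server.
    print(modulo_assignment(SERVERS, "192.0.2.7"))
```

Note that modulo assignment is deterministic per client address, which is why a skewed client population can concentrate requests on one server.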
On average over many data requests, each of the conventional load balancing methods should provide adequate load balancing. However, short-term imbalances can result from a variety of sources. For example, if some data requests require more data than other data requests, all three methods may overload one server while other servers are idle. If the majority of data requests come from clients which are mapped to the same server using modulo-S assignment, one server becomes overloaded while other servers are underloaded. Thus, conventional load balancing methods do not provide balanced loads in many circumstances. Furthermore, conventional load balancing methods do not adapt to conditions of the servers. Hence there is a need for a load balancer which uses a dynamic load balancing method which can consistently balance the load on a plurality of servers.
The present invention includes methods and systems for consistently balancing the load on a plurality of servers. Specifically, the load balancer uses hashing to separate data requests from clients into a plurality of buckets. The buckets are dynamically assigned to the server having the lightest load as necessary.
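The bucket scheme summarized above might be sketched as follows. This is only an illustration under stated assumptions: the class name `BucketBalancer`, the choice of CRC-32 as the hash, the bucket count of 16, and the use of a simple request count as the load measure are all hypothetical and are not specified in this section of the disclosure.

```python
import zlib


class BucketBalancer:
    """Hash clients into buckets; assign each bucket to a server on first use."""

    def __init__(self, servers, num_buckets=16):
        self.num_buckets = num_buckets
        # None means the bucket has not yet been assigned to any server.
        self.bucket_to_server = [None] * num_buckets
        # Assumed load measure: number of requests routed to each server.
        self.load = {server: 0 for server in servers}

    def bucket_for(self, client_address):
        """Map a client address string to a bucket index (assumed hash: CRC-32)."""
        return zlib.crc32(client_address.encode("ascii")) % self.num_buckets

    def route(self, client_address):
        """Return the server for this request, assigning the bucket if needed."""
        bucket = self.bucket_for(client_address)
        if self.bucket_to_server[bucket] is None:
            # Assign the bucket to the server with the lightest current load.
            self.bucket_to_server[bucket] = min(self.load, key=self.load.get)
        server = self.bucket_to_server[bucket]
        self.load[server] += 1
        return server


if __name__ == "__main__":
    balancer = BucketBalancer([162, 164, 166, 168])
    for address in ["192.0.2.7", "198.51.100.23", "192.0.2.7"]:
        print(address, "->", balancer.route(address))
```

Because a client always hashes to the same bucket, repeated requests from that client reach the same server for as long as the bucket assignment stands.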
The load balancer tracks the state of each server. A server is in the non-operational state if it is deemed to be unable to perform the service. The load balancer maintains a list of servers that are operational and assigns buckets only to servers that are operational.
The server fault tolerance mechanism in the load balancer detects when a server goes down and redistributes the load to the new set of operational servers. Additionally, when a previously non-operational server becomes operational, the traffic is redistributed over the new set of operational servers. This redistribution is done in such a manner as not to disrupt any existing client-server connections.
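One way to picture the state tracking and redistribution described above is the simplified sketch below. It is an assumption-laden illustration, not the disclosed mechanism: the names `FaultTolerantBalancer`, `mark_down`, and `mark_up` are hypothetical, load is approximated by a request count, and a recovered server only receives buckets that are unassigned at the time of a later request, so buckets already bound to other servers are left alone.

```python
class FaultTolerantBalancer:
    """Assign buckets only to operational servers; reassign on failure."""

    def __init__(self, servers, num_buckets=8):
        self.operational = set(servers)
        self.load = {server: 0 for server in servers}  # assumed load measure
        self.bucket_to_server = [None] * num_buckets

    def assign(self, bucket):
        """Return the server for a bucket, choosing the lightest one if unassigned."""
        server = self.bucket_to_server[bucket]
        if server is None or server not in self.operational:
            if not self.operational:
                raise RuntimeError("no operational servers")
            # Sort for a deterministic tie-break among equally loaded servers.
            server = min(sorted(self.operational), key=lambda s: self.load[s])
            self.bucket_to_server[bucket] = server
        self.load[server] += 1
        return server

    def mark_down(self, server):
        """Record a failure and release the buckets the server was handling."""
        self.operational.discard(server)
        for bucket, owner in enumerate(self.bucket_to_server):
            if owner == server:
                self.bucket_to_server[bucket] = None

    def mark_up(self, server):
        """Return a server to service with an empty load count."""
        self.operational.add(server)
        self.load[server] = 0


if __name__ == "__main__":
    balancer = FaultTolerantBalancer([162, 164])
    print([balancer.assign(b) for b in range(4)])
    balancer.mark_down(162)
    print(balancer.assign(0))  # bucket 0 moves to the remaining server
    balancer.mark_up(162)
    print(balancer.assign(2))  # a released bucket goes to the recovered server
```

Leaving assigned buckets untouched on recovery is the design choice that keeps existing client-to-server mappings stable in this sketch; a fuller implementation would need its own policy for moving load back.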