The ubiquity of computers in business, government, and private homes has resulted in the availability of massive amounts of information from network-connected sources, such as data stores accessible through communication networks like the Internet. In recent years, computer communication and search tools have become widely available to help users locate and access this information. Most computer communication and search tools implement a client-server architecture in which a user client computer communicates with a remote server computer over a communication network. To achieve better system performance and throughput in the client-server architecture, larger communication network bandwidths are needed as the number of client computers communicating with server computers increases.
One approach to increasing communication bandwidths relates to employing multiple networked server computers offering the same services. These server computers may be arranged in server farms, in which a single server from the server farm receives and processes a particular request from a client computer. Typically, server farms implement some type of load balancing algorithm to distribute requests from client computers among the multiple servers. Generally described, in a typical client-server computing environment, client devices issue requests to server devices for some kind of service and/or processing, and the server devices process those requests and return suitable results to the client devices. In an environment where multiple clients send requests to multiple servers, workload distribution among the servers significantly affects the quality of service that the client devices receive from the servers. In many modern client-server environments, client devices number in the hundreds of thousands or millions, while the servers number in the hundreds or thousands. In such environments, server load balancing becomes particularly important to system performance.
One approach to increasing the effectiveness of load balancing, and the resulting system performance and throughput, is to efficiently find the servers that have lower load levels than other servers and assign new client requests to these servers. Identifying overloaded and under-utilized servers and distributing workload accordingly may be done in a central or a distributed manner. Central control of load balancing requires a dedicated controller, such as a master server, to keep track of all servers and their respective loads at all times, incurring administrative costs associated with keeping lists of servers and connections up to date. Additionally, such a master server constitutes a single point of failure in the system, requiring multiple mirrored master servers for more reliable operation. Still further, the reliability and scalability of the server farm can depend on the ability and efficiency of the dedicated controller to handle an increased number of servers.
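The centrally controlled arrangement described above can be illustrated with a minimal Python sketch. The class name `MasterServer`, its methods, and the use of a simple integer as the load measure are all hypothetical choices made for illustration; they are not taken from any particular system.

```python
class MasterServer:
    """Hypothetical central controller that tracks the load of every server."""

    def __init__(self, servers):
        # The controller must keep an up-to-date list of all servers and loads.
        self.load = {name: 0 for name in servers}

    def report_load(self, server, load):
        # Each server (or a monitor) reports its current load to the controller.
        self.load[server] = load

    def assign(self):
        # Pick the server with the lowest recorded load for the new request.
        server = min(self.load, key=self.load.get)
        # Account for the request that was just assigned.
        self.load[server] += 1
        return server


master = MasterServer(["a", "b", "c"])
master.report_load("a", 5)
master.report_load("b", 2)
master.report_load("c", 7)
chosen = master.assign()
```

Every assignment passes through the single `MasterServer` object, which makes the single point of failure and the bookkeeping cost noted above visible in the sketch.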
Other approaches to finding and distributing workloads in a multi-server environment are distributed, software-based approaches in which the client computers implement some type of load balancing software component. In one such approach, the client computer randomly selects a server. For example, a pseudo-random number generator may be utilized to select one of N servers. However, random selection of servers does not take the actual server loads into consideration and, thus, cannot avoid occasionally overloading a particular server. Random server selection algorithms improve only the average performance of request handling; that is, such algorithms improve request handling for about 50% of the requests, but not for the majority of the requests. In another approach, the client computing device can implement a weighted probability selection algorithm in which the selection of a server is based, at least in part, on the reported load/resources of each server. This approach must contend with the problem of information distribution among client devices. That is, server load information must be updated periodically at each client device for the client to make an optimal server selection based on server loads. The server load may be indicated by the length of a request queue at each server, a request processing latency, or other similar indicators. In yet another approach, a round-robin algorithm for server assignment may be used, in which each request is sent to the next server according to a number indicated by a counter maintained at the client device. Although simple to implement, this approach does not distribute the load optimally among servers because the round-robin cycles in different clients could coincide, causing multiple clients to call the same server at the same time. In yet another approach, servers may be assigned to individual clients on a priority basis. For example, a client may be assigned a number of servers according to a prioritized list of servers, where the client sends a request to the server with the highest priority first, then re-sends the request to the server with the next highest priority, if needed, and so on.
As described above, each of these approaches to server load distribution suffers from a particular problem that makes server selection and load distribution sub-optimal, causing low levels of performance.
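The four client-side selection approaches described above can be sketched in Python as follows. This is only an illustrative sketch: all function and class names are hypothetical, the inverse-load weighting formula is an assumed example of a weighted probability scheme rather than one specified in the text, and `try_send` is a hypothetical callback standing in for the actual request transport.

```python
import random


def select_random(servers, rng=random):
    # Random selection: ignores actual server loads entirely.
    return rng.choice(servers)


def select_weighted(servers, reported_load, rng=random):
    # Weighted probability selection. Assumption: the weight is inversely
    # related to the last load reported to this client, which may be stale.
    weights = [1.0 / (1 + reported_load[s]) for s in servers]
    return rng.choices(servers, weights=weights, k=1)[0]


class RoundRobin:
    """Round-robin selection driven by a counter kept at the client."""

    def __init__(self, servers, start=0):
        self.servers = list(servers)
        self.counter = start

    def next(self):
        server = self.servers[self.counter % len(self.servers)]
        self.counter += 1
        return server


def send_with_priority(priority_list, try_send):
    # Priority-based assignment: try the highest-priority server first and
    # fall back to the next one only if the request did not succeed.
    for server in priority_list:
        if try_send(server):
            return server
    return None


servers = ["s0", "s1", "s2"]
client_a = RoundRobin(servers)
client_b = RoundRobin(servers)
# Two clients whose cycles coincide call the same server at the same time.
first_a, first_b = client_a.next(), client_b.next()
```

In this sketch, `client_a` and `client_b` start from the same counter value, so their first requests both go to the same server, which illustrates the coinciding-cycle problem noted for the round-robin approach.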