To simplify the following discussion, the present invention will be discussed in terms of Web sites on the Internet. Web content hosting is an increasingly common practice on the Internet. In Web content hosting, a provider who has sufficient resources to host multiple Web sites offers to store and provide Web access to documents for clients, such as institutions, companies, and individuals, who lack the resources or the expertise to efficiently maintain a Web server with the necessary security, availability, and peak bandwidth to the Internet.
Typically, the provider utilizes a cluster of servers to increase the capacity and computing power available for servicing the hosted sites. The goal of the cluster design is to distribute the computing load across the servers in the cluster. The simplest form of distribution is to assign each new file request to a server using a “round robin” algorithm. This type of system ensures that, on average, each server receives an equal share of the requests. Systems in which the next request is sent to the server having the lightest computational load have also been proposed. Ideally, a cluster of N web servers should be N times more powerful than one web server. Unfortunately, prior art clusters do not provide such ideal performance.
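The two prior art distribution policies described above can be sketched as follows. This is a minimal illustration only; the function names, the server labels, and the representation of load as a single number per server are assumptions made for the example and are not taken from any particular system.

```python
from itertools import cycle


def round_robin_dispatcher(servers):
    """Return a dispatch function that hands each request to the next server in turn."""
    ring = cycle(servers)  # endless iterator over the server list

    def dispatch(request):
        # The request itself is ignored: round robin looks only at arrival order.
        return next(ring)

    return dispatch


def least_loaded(load_by_server):
    """Return the server whose reported load is currently the smallest."""
    return min(load_by_server, key=load_by_server.get)


dispatch = round_robin_dispatcher(["s0", "s1", "s2"])
assignments = [dispatch(path) for path in ["/a", "/b", "/c", "/d"]]
```

Neither policy considers which file is being requested, which is the property that matters for the RAM usage discussed next.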
Web server performance greatly depends on efficient RAM usage. A web server typically provides files that are stored on disks. To increase the efficiency of the server, a disk cache is created in RAM. If the requested file is already in the disk cache, it can be provided quickly. If a cache miss occurs, the response time is degraded by more than a factor of 10.
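A disk cache of the kind described above might be sketched as follows. The least-recently-used eviction policy, the class name, and the `read_from_disk` callback are assumptions made for illustration; the text does not specify how the cache chooses which files to discard.

```python
from collections import OrderedDict


class DiskCache:
    """RAM cache of file contents with least-recently-used eviction (assumed policy)."""

    def __init__(self, capacity_bytes, read_from_disk):
        self.capacity = capacity_bytes
        self.read_from_disk = read_from_disk  # hypothetical callback: path -> bytes
        self.files = OrderedDict()            # path -> contents, oldest first
        self.used = 0
        self.hits = 0
        self.misses = 0

    def get(self, path):
        if path in self.files:
            # Cache hit: serve from RAM and mark the file as recently used.
            self.files.move_to_end(path)
            self.hits += 1
            return self.files[path]
        # Cache miss: the slow path goes to disk.
        self.misses += 1
        data = self.read_from_disk(path)
        if len(data) <= self.capacity:
            # Evict the least recently used files until the new one fits.
            while self.used + len(data) > self.capacity:
                _, evicted = self.files.popitem(last=False)
                self.used -= len(evicted)
            self.files[path] = data
            self.used += len(data)
        return data


cache = DiskCache(10, lambda path: b"x" * 4)  # every file is 4 bytes in this toy example
for path in ["/a", "/a", "/b", "/c"]:
    cache.get(path)
```

In this toy run the second request for `/a` is a hit, and the request for `/c` forces `/a` out of the 10-byte cache.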
Unfortunately, load balancing for a cluster of web servers interferes with efficient RAM usage for the cluster. If new file requests are distributed equally across the servers, the popular files tend to occupy RAM space in all the server nodes. This redundant replication of “hot” content throughout the RAM of all the nodes leaves much less available RAM space for the rest of the content. As a result, cache misses become common for the remaining content and overall system performance is degraded.
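The cost of this replication can be estimated with simple arithmetic. The sketch below assumes, purely for illustration, that all files are the same size so that each server's RAM holds a fixed number of file "slots"; the figures of 10 servers, 100 slots, and 60 hot files are hypothetical.

```python
def distinct_cached_files(num_servers, slots_per_server, hot_files, replicated):
    """Number of distinct files the cluster's combined RAM can hold.

    Simplifying assumption: equal-sized files, one file per cache slot.
    """
    if hot_files > slots_per_server:
        raise ValueError("hot set does not fit in one server's cache")
    if replicated:
        # Every node holds the full hot set; only the leftover slots hold unique content.
        return hot_files + num_servers * (slots_per_server - hot_files)
    # No duplication: every slot in the cluster holds a different file.
    return num_servers * slots_per_server


with_replication = distinct_cached_files(10, 100, 60, replicated=True)
without_replication = distinct_cached_files(10, 100, 60, replicated=False)
```

Under these assumed figures the cluster caches 460 distinct files when the hot set is replicated on every node, against 1000 when no file is cached twice.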
To prevent unnecessary duplication of the hot content, some load balancing systems keep track of the server that last serviced a request for a particular file and route the request to that server provided its current workload is not too high. If the server is already overloaded, the load balancing system routes the request to another server that has a lower workload. Unfortunately, these systems depend on expensive special purpose hardware switches through which all of the traffic must pass. In addition to increasing the costs of the server clusters, these switches introduce a single point of failure that can bring down the entire server cluster.
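The routing rule used by such systems can be sketched as follows. The dictionary-based bookkeeping, the `max_load` threshold, and the choice of the least loaded server as the fallback are assumptions made for this example; the prior art systems referred to above implement the rule in switch hardware and may differ in detail.

```python
def route(file_path, last_server, load, max_load):
    """Route a request to the server that last served the file, unless it is overloaded.

    last_server: dict mapping file path -> server that last served it
    load:        dict mapping server -> current number of active requests
    max_load:    hypothetical threshold above which a server counts as overloaded
    """
    preferred = last_server.get(file_path)
    if preferred is not None and load[preferred] < max_load:
        target = preferred                 # file is probably still in that server's RAM
    else:
        target = min(load, key=load.get)   # assumed fallback: least loaded server
    last_server[file_path] = target        # remember who served the file most recently
    load[target] += 1
    return target


load = {"s0": 0, "s1": 0}
last_server = {}
targets = [route("/a", last_server, load, max_load=2) for _ in range(3)]
```

Here the first two requests for `/a` stay on `s0`; the third finds `s0` at the threshold and is diverted to `s1`, which then becomes the preferred server for that file.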
Another approach to load balancing in a cluster is to statically partition the customers and assign them to the servers. For example, in a configuration of 10 web servers, 100 customers could be partitioned as 10 customers per server. However, a static partition cannot take into account changing traffic patterns or changes in the content of the sites. As a result, such systems cannot adjust the partition to accommodate changing traffic and site dynamics.
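The static partition in the example above amounts to a fixed lookup table computed once. The following sketch assumes contiguous, equal-sized groups; the customer and server names are hypothetical.

```python
def static_partition(customers, servers):
    """Assign customers to servers in fixed, contiguous, equal-sized groups."""
    per_server = -(-len(customers) // len(servers))  # ceiling division
    return {c: servers[i // per_server] for i, c in enumerate(customers)}


customers = [f"customer{i}" for i in range(100)]
servers = [f"server{j}" for j in range(10)]
table = static_partition(customers, servers)
```

Because the table is never recomputed, a customer whose traffic grows tenfold remains on the same server as before, which is the limitation noted above.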
Broadly, it is the object of the present invention to provide an improved load balancing system.
It is a further object of the present invention to provide a load balancing system that does not rely on hardware switches.
It is a still further object of the present invention to provide a load balancing system that can accommodate changing traffic patterns.
These and other objects of the present invention will become apparent to those skilled in the art from the following detailed description of the invention and the accompanying drawings.