1. Field of the Invention
The present invention relates generally to computer cluster load balancing systems and methods and, particularly, to a novel system and method for load balancing a farm of servers hosting clusters of web sites that enables maximal web site customer throughput.
2. Discussion of the Prior Art
In modern world-wide web-based ("web") computer systems, there currently exists the concept of a farm of multiple web servers with facilities for hosting multiple web sites. Sharing common resources such as web servers is very effective, because it is statistically unlikely that the busy periods of one web site will coincide with those of another. However, the challenge remains to balance the load on the existing servers effectively, so that the maximum customer throughput of the system may be achieved.
Overutilization of servers may cause either web site service interruptions to current customers or rejection of new customer demands, neither of which is desirable. On the other hand, underutilization is wasteful. Consequently, there is presented a real-time scheduling problem which is non-trivial and must be solved satisfactorily if one is to achieve the supposed advantages of a web farm. The server scheduling problem is made more complicated by the fact that some web sites are significantly more popular than others at any given time. Furthermore, this skewed distribution is not stationary: it varies significantly on a weekly, daily, hourly or even a more frequent basis, due to changing web site popularity and customer mix.
It is very likely that the hottest web sites will be so popular that multiple servers are needed to host them satisfactorily. Thus, it would be highly advantageous to exploit this need for multiple servers to host the most popular web sites, in particular by taking advantage of the resulting multiple web site copies, in order to solve the server load balancing problem very effectively.
The present invention pertains to an improved method for web site server load balancing that exploits the multiple servers available for hosting the most popular web sites, in particular by taking advantage of the resulting multiple web site copies to solve the server load balancing problem very effectively.
According to the principles of the present invention, the load balancing method consists of two components: 1) a static component that creates the logical assignment of web sites to servers; and 2) a dynamic component that performs real-time web site customer scheduling. The static component consists of two stages. First, based on web site demand forecasts, an optimization technique is employed for solving the "apportionment problem" to determine the optimal number of copies per web site. This technique is borrowed from the theory of resource allocation problems, such as described in T. Ibaraki and N. Katoh, "Resource Allocation Problems: Algorithmic Approaches," The MIT Press, 1988. Second, a method is implemented which makes good quality logical assignments of this optimal number of web site copies to servers. The set of all servers to which a particular web site is assigned is called its 'cluster'. Note that these clusters may overlap, i.e., the set of web servers is not partitioned. This logical assignment method may be run in either initial or incremental mode. The initial mode is appropriate when configuring a new web cluster farm. Those web sites with multiple copies are assigned first, using a graph-theoretic scheme based on a construct called "clique trees". Then single-copy web sites are assigned, using a Least Loaded First ("LLF") scheme. The incremental mode allows for constraints which limit the number of copy and logical assignment changes, and is thus practical and appropriate for maintaining high quality web site-to-server assignments. A k-interchange heuristic such as described in G. Nemhauser and L. Wolsey, "Integer and Combinatorial Optimization", John Wiley and Sons, 1988, the contents and disclosure of which are incorporated by reference as if fully set forth herein, may be employed.
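To make the first stage of the static component more concrete, the sketch below applies the textbook greedy procedure for separable convex resource allocation to the apportionment problem, and then places single-copy web sites with a simple Least Loaded First rule. It is only an illustration under stated assumptions: the cost of a web site with c copies is assumed to be its forecast demand divided by c, the total copy budget is assumed to be given, no site may receive more copies than there are servers, and single-copy sites are assumed to be placed in decreasing order of load. The function names `apportion_copies` and `assign_single_copy_sites` are hypothetical, and the clique-tree assignment of multi-copy sites is not shown.

```python
import heapq


def apportion_copies(demands, total_copies, num_servers):
    """Hypothetical greedy apportionment of copies to web sites.

    Assumed objective: minimize sum(d_i / c_i), which is convex in c_i.
    Every site starts with one copy; each extra copy goes to the site whose
    cost drops the most, i.e. the largest d / (c * (c + 1)).
    """
    n = len(demands)
    if total_copies < n:
        raise ValueError("need at least one copy per site")
    copies = [1] * n
    # Max-heap (negated keys) of marginal gains for going from 1 to 2 copies.
    heap = [(-d / 2.0, i) for i, d in enumerate(demands) if num_servers > 1]
    heapq.heapify(heap)
    remaining = total_copies - n
    while remaining > 0 and heap:
        _, i = heapq.heappop(heap)
        copies[i] += 1
        remaining -= 1
        c = copies[i]
        if c < num_servers:  # a site cannot have more copies than servers
            heapq.heappush(heap, (-demands[i] / (c * (c + 1)), i))
    return copies


def assign_single_copy_sites(site_loads, server_loads):
    """Hypothetical Least Loaded First placement of single-copy sites.

    Sites are taken heaviest first (an assumption) and each is placed on
    the server with the smallest current load. Returns site -> server index.
    """
    heap = [(load, j) for j, load in enumerate(server_loads)]
    heapq.heapify(heap)
    placement = {}
    for site, load in sorted(site_loads.items(), key=lambda kv: -kv[1]):
        current, j = heapq.heappop(heap)
        placement[site] = j
        heapq.heappush(heap, (current + load, j))
    return placement
```

For example, `apportion_copies([60, 30, 10], 6, 4)` returns `[3, 2, 1]`. Because the assumed per-site cost is convex in the number of copies, taking the largest marginal improvement at each step is optimal for that objective; a different cost model would change the marginal-gain formula but not the structure of the loop.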
The incremental mode is preferably run periodically, e.g., once per week or once per month, etc. However, one could also run this mode when the cluster farm configuration changes, for example when new servers are added to the system. In any case, the exact frequency will depend on the volatility of the web site demand forecasts and the growth in customer activity.
The dynamic component handles the real-time scheduling of web site customers to servers, based on the output of the static component and on fluctuating web site customer demands. These fluctuations occur because customers arrive and depart in a fashion that is not entirely predictable. Thus, according to the invention, a probabilistic approach is used to assign servers to newly arriving customers. A customer who is initially assigned to a particular server will be allowed to complete his activity at that server. However, existing activity for a particular web site on a server may be allowed to quiesce naturally, with future activity for that web site being assigned to a different server. The actual output of the dynamic component is a set of probabilistic routing choices, one for each web site. Thus, associated with each web site is a set of optimal routing probabilities, one for each server in its cluster, summing to one. A routing probability of 0 indicates that the relevant server is not currently hosting the web site, i.e., customer activity for that web site is being handled by other servers in the cluster.
Once these routing probabilities have been established, the actual assignments of new customers to servers may be handled in a greedy fashion. If the routing probabilities happen to be equal, for example, this amounts to round-robin scheduling. For a given web site, the active hosts consist of those servers whose routing probabilities are positive. The other servers are not active for this web site; thus, the spread of active hosts is not increased more than necessary. In particular, it will never happen that two servers are both simultaneously active for two distinct web sites.
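One way to realize the greedy assignment just described is a deficit rule: each new customer of a web site goes to the active server whose assigned count lags its target share (routing probability times customers so far) the most. This is an assumed interpretation of "greedy", sketched with a hypothetical `GreedyRouter` class; servers with probability 0 are simply excluded, and ties go to the first server listed.

```python
class GreedyRouter:
    """Hypothetical deterministic greedy router for a single web site.

    probabilities maps server id -> routing probability (summing to one).
    Each new customer goes to the active server with the largest deficit
    p_s * (total + 1) - assigned_s; ties go to the first server listed.
    """

    def __init__(self, probabilities):
        if abs(sum(probabilities.values()) - 1.0) > 1e-9:
            raise ValueError("routing probabilities must sum to one")
        # Servers with probability 0 are not active hosts for this site.
        self.probs = {s: p for s, p in probabilities.items() if p > 0}
        self.assigned = {s: 0 for s in self.probs}
        self.total = 0

    def route(self):
        target = self.total + 1
        server = max(self.probs,
                     key=lambda s: self.probs[s] * target - self.assigned[s])
        self.assigned[server] += 1
        self.total += 1
        return server
```

With equal probabilities, for example `GreedyRouter({1: 0.5, 2: 0.5, 3: 0.0})`, successive calls to `route()` return servers 1, 2, 1, 2, which is the round-robin behavior mentioned above, and server 3 is never used. With probabilities 0.75 and 0.25, four customers are split 3 to 1.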
As with the static component, there are two stages to the dynamic component: a first stage implementing an optimization technique for solving the "discrete class constrained resource allocation problem", which yields the load balancing goals; and a second stage that attempts to achieve these goals by making the scheduling decision on which server should handle a new customer web site request on the pre-existing probabilistic basis.
For the first stage, optimization techniques for solving the discrete class constrained resource allocation problem may be implemented in accordance with techniques described in A. Federgruen and H. Groenevelt, "The Greedy Procedure for Resource Allocation Problems: Necessary and Sufficient Conditions for Optimality", Operations Research, vol. 34, pp. 909-918, 1986, and A. Tantawi, D. Towsley and J. Wolf, "Optimal Allocation of Multiple Class Resources in Computer Systems", ACM Sigmetrics Conference, Santa Fe, NM, 1988, the contents and disclosure of each of which are incorporated by reference as if fully set forth herein. In particular, these references describe techniques that may be used for determining optimal load balancing goals at any given moment, given the logical assignments of web sites to servers. Specifically, the output of this stage is the optimal number of web site customers on each server. This problem is solved again whenever the overall load on the servers changes significantly. Fortunately, the method is fast and incremental in nature.
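For readers who want the first-stage problem stated precisely, one plausible formulation is given below; it is a sketch, not the exact model of the cited references. The notation is assumed for illustration: $S$ is the set of servers, $W$ the set of web sites, $C_i \subseteq S$ the cluster of web site $i$, $N_i$ the current number of customers of web site $i$, $y_{ij}$ the number of those customers placed on server $j$, and $F_j$ an assumed convex cost (for example, an expected response time) of server $j$ as a function of its customer count $x_j$.

```latex
\begin{aligned}
\min_{y}\quad & \sum_{j \in S} F_j(x_j) \\
\text{s.t.}\quad & x_j = \sum_{i \in W:\, j \in C_i} y_{ij}, && j \in S, \\
& \sum_{j \in C_i} y_{ij} = N_i, && i \in W, \\
& y_{ij} \in \mathbb{Z}_{\ge 0}, && i \in W,\ j \in C_i.
\end{aligned}
```

Under this reading, the optimal values $x_j^{*}$ are the per-server customer counts that serve as the load balancing goals; the class constraints are that customers of web site $i$ may only be placed on servers in its cluster $C_i$.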
According to the second stage, the scheduling decision on which server should handle a new customer web site request is made on the pre-existing probabilistic basis. Specifically, those servers to which the web site is logically assigned and which already have current activity for that web site are examined, and a server is chosen from among them greedily, according to their routing probabilities. If the routing probabilities are all equal, the resulting round-robin policy amounts to assigning customers in cyclic fashion. This approach naturally tends to balance the load fairly well. However, the load balance achieved by the greedy probabilistic approach alone may periodically degrade too far from the optimal goal. When the quality of the server load balancing differs from the goal by more than a predefined threshold, or perhaps when the actual performance at the various servers deviates too far from an optimally balanced state, a real-time rebalancing method is initiated. This method is also graph-theoretic, and has the effect of shifting load from relatively overloaded servers to relatively underloaded servers.
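The graph-theoretic rebalancing step can be pictured as a search in a directed graph whose nodes are servers, with an arc from server j to server k labeled by web site i whenever server j has existing traffic for web site i and server k is another member of that site's cluster. The sketch below assumes a breadth-first search for the shortest chain of such arcs from an overloaded server to an underloaded one; the helper name `find_shift_path`, the dictionary layouts, and the choice of breadth-first search are all illustrative assumptions rather than the method's specified procedure.

```python
from collections import deque


def find_shift_path(traffic, clusters, source, is_underloaded):
    """Hypothetical breadth-first search for a chain of load shifts.

    traffic[(site, server)] -> existing customers of a site on a server.
    clusters[site] -> set of servers the site is logically assigned to.
    Arc j -> k labeled `site` exists when server j has existing traffic
    for `site` and k is another server in that site's cluster.
    Returns a list of (from_server, site, to_server) arcs, or None.
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        j = queue.popleft()
        if j != source and is_underloaded(j):
            # Walk parent links back to the overloaded source server.
            path = []
            while parent[j] is not None:
                prev, site = parent[j]
                path.append((prev, site, j))
                j = prev
            return path[::-1]
        for site, servers in clusters.items():
            if j not in servers or traffic.get((site, j), 0) <= 0:
                continue
            for k in sorted(servers):
                if k not in parent:
                    parent[k] = (j, site)
                    queue.append(k)
    return None
```

Applied to the four-server scenario that follows (web site A on servers 1 and 2, B on 2 and 3, C on 3 and 4, D on 1 alone), with server 1 overloaded and server 4 underloaded, this search returns the arcs (1, A, 2), (2, B, 3), (3, C, 4).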
A typical but simplified example of the dynamic component output will help to illustrate our technique. Consider a scenario in which the static component of the method has assigned web site A to servers 1 and 2, web site B to servers 2 and 3, web site C to servers 3 and 4, and a less popular web site D to server 1 alone. Suppose a significant amount of new customer traffic for web site D arrives. The greedy probabilistic method for web site D is necessarily trivial: it must schedule all new customer traffic for web site D on server 1, thus increasing the load on that server. Suppose that this action overloads server 1 past the predefined threshold, relative to optimal. If server 4 is relatively underloaded, the method would probabilistically shift a fixed amount of new traffic for web site A from server 1 to server 2, the same amount of new traffic for web site B from server 2 to server 3, and the same amount of new traffic for web site C from server 3 to server 4. The actual amount of shifted new customer traffic is the minimum of five quantities. The first is the amount of existing traffic for web site A on server 1. The second is the amount of existing traffic for web site B on server 2. The third is the amount of existing traffic for web site C on server 3. The fourth is the amount of traffic that must be removed to reduce the load on server 1 to optimal. The fifth is the maximum amount of new traffic that can be added to server 4 before its load reaches optimal. The following directed graph
1 --A--> 2 --B--> 3 --C--> 4
represents this shift of customer load neatly, with the nodes corresponding to servers and the directed arcs corresponding to web sites. The effect of this three-step modification, in theory, is to lower the load on server 1 by some fixed amount, hopefully enough to yield acceptable traffic on that server, and to raise the load on server 4 by a similar amount. There should be essentially no net effect on servers 2 and 3. It should be understood that the actual traffic in the revised scenario is statistical in nature and thus may vary somewhat from what is predicted.
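The "minimum of five quantities" rule in this example generalizes to a path of any length: one bound per arc (existing traffic of that arc's web site on its origin server), plus the excess of the first server over its goal and the remaining room of the last server below its goal. The sketch below assumes that paths are lists of `(from_server, site, to_server)` arcs and that loads and goals are plain dictionaries; `shift_amount` and `apply_shift` are hypothetical names, and the resulting loads are expected values only, in keeping with the statistical caveat above.

```python
def shift_amount(path, traffic, load, goal):
    """Amount of new traffic to shift along a path of (from, site, to) arcs.

    It is the minimum of: the existing traffic of each arc's site on the
    arc's origin server, the excess of the first server over its goal,
    and the room left on the last server below its goal.
    """
    first, last = path[0][0], path[-1][2]
    candidates = [traffic.get((site, j), 0) for j, site, _ in path]
    candidates.append(load[first] - goal[first])  # reduce first server to goal
    candidates.append(goal[last] - load[last])    # raise last server to goal
    return max(0, min(candidates))


def apply_shift(path, load, amount):
    """Expected loads after shifting `amount` along every arc of the path.

    Intermediate servers lose and gain the same amount, so only the
    endpoints change.
    """
    new_load = dict(load)
    for j, _, k in path:
        new_load[j] -= amount
        new_load[k] += amount
    return new_load
```

With hypothetical figures of 10, 8 and 6 existing customers for A on server 1, B on server 2 and C on server 3, a load of 50 against a goal of 40 on server 1, and a load of 20 against a goal of 32 on server 4, the five candidates are 10, 8, 6, 10 and 12, so 6 customers' worth of new traffic is shifted; server 1 drops from 50 to 44, server 4 rises from 20 to 26, and servers 2 and 3 are unchanged.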