Data centers today consume large amounts of energy, and their consumption continues to rise. In a data center, a load balancer may be placed in front of a group of servers (sometimes called a server cluster) or other data processing units. The load balancer may be responsible for distributing incoming requests to multiple servers. The requests may seek data, in a database context for example, or may seek computing resources in a distributed computing context. Traditional methods, such as random or round-robin load dispatching, distribute the incoming requests to active servers in a dispersed fashion. As a result, requests arrive frequently at each individual processing unit from the load balancer, even though statistically the unit may be idle for a significant amount of time. Energy may therefore still be consumed during times when a processing unit appears to be otherwise idle. Moreover, a low power state is not entered or exited instantaneously; there is latency in each transition. During this latency period, the processing unit does no processing, yet it still consumes more power than it would in a strictly low power (or “sleep”) state. Consequently, a processing unit may seldom have an opportunity to enter a deep low power state, and data center power consumption may be excessive.
In order to achieve scalability and fast response times, a large number of processing units (sometimes called a cluster) may be needed to process the requests, e.g., HTTP requests. A load balancer may be located at the front end of the server cluster to perform load distribution among the active servers. Load balancing may be used for many types of processing units, e.g., web servers, application servers, and database servers.
In a modern data center, not all processing units in a cluster may be active all of the time. More processing units may be brought up when incoming traffic becomes heavy, and some processing units may be shut down to save power when the traffic is light. For the active processing units, the target utilization is usually much less than 100% in order to provide good response times and to reserve capacity for sudden bursts of requests. With a target utilization of 60%, for example, a processing unit will be idle for 40% of the time. However, how that idleness is distributed over time has a great impact on processing unit energy efficiency.
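The effect of how idleness is distributed can be illustrated with a small calculation. The following is a minimal sketch only: the 10-second window, the individual gap lengths, and the 50 ms combined enter/exit latency are hypothetical figures chosen for illustration, and the function name is likewise an assumption. Both schedules are idle for 40% of the window, but only the consolidated one leaves time that could be spent in a low power state.

```python
def usable_sleep_time(idle_gaps_ms, transition_ms):
    """Total time (ms) that could be spent in a low power state.

    A gap contributes only if it is longer than the combined enter/exit
    latency (transition_ms, a hypothetical figure). The latency itself is
    subtracted because no sleeping occurs during the transition.
    """
    return sum(gap - transition_ms for gap in idle_gaps_ms if gap > transition_ms)


# Two hypothetical schedules over a 10,000 ms window, each 60% busy, 40% idle.
fragmented = [4] * 1000    # 1000 idle gaps of 4 ms each (4,000 ms idle in total)
consolidated = [4000]      # a single idle gap of 4,000 ms

TRANSITION_MS = 50         # assumed combined latency to enter and exit sleep

fragmented_sleep = usable_sleep_time(fragmented, TRANSITION_MS)
consolidated_sleep = usable_sleep_time(consolidated, TRANSITION_MS)

print(fragmented_sleep, consolidated_sleep)
```

Under these assumed numbers, none of the fragmented idle time is long enough to be used for sleeping, while nearly all of the consolidated idle time is.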
There are various schemes used to distribute the incoming requests among active processing units. Common ones include:
1) random dispatching, where requests are assigned to active processing units selected randomly by the load balancer. On average, the load is evenly distributed among processing units. This scheme can be implemented using various hash functions;
2) round-robin dispatching, where the load balancer assigns the requests to active processing units on a rotating basis. This keeps the processing units equally assigned; and
3) weighted round-robin dispatching, where a weight is assigned to each processing unit in the group so higher capacity processing units service more requests. For example, the load balancer may assign two requests to a faster processing unit for each request assigned to a slower one.
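The three schemes listed above can be sketched as follows. This is an illustrative sketch rather than any particular load balancer's implementation; the function names, the unit labels, and the use of a seeded pseudo-random generator in place of a hash function are all assumptions made for the example.

```python
import itertools
import random
from collections import Counter


def random_dispatcher(units, seed=None):
    """Yield a randomly selected active unit for each request."""
    rng = random.Random(seed)  # seeded generator stands in for a hash function
    while True:
        yield rng.choice(units)


def round_robin_dispatcher(units):
    """Yield active units on a rotating basis."""
    return itertools.cycle(units)


def weighted_round_robin_dispatcher(weights):
    """Yield units in rotation, repeating each unit according to its weight.

    weights maps a unit label to a positive integer weight (hypothetical
    values); a unit with weight 2 receives two requests per rotation.
    """
    expanded = [unit for unit, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


def assign(dispatcher, n_requests):
    """Count how many of n_requests each unit receives from a dispatcher."""
    return Counter(itertools.islice(dispatcher, n_requests))


units = ["unit_a", "unit_b", "unit_c"]  # hypothetical active units
rr_counts = assign(round_robin_dispatcher(units), 9)
wrr_counts = assign(weighted_round_robin_dispatcher({"fast": 2, "slow": 1}), 9)
rand_counts = assign(random_dispatcher(units, seed=0), 9)
```

With the weights shown, the faster unit receives two requests for each request assigned to the slower one, matching the example in item 3.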
The common drawback of these and related schemes, in terms of energy efficiency, is that the load balancer distributes the incoming requests to active processing units in a dispersed fashion. In doing so, each individual processing unit, even though statistically idle for a significant amount of time, must handle requests that arrive frequently from the load balancer. As a result, the frequent wake-ups leave the processing unit with few, if any, opportunities to go into a deep low power state.
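The dispersal described above can be made concrete with a small simulation of round-robin dispatch. This is a sketch under simplifying assumptions that do not come from the text: requests arrive at perfectly regular intervals, every request takes the same service time, and the specific figures (5 units, one arrival every 2 ms, 6 ms of service) are hypothetical. Each unit ends up 60% utilized, yet its idle time is broken into many short gaps.

```python
def round_robin_idle_gaps(n_units, interarrival_ms, service_ms, n_requests):
    """Return, for each unit, the list of idle gaps (ms) between its requests.

    Assumes evenly spaced arrivals and a fixed service time (both are
    simplifications for illustration).
    """
    free_at = [None] * n_units            # time at which each unit becomes idle
    gaps = [[] for _ in range(n_units)]
    for i in range(n_requests):
        unit = i % n_units                # rotate through the units
        arrival = i * interarrival_ms
        if free_at[unit] is None:
            start = arrival               # first request for this unit
        else:
            start = max(arrival, free_at[unit])
            if arrival > free_at[unit]:
                # The unit sat idle from free_at[unit] until this arrival.
                gaps[unit].append(arrival - free_at[unit])
        free_at[unit] = start + service_ms
    return gaps


# Hypothetical workload: 5 units, one request every 2 ms, 6 ms per request.
gaps = round_robin_idle_gaps(n_units=5, interarrival_ms=2, service_ms=6, n_requests=50)
longest_gap = max(max(unit_gaps) for unit_gaps in gaps)
```

Here every unit is woken every 10 ms and never sees an idle gap longer than 4 ms, even though it is idle 40% of the time overall.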
In the drawings, the leftmost digit(s) of a reference number identifies the drawing in which the reference number first appears.