1. Field of the Invention
This invention pertains generally to reducing power consumption in high-performance computing environments, and more particularly to a power-aware algorithm that automatically and transparently adapts microprocessor voltage and frequency settings to achieve significant power reduction and energy savings with minimal impact on performance.
2. Description of Related Art
Computing prowess continues to improve at the expense of higher power and energy consumption. Moore's Law of 1965 made the bold prediction that the number of transistors on a microprocessor would double every 18 months. However, with each doubling in the number of transistors comes a corresponding increase in power consumption. High power and energy consumption burdens the electrical supply load, increases operating costs, and has a negative economic and environmental impact on society. More importantly, when the temperature in a computing system is high, system reliability and productivity deteriorate exponentially.
The reliability and productivity concerns are more critical in high-performance computing (HPC). For example, Table 1 shows the current reliability of leading-edge supercomputers. With power densities doubling every 18-24 months (FIG. 1) and large-scale HPC systems continuing to increase in size, the amount of heat generated (and hence, temperature) continues to rise. And as a rule of thumb, Arrhenius' equation, as applied to microelectronics, notes that for every 10° C. (18° F.) increase in temperature, the failure rate of a system doubles.
Informal empirical data taken from late 2000 to early 2002 supports Arrhenius' equation. In the winter, when the temperature inside a warehouse-based work environment was around 70° F.-75° F., the traditional cluster system failed approximately once a week; in the summer, when the temperature increased to 85° F.-90° F., the cluster failed twice a week.
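As a rough check of the rule of thumb against the informal observation above, the following minimal sketch converts the roughly 15° F. seasonal rise into degrees Celsius and applies the "doubling per 10° C." rule. The function names and the choice of a 15° F. difference (the gap between the two stated temperature ranges) are assumptions made for illustration only.

```python
def fahrenheit_delta_to_celsius(delta_f: float) -> float:
    """Convert a temperature difference (not an absolute temperature)."""
    return delta_f * 5.0 / 9.0


def failure_rate_multiplier(delta_c: float, doubling_step_c: float = 10.0) -> float:
    """Rule of thumb: the failure rate doubles for every `doubling_step_c` rise."""
    return 2.0 ** (delta_c / doubling_step_c)


# Assumed seasonal rise: 70-75 F in winter versus 85-90 F in summer, about 15 F.
delta_c = fahrenheit_delta_to_celsius(15.0)
multiplier = failure_rate_multiplier(delta_c)

print(f"temperature rise: {delta_c:.1f} C")
print(f"predicted failure-rate multiplier: {multiplier:.2f}x")
print("observed multiplier: 2x (one failure per week versus two)")
```

Under these assumptions the rule predicts a failure rate a little under twice the winter rate, which is in line with the observed change from one failure per week to two.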
Even more worrisome is how this computing environment affected the results of the Linpack benchmark running on a very dense, 18-node Beowulf cluster. After ten minutes of execution, the cluster produced an answer outside the residual (i.e., a silent error) when running in the dusty 85° F. warehouse but produced the correct answer when running in a 65° F. machine-cooled room. Clearly, the HPC community must worry about power and its effect on reliability.
Furthermore, every hour that an HPC system is unavailable translates to lost business or lost productivity. This issue is of extraordinary importance for companies that rely on parallel-computing resources for their business, as noted in Table 2.
In short, ignoring power consumption as a design constraint results in a system with high operational costs for power and cooling and can detrimentally impact reliability, which translates into lost productivity.
Dynamic voltage and frequency scaling (DVFS) is widely recognized as an effective way to reduce high power consumption in microprocessors (CPUs). Examples of DVFS mechanisms include PowerNow! (AMD) and SpeedStep (Intel). DVFS exports several frequency-voltage settings and each CPU runs at a particular setting at any given time. The many settings provide various power-performance tradeoffs: the faster a CPU runs, the more power it consumes; conversely, the slower a CPU runs, the less power it consumes. DVFS allows a CPU to switch between different frequency-voltage settings at run time under the control of software.
However, the power-performance tradeoffs provided by the DVFS mechanism should be used judiciously. A computer user is not usually willing to sacrifice performance in exchange for lower power consumption. Thus, one goal of a DVFS-based power-management methodology is to create a schedule of CPU frequency-voltage settings over time that reduces CPU power consumption while minimizing performance degradation. A DVFS scheduling algorithm (referred to hereinafter as a “DVFS algorithm”) needs to determine when to adjust the current frequency-voltage setting (i.e., the scaling point) and to which new frequency-voltage setting (i.e., the scaling factor) the system is adjusted. For example, a DVFS algorithm may set the scaling points at the beginning of each fixed-length time interval (say, every 10 ms) and determine the scaling factors by predicting the upcoming CPU workload based on the past history.
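By way of illustration only, the interval-based example above can be sketched as follows. The list of frequency settings, the moving-average predictor, the window length, and the names `pick_frequency` and `schedule` are all assumptions made for this sketch and are not taken from any particular DVFS algorithm. Demand is expressed in the same units as frequency (millions of busy cycles per second), and the sketch ignores the fact that observed demand is capped by the frequency actually in use.

```python
from collections import deque

# Hypothetical frequency settings in MHz, in ascending order.
FREQS_MHZ = [600, 800, 1000, 1200, 1400]


def pick_frequency(predicted_demand_mhz, freqs=FREQS_MHZ):
    """Scaling factor: the lowest setting that covers the predicted demand."""
    for f in freqs:
        if f >= predicted_demand_mhz:
            return f
    return freqs[-1]


def schedule(demand_per_interval, window=3, freqs=FREQS_MHZ):
    """Return the setting chosen at each scaling point (one per interval).

    The prediction for an interval is the mean demand of the previous
    `window` intervals; with no history, the CPU starts at full speed.
    """
    history = deque(maxlen=window)
    chosen = []
    for demand in demand_per_interval:
        predicted = sum(history) / len(history) if history else freqs[-1]
        chosen.append(pick_frequency(predicted, freqs))
        history.append(demand)  # observed only after the interval has run
    return chosen


demand = [1400, 500, 500, 500, 1300]
print(schedule(demand))
```

In this sample run the last interval demands 1300 while the predictor, seeing only the recent quiet history, selects the lowest setting. A predictor of this kind can therefore lag behind sudden changes in workload.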
Existing DVFS algorithms possess a number of drawbacks. For example, many DVFS algorithms are based only on CPU utilization. That is, if a computer user is reading a document for an extended period of time, a DVFS algorithm would automatically scale down the frequency and supply voltage of the CPU in order to reduce power consumption. While this strategy is ideal for the interactive use of the computer, many computer systems spend a significant portion of a day in non-interactive use and with full CPU utilization. Given that the power consumption is proportional to the work being processed by a CPU, it is this full utilization of the CPU that consumes the most energy and causes the temperature to rise significantly. As a result, the power-aware algorithms that work well for interactive use fail miserably with respect to HPC applications. In addition, DVFS algorithms based solely on CPU utilization only provide loose control over DVFS-induced performance slowdown. This is because the CPU utilization ratio by itself does not provide enough timing information.
A few other DVFS algorithms address the cases where the CPU is fully utilized, but their effectiveness falls short in one way or another. For example, many of them target savings in only a part of the CPU power consumption. The resulting overestimate of the power reduction achieved by DVFS encourages the DVFS algorithm to set the CPU to a low frequency-voltage setting. Because the CPU then runs very slowly, other sources of power consumption remain switched “on” for too long, and their energy consumption increases to a point that eliminates the power savings of DVFS.
Specifically, many DVFS algorithms use the equation P(f)=k·V²·f to model CPU power consumption, where f is the frequency, V is the voltage, and k is a constant. This model represents only part of the CPU power consumption. Current CPUs also consume power via leakage current. This type of power consumption increases and becomes critical as processors enter submicron scales and, therefore, cannot be ignored. In fact, power is consumed not only by CPUs, but also by other system components, such as storage media. Ignoring these other sources of power consumption leads to the design of an over-optimistic DVFS algorithm that ends up with more overall energy consumption.
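The effect described above can be illustrated with a minimal sketch that adds a constant term to the dynamic power model. The three frequency-voltage settings, the constant k, the amount of work, and the 15 W figure standing in for leakage and non-CPU components are hypothetical values chosen for illustration, and the running time is taken as work divided by frequency purely to keep the arithmetic simple.

```python
# Hypothetical (frequency in GHz, voltage in V) settings.
SETTINGS = [(0.6, 1.0), (1.0, 1.2), (1.4, 1.5)]
K = 10.0              # hypothetical constant in P = k * V^2 * f (watts per GHz*V^2)
WORK_GCYCLES = 10.0   # hypothetical task size in billions of cycles


def energy_joules(f_ghz, volts, static_watts):
    """Energy for the whole task: (dynamic + static power) * running time."""
    runtime_s = WORK_GCYCLES / f_ghz          # simplifying assumption: T = W / f
    dynamic_watts = K * volts ** 2 * f_ghz    # the P(f) = k * V^2 * f term
    return (dynamic_watts + static_watts) * runtime_s


def best_setting(static_watts):
    """Setting with the lowest total energy for the task."""
    return min(SETTINGS, key=lambda s: energy_joules(s[0], s[1], static_watts))


for static in (0.0, 15.0):
    f, v = best_setting(static)
    print(f"static power {static:4.1f} W -> best setting {f} GHz at {v} V")
```

With the static term set to zero, the model favors the slowest setting. Once the 15 W term is included, the slowest setting keeps that power drawn for so long that a middle setting uses less total energy, which is the over-optimism the dynamic-only model invites.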
DVFS algorithms may also be too pessimistic and lose great opportunities to exploit DVFS for maximum energy savings. Many existing DVFS algorithms assume that the performance of an application scales perfectly with respect to CPU speed, i.e., that system performance is halved if CPU speed is halved. It is only in the worst case that the execution time doubles when the CPU speed is halved. Thus, a DVFS algorithm based on such a model will schedule a faster CPU speed and complete a task far ahead of its deadline, whereas a slower CPU speed could be scheduled that still meets the performance deadline but consumes less power.
Specifically, many DVFS algorithms use the equation T(f)=W·(1/f) to model the execution time of a program, where T(f), in seconds, is the running time of a task at frequency f, and W, in cycles, is the amount of required CPU work. In practice, this model exaggerates the impact that CPU speed has on execution time, especially for applications that involve a lot of memory or disk accesses. In addition, W is not always a constant; for many programs, W is a function of CPU speed f. These two factors result in an underestimation of the power savings that DVFS can bring for certain types of programs. Consequently, the CPU consumes significantly more energy than necessary.
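To make the exaggeration concrete, the following sketch compares T(f)=W·(1/f) with a running time that also contains a portion independent of CPU frequency (for example, time spent waiting on memory or disk). The two-part decomposition, the function names, and all numbers are assumptions made for illustration; they are one simple way to represent a memory-bound program, not a model stated in this description.

```python
def time_cpu_bound(work_gcycles, f_ghz):
    """T(f) = W / f: every second of running time scales with CPU speed."""
    return work_gcycles / f_ghz


def time_with_offchip(cpu_gcycles, offchip_s, f_ghz):
    """Assumed alternative: only the on-chip part scales with CPU speed."""
    return cpu_gcycles / f_ghz + offchip_s


def slowdown(time_fn, f_fast_ghz, f_slow_ghz):
    """Ratio of running time at the slow setting to that at the fast setting."""
    return time_fn(f_slow_ghz) / time_fn(f_fast_ghz)


# Both hypothetical programs take 8 s at 2 GHz.
pure = lambda f: time_cpu_bound(16.0, f)            # 16 Gcycles of CPU work
mixed = lambda f: time_with_offchip(4.0, 6.0, f)    # 4 Gcycles plus 6 s off-chip

print("slowdown when halving 2 GHz to 1 GHz")
print(f"  T = W/f model:        {slowdown(pure, 2.0, 1.0):.2f}x")
print(f"  with off-chip time:   {slowdown(mixed, 2.0, 1.0):.2f}x")
```

Under these assumed numbers, halving the frequency doubles the running time in the T(f)=W·(1/f) model but lengthens the memory-bound program by only a quarter, so an algorithm built on the first model would schedule a faster and more power-hungry setting than the program needs.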
Another drawback of existing DVFS algorithms is the assumption of a relationship between frequency f and voltage V in each setting:
$$f = K \cdot \frac{(V - V_T)^{\alpha}}{V},$$
where K, V_T, and α are constants, 1 ≤ α ≤ 2, and V_T ≪ V. Unfortunately, this relationship is not observed in real DVFS processors because current DVFS processors do not support continuously variable frequencies and voltages. For example, Intel's Pentium M® processors only support clock frequencies and voltages that are multiples of 100 MHz and 16 mV, respectively. In contrast, the relationship $f = K \cdot (V - V_T)^{\alpha} / V$ can only be satisfied if continuously variable frequencies are supported. As a result, existing DVFS algorithms that have been proven to be optimal based on this particular frequency-voltage relationship may no longer be optimal (or even effective) if the relationship is not satisfied.
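The mismatch between the continuous relationship and a discrete set of settings can be illustrated with a short sketch. The constants K, V_T, and α, the requested voltage, and the rounding directions (voltage rounded up, frequency rounded down) are hypothetical choices made for illustration; only the 100 MHz and 16 mV step sizes come from the example above.

```python
import math

K_MHZ = 2000.0   # hypothetical constant K, scaled so that f is in MHz
V_T = 0.2        # hypothetical threshold voltage in volts
ALPHA = 1.5      # hypothetical exponent, within 1 <= alpha <= 2

FREQ_STEP_MHZ = 100
VOLT_STEP_MV = 16


def ideal_frequency_mhz(volts):
    """Continuous relationship f = K * (V - V_T)^alpha / V."""
    return K_MHZ * (volts - V_T) ** ALPHA / volts


def nearest_supported_setting(volts_requested):
    """Snap a requested voltage and its ideal frequency onto the discrete grid.

    Assumed policy: round the voltage up and the frequency down.
    Returns (voltage in mV, supported frequency in MHz, ideal frequency in MHz).
    """
    mv = math.ceil(round(volts_requested * 1000) / VOLT_STEP_MV) * VOLT_STEP_MV
    ideal = ideal_frequency_mhz(mv / 1000.0)
    supported = int(ideal // FREQ_STEP_MHZ) * FREQ_STEP_MHZ
    return mv, supported, ideal


mv, supported, ideal = nearest_supported_setting(1.25)
print(f"voltage on grid:     {mv} mV")
print(f"ideal frequency:     {ideal:.1f} MHz")
print(f"supported frequency: {supported} MHz")
print(f"gap from the model:  {ideal - supported:.1f} MHz")
```

The gap printed on the last line is the amount by which the supported setting departs from the continuous relationship. An optimality proof that assumes the relationship holds exactly does not account for this gap.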
Finally, DVFS algorithms may not be real-time. A few DVFS algorithms rectify the aforementioned drawbacks at the price of becoming non-real-time. Non-real-time DVFS algorithms often involve profiling the execution behavior of a program (or its structures) at all possible frequency-voltage settings, and then using the profile to select the lowest frequency that satisfies the performance constraint to execute the program. The problems with these DVFS approaches are threefold. First, they are all essentially profile-based and generally require the source code to be modified. As a result, these approaches are not completely transparent to the end user. Second, because the profile information can be influenced by program input, these approaches are input-dependent. Third, the instrumentation of source code may alter the instruction access pattern, and therefore, may produce profiles that are considerably different from the execution behavior of the original code. So, in theory, while these approaches might provide the maximum benefit relative to performance and power, they are of little use to end-user applications.
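The profile-based selection step described above, and its dependence on program input, can be sketched as follows. The two profiles, the 5% slowdown bound, and the name `select_frequency` are hypothetical and serve only to illustrate the general approach; they do not reproduce any specific prior algorithm.

```python
def select_frequency(profile, max_slowdown):
    """Pick the lowest profiled frequency that meets the performance constraint.

    profile: mapping of frequency (MHz) to measured running time (seconds).
    max_slowdown: tolerated slowdown relative to the fastest setting, e.g. 0.05.
    """
    f_max = max(profile)
    deadline = profile[f_max] * (1.0 + max_slowdown)
    feasible = [f for f, t in profile.items() if t <= deadline]
    return min(feasible)  # f_max is always feasible, so the list is never empty


# Hypothetical profiles of the same program under two different inputs.
profile_input_a = {600: 11.5, 800: 10.4, 1000: 10.1, 1400: 10.0}  # memory-bound
profile_input_b = {600: 22.0, 800: 17.0, 1000: 13.8, 1400: 10.0}  # CPU-bound

print(select_frequency(profile_input_a, 0.05))
print(select_frequency(profile_input_b, 0.05))
```

With these assumed profiles the same program is assigned a low setting for one input and the highest setting for the other, which illustrates why a frequency chosen from a profile of one input may not suit another.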