Energy and power consumption are first-class design concerns for contemporary computing devices, from low-end embedded systems to high-end high-performance microprocessors. For embedded devices, the focus is on low energy consumption to extend battery life. For high-performance microprocessors, the goal is to maximize system performance within a given power budget.
Dynamic Voltage and Frequency Scaling (DVFS) is a well-known and effective technique for reducing power consumption and/or increasing performance in modern computing devices.
Dynamic voltage scaling is a power management technique in which the supply voltage used by a computing device (for example the central processing unit or the main memory controller of a computer system, or the central processing unit of a router or of a server system) is increased or decreased, depending upon circumstances. The supply voltage may be decreased in order to conserve power, particularly in laptops and other mobile devices, where energy comes from a battery and thus is limited. The supply voltage may be increased in order to allow an increase of frequency, thus increasing computing device performance, or to increase reliability.
Dynamic frequency scaling is another power conservation technique that works on the same principles as dynamic voltage scaling. It is a technique whereby the frequency of a computing device, for example the processor clock frequency or the memory controller clock frequency, can be automatically adjusted on the fly, either to conserve power and reduce the amount of heat generated by the computing device, or to increase performance. Dynamic frequency scaling is also commonly used in laptops and other mobile devices, where energy comes from a battery and thus is limited. Dynamic frequency downscaling reduces the number of instructions a computing device can execute in a given amount of time, thus reducing performance. Hence, it is generally used when the workload is not compute-intensive. Dynamic frequency upscaling improves performance, and is often implemented in commercial high-end processors to improve performance within a maximum power budget.
DVFS lowers the supply voltage as well as the clock frequency of the computing device to reduce both dynamic and static power consumption. Because dynamic power consumption scales quadratically with supply voltage and linearly with frequency, downscaling both voltage and frequency leads to a cubic reduction in dynamic power consumption (and at most a linear reduction in performance); frequency and voltage are therefore often downscaled simultaneously. DVFS is used in commercial computing devices across the entire computing range. Both dynamic voltage scaling and dynamic frequency scaling of the computing device can be used to prevent computer system overheating, which can result in program or operating system crashes, and possibly hardware damage. Reducing the voltage supplied to the computing device below the manufacturer's recommended minimum setting can result in system instability. Hence there is a need to determine the impact of DVFS on computing device performance and energy consumption. The applicability of DVFS is not limited to reducing power and energy consumption. It is also effective at addressing timing errors due to process variability. Other applications of DVFS include, among others, lifetime reliability management (where a trade-off is made between supply voltage and/or frequency on the one hand, and lifetime reliability on the other hand), and dynamic thermal management (where a trade-off is made between supply voltage and/or frequency on the one hand, and local heating in the processor on the other hand).
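The cubic relationship between simultaneous voltage/frequency downscaling and dynamic power can be illustrated with a minimal sketch. It assumes the standard CMOS dynamic power model (power proportional to capacitance, the square of supply voltage, and frequency) and further assumes that voltage is scaled by the same factor as frequency; the function names, the capacitance value and the baseline operating point are hypothetical and chosen only for illustration.

```python
def dynamic_power(c_eff_farads, v_volts, f_hz, activity=1.0):
    """Standard CMOS dynamic power model: P = activity * C * V^2 * f."""
    return activity * c_eff_farads * v_volts ** 2 * f_hz


def dynamic_power_ratio(scale, c_eff_farads=1e-9, v_volts=1.0, f_hz=2e9):
    """Dynamic power after scaling V and f by `scale`, relative to the baseline.

    The default capacitance, voltage and frequency are illustrative values only;
    they cancel out in the ratio.
    """
    base = dynamic_power(c_eff_farads, v_volts, f_hz)
    scaled = dynamic_power(c_eff_farads, v_volts * scale, f_hz * scale)
    return scaled / base


# Compare the (cubic) power ratio against the (linear) frequency ratio.
for s in (1.0, 0.9, 0.8, 0.5):
    print(f"V and f scaled to {s:.0%}: dynamic power ratio "
          f"{dynamic_power_ratio(s):.3f}, frequency ratio {s:.3f}")
```

Under these assumptions the power ratio equals the cube of the scaling factor, whereas the clock frequency, and hence at most the performance, drops only linearly. Static power is not modeled in this sketch.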
An important limitation of DVFS is that there exists no accurate and practical way of estimating its impact on performance and energy consumption. Existing DVFS profitability estimation approaches can be categorized into three classes:
One approach for estimating the performance and energy impact of DVFS is proportional scaling, i.e., performance is assumed to scale proportionally with clock frequency, and power consumption is assumed to scale quadratically with supply voltage and linearly with frequency. Proportional scaling may be accurate for compute-bound applications, but incurs (severe) errors for memory-bound applications because off-chip memory access latencies do not scale with computing device clock frequency.
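Proportional scaling as described above can be sketched as follows. The second function is a hypothetical two-component reference model (compute time that scales with frequency plus memory time that does not), introduced here only to show where proportional scaling goes wrong for a memory-bound workload; all numbers are made up for illustration.

```python
def proportional_scaling(time_s, power_w, f_old, f_new, v_old, v_new):
    """Estimate time, power and energy at a new V/f point by proportional scaling.

    Time is assumed inversely proportional to frequency; power is assumed to
    scale quadratically with voltage and linearly with frequency.
    """
    est_time = time_s * (f_old / f_new)
    est_power = power_w * (v_new / v_old) ** 2 * (f_new / f_old)
    return est_time, est_power, est_time * est_power


def two_component_time(compute_cycles, memory_time_s, freq_hz):
    """Hypothetical reference model: only the compute part scales with frequency."""
    return compute_cycles / freq_hz + memory_time_s


# Memory-bound example: 0.4e9 compute cycles plus 0.8 s of off-chip memory time.
f_old, f_new = 2e9, 1e9
t_old = two_component_time(0.4e9, 0.8, f_old)   # time observed at 2 GHz
t_ref = two_component_time(0.4e9, 0.8, f_new)   # reference time at 1 GHz
t_est, p_est, e_est = proportional_scaling(t_old, 10.0, f_old, f_new, 1.0, 0.8)

print(f"time at 2 GHz:                  {t_old:.2f} s")
print(f"reference model time at 1 GHz:  {t_ref:.2f} s")
print(f"proportional estimate at 1 GHz: {t_est:.2f} s")
```

In this example the proportional estimate doubles the execution time when the frequency is halved, while the reference model predicts a much smaller slowdown because the memory component is unaffected by the processor clock.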
A second approach, linear scaling, models performance as a linear function of clock frequency. The slope of this linear function depends on the application behavior. If the application is compute-bound, the slope is steep, i.e., performance scales (almost) proportionally with clock frequency. If, on the other hand, the application is memory-bound, the slope is (almost) flat, i.e., performance is barely affected by processor clock frequency. Although linear scaling yields accurate DVFS performance estimates for both compute-bound and memory-bound applications, it introduces runtime performance and/or energy overhead because it requires (at least) two samples at different V/f operating points to compute the linear slope.
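The two-sample fit that linear scaling relies on can be sketched as below. The performance metric (instructions per second), the function names and the sample values are assumptions made for illustration; the text above only states that performance is modeled as a linear function of clock frequency.

```python
def fit_linear_scaling(f1, perf1, f2, perf2):
    """Fit perf = slope * f + intercept through two measured (f, perf) samples."""
    if f1 == f2:
        raise ValueError("the two samples must be taken at different frequencies")
    slope = (perf2 - perf1) / (f2 - f1)
    intercept = perf1 - slope * f1
    return slope, intercept


def predict_performance(slope, intercept, freq_hz):
    """Predict performance at another frequency from the fitted line."""
    return slope * freq_hz + intercept


# Compute-bound example: performance doubles when frequency doubles.
cb_slope, cb_icpt = fit_linear_scaling(1e9, 1.0e9, 2e9, 2.0e9)

# Memory-bound example: performance barely changes with frequency.
mb_slope, mb_icpt = fit_linear_scaling(1e9, 0.50e9, 2e9, 0.52e9)

for name, slope, icpt in (("compute-bound", cb_slope, cb_icpt),
                          ("memory-bound", mb_slope, mb_icpt)):
    pred = predict_performance(slope, icpt, 1.5e9)
    print(f"{name}: slope {slope:.3f}, predicted perf at 1.5 GHz {pred:.3e}")
```

The cost noted in the text is visible in the interface: both samples must actually be measured, so the device has to run for some time at a second V/f operating point before any prediction is possible.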
A third approach, estimated linear scaling, eliminates the runtime overhead of linear scaling by estimating the relationship between performance and clock frequency rather than measuring it. It uses existing hardware performance counters to count the number of off-chip memory accesses, and derives an empirical model that estimates the linear slope as a function of that count. Because it merely counts off-chip memory accesses, estimated linear scaling does not account for the impact of memory-level parallelism (MLP), i.e., multiple memory accesses overlapping in time, on the non-pipelined fraction of the execution time: it treats multiple time-overlapping off-chip memory accesses as individual time-separated memory accesses and thereby systematically overestimates the non-pipelined fraction. Estimated linear scaling therefore leads to inaccurate DVFS profitability estimates.
A number of hardware performance monitors are known, and hardware performance monitoring is implemented in commercial computing devices. Improvements to existing performance monitors have been proposed, for example having one monitor count conditionally on another monitor overflowing, or computing histograms using performance monitors. However, no hardware performance monitors exist that are specifically tied to DVFS.
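The overestimation caused by ignoring memory-level parallelism in estimated linear scaling can be illustrated with a small sketch. It represents each off-chip memory access as a hypothetical (start, end) time interval and compares two ways of accounting for memory time: summing every access separately, which corresponds to merely counting accesses, and taking the union of the intervals, which counts overlapping accesses only once. The interval representation and the example values are assumptions for illustration only.

```python
def counted_memory_time(accesses):
    """Treat every access as time-separated: sum all individual latencies."""
    return sum(end - start for start, end in accesses)


def overlapped_memory_time(accesses):
    """Account for MLP: total length of the union of the access intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(accesses):
        if cur_end is None or start > cur_end:
            # A new, non-overlapping burst begins; close the previous one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # This access overlaps the current burst; extend it if needed.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Three overlapping accesses followed by one isolated access (times in ns).
accesses = [(0, 100), (10, 110), (20, 120), (500, 600)]
print("sum of individual latencies:", counted_memory_time(accesses), "ns")
print("time with overlap merged:   ", overlapped_memory_time(accesses), "ns")
```

When accesses overlap, the per-access sum is larger than the merged time, which is the direction of the error described above.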
US 2008/0201591 describes a system that uses DVFS in a run-time environment to reduce energy consumption while minimizing the effect on performance for multi-threaded applications. The technique uses (existing) hardware performance counters to count the number of execution cycles, number of retired micro-operations, and the number of stall cycles due to cache misses, page faults (TLB misses), full reorder buffer (ROB), full reservation station, or branch misprediction. At the end of a pre-configured fixed time interval, the counters are read and the system determines whether or not to scale to another V/f operating point.
US 2007/0168055 describes a system that dynamically adapts voltage and clock frequency to increase energy-efficiency. It does so by running the application at multiple clock frequencies to derive the performance sensitivity to frequency scaling (much like the linear scaling approach described above).
There is still a need for methods and devices that more accurately estimate profitability of Dynamic Voltage and Frequency Scaling (DVFS) in a computing device.