Power efficiency is a key requirement across a broad range of systems, from small portable devices to rack-mounted processor farms. Even in systems where high performance is key, power efficiency remains a concern. Power efficiency is determined both by hardware design and component choice and by software-based runtime power management techniques.
In wired systems, power efficiency will typically enable a reduction in power supply capacity, as well as a reduction in cooling requirements and fan noise, and ultimately product cost. Power efficiency can allow an increase in component density as well. For example, a designer may be limited in the number of processors that can be placed on a board simply because the cumulative power consumption would exceed compliance limits for the bus specification. Increased component density can result in increased capacity, a reduction in product size, or both.
In mobile devices, power efficiency means increased battery life and a longer time between recharges. It also enables selection of smaller batteries, possibly a different battery technology, and a corresponding reduction in product size.
Power efficiency is a key product differentiator. A simple example is a buyer shopping for an MP3 player at an electronics store. In a side-by-side comparison of two players with the same features, the decision will likely go to the player with the longer time between recharges. In many scenarios, the success or failure of a product in its marketplace will be determined by its power efficiency.
The total power consumption of a CMOS circuit is the sum of the active and static power consumption: $P_{total} = P_{active} + P_{static}$. Active power consumption occurs when the circuit is active, switching from one logic state to another. Active power consumption is caused both by switching current (the current needed to charge internal nodes) and by through current (the current that flows when the P-channel and N-channel transistors are momentarily on at the same time). Active (transient) power consumption can be approximated by the equation $P_{transient} = C_{pd} \times F \times V_{cc}^{2} \times N_{sw}$, where $C_{pd}$ is the dynamic capacitance, $F$ is the switching frequency, $V_{cc}$ is the supply voltage, and $N_{sw}$ is the number of bits switching. An additional relationship is that the supply voltage ($V_{cc}$) determines the maximum switching frequency ($F$) for stable operation. The important concepts here are: 1) the active power consumption is linearly related to the switching frequency and quadratically related to the supply voltage, and 2) the maximum switching frequency is determined by the supply voltage.
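The linear and quadratic relationships can be illustrated with a short numerical sketch of the approximation above. The component values below are hypothetical and chosen only for illustration; they do not describe any particular device.

```python
def transient_power(c_pd, freq_hz, vcc, n_sw):
    """Approximate active power in watts: Cpd * F * Vcc^2 * Nsw."""
    return c_pd * freq_hz * vcc ** 2 * n_sw


# Hypothetical baseline: 20 pF dynamic capacitance, 200 MHz, 1.6 V, 32 bits.
base = transient_power(c_pd=20e-12, freq_hz=200e6, vcc=1.6, n_sw=32)

# Halve only the frequency: power scales linearly with F.
half_freq = transient_power(c_pd=20e-12, freq_hz=100e6, vcc=1.6, n_sw=32)

# Halve only the voltage: power scales with the square of Vcc.
half_vcc = transient_power(c_pd=20e-12, freq_hz=200e6, vcc=0.8, n_sw=32)

print(f"frequency halved: {half_freq / base:.2f} of baseline power")
print(f"voltage halved:   {half_vcc / base:.2f} of baseline power")
```

Halving the frequency alone leaves about one half of the baseline active power, while halving the voltage alone leaves about one quarter.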
If an application can reduce the CPU clock rate and still meet its processing requirements, it can achieve a proportional savings in power dissipation. Because active power depends quadratically on the supply voltage, if the frequency can be reduced safely, and the reduced frequency is compatible with a lower operating voltage available on the platform, then in addition to the savings due to the reduced clock frequency, a potentially significant additional savings can be obtained by reducing the voltage. However, it is important to recognize that for a given task set, reducing the CPU clock rate also proportionally extends the execution time of the same task set, requiring careful analysis of the application to ensure that it still meets its real-time requirements. The potential savings provided by dynamic voltage and frequency scaling (DVFS) have been extensively studied in academic literature, with emphasis on ways to reduce the scaling latencies, improve the voltage scaling range, and schedule tasks so that real-time deadlines can still be met. For example, see Run-time Power Control Scheme Using Software Feedback Loop for Low-Power Real-time Applications, IEEE ISBN 0-7803-5974-7, Seongsoo Lee, Takayasu Sakurai, 2000; Intra-Task Voltage Scheduling for Low-Energy Hard Real-Time Applications, IEEE Design & Test of Computers, Dongkun Shin, Jihong Kim, Seongsoo Lee, 2001; and Run-time Voltage Hopping for Low-power Real-time Systems, DAC2000, ACM 1-58113-188-7, Seongsoo Lee, Takayasu Sakurai 2000.
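The trade-off between power savings and execution time can be sketched as a selection among voltage/frequency setpoints. The following is a minimal illustration, not a method taken from the cited literature. It assumes that execution time is the cycle count divided by the clock frequency, that scaling latency is negligible, and that the platform offers the hypothetical setpoints listed; the names `Setpoint`, `relative_power`, and `pick_setpoint` are introduced here for illustration only.

```python
from typing import NamedTuple, Optional, Sequence


class Setpoint(NamedTuple):
    freq_hz: float  # CPU clock frequency
    vcc: float      # supply voltage required for stable operation at freq_hz


def relative_power(sp: Setpoint, ref: Setpoint) -> float:
    """Active power of sp relative to ref, using P ~ F * Vcc^2."""
    return (sp.freq_hz / ref.freq_hz) * (sp.vcc / ref.vcc) ** 2


def pick_setpoint(setpoints: Sequence[Setpoint], cycles: float,
                  deadline_s: float) -> Optional[Setpoint]:
    """Return the slowest setpoint that still completes `cycles` of work
    within the deadline, or None if no setpoint is fast enough."""
    feasible = [sp for sp in setpoints if cycles / sp.freq_hz <= deadline_s]
    return min(feasible, key=lambda sp: sp.freq_hz) if feasible else None


# Hypothetical platform setpoints and workload (illustrative values only).
SETPOINTS = [Setpoint(200e6, 1.6), Setpoint(150e6, 1.4), Setpoint(100e6, 1.2)]
chosen = pick_setpoint(SETPOINTS, cycles=1.2e6, deadline_s=0.010)
if chosen is not None:
    print(chosen, f"{relative_power(chosen, SETPOINTS[0]):.3f}")
```

With these assumed values the 100 MHz setpoint would need 12 ms and misses the 10 ms deadline, so the 150 MHz setpoint is chosen, at a little over half the active power of the 200 MHz setpoint.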
Static power consumption is the other component of the total power consumption equation. Static power consumption occurs even when the circuit is not switching, due to reverse-bias leakage. Traditionally, the static power consumption of a CMOS circuit has been very small in comparison to the active power consumption. Embedded applications will typically idle the CPU clock during inactivity to eliminate active power, which dramatically reduces total power consumption. However, new higher-performance transistors are bringing significant increases in leakage currents, which requires new attention to the static power consumption component of the total power equation.
There are many known techniques utilized both in hardware design and at run-time to help reduce power dissipation. Table 1 lists some up-front hardware design decisions for reducing power dissipation. Table 2 lists common techniques employed at run-time to reduce power dissipation. Table 3 lists some fundamental challenges to utilizing these power management techniques in real-time systems.
TABLE 1
Decision | Description
Choose a low-power technology base | Choosing a power-efficient process (e.g., CMOS) is perhaps the most important up-front decision, and directly drives power efficiency.
Partition separate voltage and clock domains | By partitioning separate domains, different components can be wired to the appropriate power rail and clock line, eliminating the need for all circuitry to operate at the maximum required by any specific module.
Enable scaling of voltage and frequency | Designing in programmable clock generators allows application code a linear savings in power when it can scale down the clock frequency. A programmable voltage source allows the potential for an additional quadratic power savings when the voltage can be reduced as well, because of reduced frequency. Also, designing the hardware to minimize scaling latencies will enable broader usage of the scaling technique.
Enable gating of different voltages to modules | Some static RAMs require less voltage in retention mode vs. normal operation mode. By designing in voltage gating circuitry, power consumption can be reduced during inactivity while still retaining state.
Utilize interrupts to alleviate polling by software | Often software is required to poll an interface periodically to detect events. For example, a keypad interface routine might need to spin or periodically wake to detect and resolve a keypad input. Designing the interface to generate an interrupt on keypad input will not only simplify the software, but it will also enable event-driven processing and activation of processor idle and sleep modes while waiting for interrupts.
Reduce loading of outputs | Decreasing capacitive and DC loading on output pins will reduce total power consumption.
Use hierarchical memory model | Depending on the application, utilizing cache and instruction buffers can drastically reduce off-chip memory accesses and subsequent power draw.
Boot with resources un-powered | Many systems boot in a fully active state, meaning full power consumption. If certain sub-systems can be left un-powered on boot, and later turned on when really needed, it eliminates unnecessary wasted power.
Minimize number of active phase lock loops (PLL) | Using shared clocks can reduce the number of active clock generators, and their corresponding power draw. For example, a processor's on-board PLL can be bypassed in favor of an external clock signal.
Use clock dividers for fast selection of an alternate frequency | A common barrier to highly dynamic frequency scaling is the latency of re-locking a PLL on a frequency change. Adding a clock divider circuit at the output of the PLL will allow instantaneous selection of a different clock frequency.
TABLE 2
Technique | Description
Gate clocks off when not needed | As described above, active power dissipation in a CMOS circuit occurs only when the circuit is clocked. By turning off clocks that are not needed, unnecessary active power consumption is eliminated. Most processors incorporate a mechanism to temporarily suspend active power consumption in the CPU while waiting for an external event. This idling of the CPU clock is typically triggered via a ‘halt’ or ‘idle’ instruction, called during application or OS idle time. Some processors partition multiple clock domains, which can be individually idled to suspend active power consumption in unused modules. For example, in the Texas Instruments TMS320C5510 DSP, six separate clock domains (CPU, cache, DMA, peripheral clocks, clock generator, and external memory interface) can be selectively idled.
Activate peripheral low-power modes | Some peripherals have built-in low power modes that can be activated when the peripheral is not immediately needed. For example, a device driver managing a codec over a serial port can command the codec to a low power mode when there is no audio to be played, or if the whole system is being transitioned to a low-power mode.
Leverage peripheral activity detectors | Some peripherals have built-in activity detectors that can be programmed to power down the peripheral after a period of inactivity. For example, a disk drive can be automatically spun down when the drive is not being accessed, and spun back up when needed again.
Utilize auto-refresh modes | Dynamic memories and displays will typically have a self or auto-refresh mode where the device will efficiently manage the refresh operation on its own.
On boot actively turn off unnecessary power consumers | Processors typically boot up fully powered, at a maximum clock rate, ready to do work. There will inevitably be resources powered that are not needed yet, or that may never be used in the course of the application. At boot time, the application or OS may traverse the system, turning off/idling unnecessary power consumers.
Gate power to subsystems only as needed | A system may include a power-hungry module that need not be powered at all times. For example, a mobile device may have a radio subsystem that only needs to be ON when in range of the device with which it communicates. By gating power OFF/ON on demand, unnecessary power dissipation can be avoided.
Benchmark application to find minimum required frequency and voltage | Typically, systems are designed with excess processing capacity built in, either for safety purposes, or for future extensibility and upgrades. For the latter case, a common development technique is to fully exercise and benchmark the application to determine excess capacity, and then ‘dial-down’ the operating frequency and voltage to that which enables the application to fully meet its requirements, but minimizes excess capacity. Frequency and voltage are usually not changed at runtime, but are set at boot time, based upon the benchmarking activity.
Adjust CPU frequency and voltage based upon gross activity | Another technique for addressing excess processing capacity is to periodically sample CPU utilization at runtime, and then dynamically adjust the frequency and voltage based upon the empirical utilization of the processor. This “interval-based scheduling” technique improves on the power-savings of the previous static benchmarking technique because it takes advantage of the dynamic variability of the application's processing needs.
Dynamically schedule CPU frequency and voltage to match predicted work load | The “interval-based scheduling” technique enables dynamic adjustments to processing capacity based upon history data, but typically does not do well at anticipating the future needs of the application, and is therefore not acceptable for systems with hard real-time deadlines. An alternate technique is to dynamically vary the CPU frequency and voltage based upon predicted workload. Using dynamic, fine-grained comparison of work completed vs. the worst-case execution time (WCET) and deadline of the next task, the CPU frequency and voltage can be dynamically tuned to the minimum required. This technique is most applicable to specialized systems with data-dependent processing requirements that can be accurately characterized. Inability to fully characterize an application usually limits the general applicability of this technique. Study of efficient and stable scheduling algorithms in the presence of dynamic frequency and voltage scaling is a topic of much on-going research.
Optimize execution speed of code | Developers often optimize their code for execution speed. However, in many situations the speed may be good enough, and further optimizations are not considered. When considering power consumption, faster code will typically mean more time for leveraging idle or sleep modes, or a greater reduction in the CPU frequency requirements. In some situations, speed optimizations may actually increase power consumption (e.g., more parallelism and subsequent circuit activity), but in others, there may be power savings.
Use low-power code sequences and data patterns | Different processor instructions exercise different functional units and data paths, resulting in different power requirements. Additionally, because of data bus line capacitances and the inter-signal capacitances between bus lines, the amount of power required is affected by the data patterns that are transferred over the data buses. The power requirements are also affected by the signaling patterns chosen (1s vs. 0s) for external interfaces (e.g., serial ports). Analyzing the effects of individual instructions and data patterns is an extreme technique that is sometimes used to maximize power efficiency.
Scale application and OS footprint based upon minimal requirements | Architecting application and OS code bases to be scalable can reduce memory requirements and, therefore, the subsequent runtime power requirements. For example, by simply placing individual functions or APIs into individual linkable objects, the linker can link in only the code/data needed and avoid linking dead code/data.
Use code overlays to reduce fast memory requirements | For some applications, dynamically overlaying code from non-volatile to fast memory will reduce both the cost and power consumption of additional fast memory.
Trade off accuracy vs. power consumption | Accepting less accuracy in some calculations can drastically reduce processing requirements. For example, certain signal processing applications can tolerate more noise in the results, which enables reduced processing and reduced power consumption.
Enter a reduced capability mode on a power change | When there is a change in the capabilities of the power source, e.g., when going from AC to battery power, a common technique is to enter a reduced capability mode with more aggressive runtime power management. A typical example is a laptop computer, where the OS is notified on a switch to battery power, and activates a different power management policy, with a lower CPU clock rate, a shorter timeout before the screen blanks or the disk spins down, etc. The OS power policy implements a tradeoff between responsiveness and extending battery life. A similar technique can be employed in battery-only systems, where a battery monitor detects reduced capacity, and activates more aggressive power management, such as slowing down the CPU, not enabling image viewing on the digital camera's LCD display, etc.
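The interval-based scheduling technique listed in Table 2 can be sketched as a simple threshold policy. This is a minimal illustration under stated assumptions, not a description of any particular operating system's governor: the setpoints are assumed to be indexed from slowest to fastest, the utilization samples are assumed to be fractions between 0 and 1 measured over fixed intervals, and the thresholds `low` and `high` are hypothetical tuning values.

```python
def next_setpoint_index(current, utilization, n_setpoints, low=0.5, high=0.9):
    """Step one setpoint faster when the last interval was busy, one slower
    when it was mostly idle, and otherwise stay put.

    Setpoint indices are assumed to run from 0 (slowest) to
    n_setpoints - 1 (fastest).
    """
    if utilization > high and current < n_setpoints - 1:
        return current + 1
    if utilization < low and current > 0:
        return current - 1
    return current


# Hypothetical utilization samples, one per sampling interval.
samples = [0.3, 0.4, 0.95, 0.7]
index = 2          # start at the fastest of three setpoints
trace = []
for u in samples:
    index = next_setpoint_index(index, u, n_setpoints=3)
    trace.append(index)
print(trace)
```

Because the policy reacts only to past utilization, a sudden burst of work is served at the old, slower setpoint for up to one interval, which is the reason the table notes that this approach is not acceptable for hard real-time deadlines.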
TABLE 3
Challenge | Description
Scaling CPU frequency with workload often affects peripherals | For many processors the same clock that feeds the CPU also feeds on-chip peripherals, so scaling the clock based upon CPU workload can have side effects on peripheral operation. The peripherals may need to be reprogrammed before and/or after the scaling operation, and this may be difficult if a pre-existing (non power-aware) device driver is being used to manage the peripheral. Additionally, if the scaling operation affects the timer generating the OS system tick, this timer will need to be adapted to follow the scaling operation, which will affect the absolute accuracy of the time base.
V/F scaling latencies can be large, and platform-dependent | The latency for voltage and frequency scaling operations will vary widely across platforms. An application that runs fine on one platform may not be portable to another platform, and may not run on a revision to the same platform if the latencies change much. For example, the time for a down-voltage scaling operation is typically load-dependent, and if the load changes significantly on the revised platform the application may not run correctly.
Might not have stable operation during V/F scaling | Some processor vendors specify a non-operation sequence during voltage or clock frequency changes to avoid instabilities during the transition. In these situations, the scaling code will need to wait for the transition to occur before returning, increasing the scaling latency.
V/F scaling directly affects ability to meet deadlines | Changing CPU frequency (and voltage when possible) will alter the execution time of a given task, potentially causing the task to miss a real-time deadline. Even if the new frequency is compatible with the deadline, there may still be a problem if the latency to switch between V/F setpoints is too large.
Scaling the CPU clock can affect ability to measure CPU utilization | If the clock that feeds the CPU also feeds the OS timer, the OS timer will be scaled along with the CPU, which compromises measurement of CPU utilization.
Watchdogs still need to be kept happy | Watchdog timers are used to detect abnormal program behavior and either shut down or reboot a system. Typically the watchdog needs to be serviced within a pre-defined time interval to keep it from triggering. Power management techniques that slow down or suspend processing can therefore inadvertently trigger application failure.
Idle and sleep modes typically collide with emulation, debug, and instrumentation | Depending upon the processor and the debug tools, invoking idle and sleep modes can disrupt the transport of real-time instrumentation and debugging information from the target. In the worst case it may perturb and even crash the debug environment. Similar concerns arise with V/F scaling, which may cause difficulty for the emulation and debug circuitry. It may be the case that power management is enabled when the system is deployed, but only minimally used during development.
Context save/restore can become non-trivial | In a non-power managed environment the OS or application framework will typically save and restore register values during a context switch. As register banks, memories, and other modules are powered OFF and back ON, the context to be saved and restored can grow dramatically. Also, if a module is powered down it may be difficult (and sometimes not possible) to fully restore the internal state of the module.
Most advanced power management techniques are still in the research stage | Many of the research papers that demonstrate significant power savings use highly specialized application examples, and do not map well to general application cases. Or, they make assumptions regarding the ability to fully characterize an application such that it can be guaranteed to be schedulable. These techniques often do not map to ‘real world’, multi-function programmable systems, and more research is needed for broader applicability.
Different types of applications call for different techniques | Different hardware platforms have varying levels of support for the above listed techniques. Also, different applications running on the same platform may have different processing requirements. For some applications, only the low-latency techniques (e.g., clock idling) are applicable, but for others the higher-latency techniques can be used to provide significant power savings when the application switches between modes with significantly different processing requirements. For example, one mode can be run at low V/F, and another mode, with higher processing requirements, can be run at a higher V/F. If the V/F latency is compatible with the mode switch time, the application can use the technique.
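The first challenge in Table 3, adapting the OS tick timer when its input clock is scaled, can be illustrated with a small sketch. It assumes a hypothetical down-counting timer whose reload value is the number of timer clocks per tick and shows only the error introduced by integer rounding of that reload value; time lost during the scaling transition itself is not modeled. The function names and clock values are illustrative.

```python
def tick_reload(timer_clk_hz: int, tick_hz: int) -> int:
    """Timer clocks per OS tick, rounded to the nearest integer."""
    return (timer_clk_hz + tick_hz // 2) // tick_hz


def tick_error_ppm(timer_clk_hz: int, tick_hz: int) -> float:
    """Deviation of the achieved tick rate from the requested rate, in ppm."""
    actual_hz = timer_clk_hz / tick_reload(timer_clk_hz, tick_hz)
    return (actual_hz - tick_hz) / tick_hz * 1e6


TICK_HZ = 1000  # hypothetical 1 ms system tick

# Hypothetical timer input clocks before and after a frequency scaling step.
for clk in (200_000_000, 133_333_333, 32_768):
    print(clk, tick_reload(clk, TICK_HZ), f"{tick_error_ppm(clk, TICK_HZ):.1f} ppm")
```

When the scaled clock is not an integer multiple of the tick rate, the reload value must be rounded, so the tick period no longer matches the requested period exactly. The effect is small at high clock rates and becomes pronounced at low ones.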