Power efficiency is a key requirement across a broad range of systems, from small portable devices to rack-mounted processor farms. Even in systems where high performance is paramount, power efficiency remains an important concern. Power efficiency is determined both by hardware design and component choice, and by software-based runtime power management techniques.
In wired systems, power efficiency typically enables a reduction in power supply capacity, as well as a reduction in cooling requirements and fan noise, and ultimately in product cost. Power efficiency can also allow an increase in component density. For example, a designer may be limited in the number of processors that can be placed on a board simply because the cumulative power consumption would exceed compliance limits of the bus specification. Increased component density can result in increased capacity, a reduction in product size, or both.
In mobile devices, power efficiency means increased battery life and a longer time between recharges. It also enables selection of smaller batteries, possibly a different battery technology, and a corresponding reduction in product size.
Power efficiency is a key product differentiator. A simple example is a buyer shopping for an MP3 player at an electronics store. In a side-by-side comparison of two players with the same features, the decision will likely go to the player with the longer time between recharges. In many scenarios, the success or failure of a product in its marketplace will be determined by its power efficiency.
The total power consumption of a CMOS circuit is the sum of active and static power consumption: P_total = P_active + P_static. Active power consumption occurs when the circuit is active, switching from one logic state to another. It is caused both by switching current (that needed to charge internal nodes) and by through current (that which flows when the P- and N-channel transistors are both momentarily on). Active power consumption can be approximated by the equation P_active = C_pd × F × V_cc² × N_sw, where C_pd is the dynamic capacitance, F is the switching frequency, V_cc is the supply voltage, and N_sw is the number of bits switching. An additional relationship is that the supply voltage (V_cc) determines the maximum switching frequency (F) for stable operation. The important concepts here are: 1) active power consumption is linearly related to switching frequency and quadratically related to supply voltage, and 2) the maximum switching frequency is determined by the supply voltage.
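The two relationships above can be checked numerically. The sketch below evaluates the approximation P_active = C_pd × F × V_cc² × N_sw at a few operating points; the component values are illustrative, not measurements from any particular device:

```python
def active_power(c_pd, f_hz, v_cc, n_sw):
    """Approximate CMOS active power: Cpd * F * Vcc^2 * Nsw (watts)."""
    return c_pd * f_hz * v_cc ** 2 * n_sw

# Illustrative values: 10 pF dynamic capacitance, 32 bits switching.
C_PD, N_SW = 10e-12, 32

p_full = active_power(C_PD, 200e6, 1.6, N_SW)  # 200 MHz at 1.6 V
p_half = active_power(C_PD, 100e6, 1.6, N_SW)  # halve F only
p_dvfs = active_power(C_PD, 100e6, 0.8, N_SW)  # halve F and Vcc together

assert abs(p_half / p_full - 0.5) < 1e-9    # linear in F
assert abs(p_dvfs / p_full - 0.125) < 1e-9  # Vcc enters squared: 1/2 * (1/2)^2
```

Halving the frequency alone halves the active power; halving the voltage as well cuts it by a further factor of four, for an eight-fold total reduction.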
If an application can reduce the CPU clock rate and still meet its processing requirements, it can achieve a proportional savings in power dissipation. Because of the quadratic relationship, if the frequency can be reduced safely, and the reduced frequency is compatible with a lower operating voltage available on the platform, then in addition to the savings from the reduced clock frequency, a potentially significant additional savings can be obtained by reducing the voltage. However, it is important to recognize that, for a given task set, reducing the CPU clock rate also proportionally extends the execution time of that task set, requiring careful analysis of the application to ensure that it still meets its real-time requirements. The potential savings provided by dynamic voltage and frequency scaling (DVFS) have been extensively studied in the academic literature, with emphasis on ways to reduce the scaling latencies, improve the voltage scaling range, and schedule tasks so that real-time deadlines can still be met. For example, see Seongsoo Lee and Takayasu Sakurai, "Run-time Power Control Scheme Using Software Feedback Loop for Low-Power Real-time Applications," IEEE, ISBN 0-7803-5974-7, 2000; Dongkun Shin, Jihong Kim, and Seongsoo Lee, "Intra-Task Voltage Scheduling for Low-Energy Hard Real-Time Applications," IEEE Design & Test of Computers, 2001; and Seongsoo Lee and Takayasu Sakurai, "Run-time Voltage Hopping for Low-power Real-time Systems," DAC 2000, ACM, ISBN 1-58113-188-7, 2000.
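The frequency/execution-time tradeoff and the quadratic voltage effect can be sketched as follows; the task parameters and the two voltage/frequency setpoints are hypothetical:

```python
# Hypothetical task: 5 million cycles of work with a 100 ms deadline.
WCET_CYCLES, DEADLINE_S = 5_000_000, 0.100

def exec_time(cycles, f_hz):
    """Execution time extends in inverse proportion to the clock rate."""
    return cycles / f_hz

def active_power(f_hz, v_cc, c_pd=10e-12, n_sw=32):
    """Active power approximation: Cpd * F * Vcc^2 * Nsw (watts)."""
    return c_pd * f_hz * v_cc ** 2 * n_sw

# Hypothetical setpoints: 200 MHz at 1.6 V vs. 100 MHz at 0.8 V.
t_fast = exec_time(WCET_CYCLES, 200e6)
t_slow = exec_time(WCET_CYCLES, 100e6)
assert t_slow == 2 * t_fast          # halving F doubles the run time...
assert t_slow <= DEADLINE_S          # ...but the deadline is still met

# Energy per task is power * time. In this model, lowering F alone saves no
# energy per task; the lower voltage it enables cuts energy by Vcc^2.
e_fast = active_power(200e6, 1.6) * t_fast
e_slow = active_power(100e6, 0.8) * t_slow
assert abs(e_slow / e_fast - 0.25) < 1e-9
```

This is exactly the analysis the text calls for: the slower setpoint is only usable because the extended execution time still fits within the deadline.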
Static power consumption is the other component of the total power consumption equation. Static power consumption occurs even when the circuit is not switching, due to reverse-bias leakage. Traditionally, the static power consumption of a CMOS circuit has been very small in comparison to the active power consumption, and embedded applications typically idle the CPU clock during inactivity to eliminate active power, which dramatically reduces total power consumption. However, newer high-performance transistors bring significant increases in leakage current, which demands renewed attention to the static component of the total power equation.
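Why leakage changes the picture can be seen with a simple duty-cycle model: clock idling removes active power during idle time, but static power is drawn continuously. All numbers below are illustrative:

```python
def average_power(p_active_w, p_static_w, duty):
    """Average draw when the CPU clock is idled outside the active duty cycle;
    static (leakage) power is consumed the entire time."""
    return duty * p_active_w + p_static_w

# Illustrative: 160 mW active power, CPU busy 1% of the time.
older_process = average_power(0.160, 0.0001, 0.01)  # 0.1 mW leakage
leaky_process = average_power(0.160, 0.010, 0.01)   # 10 mW leakage

assert older_process < 0.002   # leakage negligible: idling nearly zeroes the draw
assert leaky_process > 0.010   # leakage now dominates the power budget
```

At low duty cycles, clock idling has already removed almost all active power, so any growth in leakage current falls straight through to the total.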
There are many known techniques utilized both in hardware design and at run-time to help reduce power dissipation. Table 1 lists some up-front hardware design decisions for reducing power dissipation. Table 2 lists common techniques employed at run-time to reduce power dissipation. Table 3 lists some fundamental challenges to utilizing these power management techniques in real-time systems.
TABLE 1

Choose a low-power technology base: Choosing a power-efficient process (e.g., CMOS) is perhaps the most important up-front decision, and directly drives power efficiency.

Partition separate voltage and clock domains: By partitioning separate domains, different components can be wired to the appropriate power rail and clock line, eliminating the need for all circuitry to operate at the maximum required by any specific module.

Enable scaling of voltage and frequency: Designing in programmable clock generators allows application code a linear savings in power when it can scale down the clock frequency. A programmable voltage source allows the potential for an additional quadratic power savings when the voltage can be reduced as well, because of the reduced frequency. Also, designing the hardware to minimize scaling latencies will enable broader usage of the scaling technique.

Enable gating of different voltages to modules: Some static RAMs require less voltage in retention mode vs. normal operation mode. By designing in voltage gating circuitry, power consumption can be reduced during inactivity while still retaining state.

Utilize interrupts to alleviate polling by software: Often software is required to poll an interface periodically to detect events. For example, a keypad interface routine might need to spin or periodically wake to detect and resolve a keypad input. Designing the interface to generate an interrupt on keypad input will not only simplify the software, but it will also enable event-driven processing and activation of processor idle and sleep modes while waiting for interrupts.

Reduce loading of outputs: Decreasing capacitive and DC loading on output pins will reduce total power consumption.

Use hierarchical memory model: Depending on the application, utilizing cache and instruction buffers can drastically reduce off-chip memory accesses and the subsequent power draw.

Boot with resources un-powered: Many systems boot in a fully active state, meaning full power consumption. If certain sub-systems can be left un-powered on boot, and later turned on when really needed, unnecessary wasted power is eliminated.

Minimize number of active phase-locked loops (PLLs): Using shared clocks can reduce the number of active clock generators, and their corresponding power draw. For example, a processor's on-board PLL can be bypassed in favor of an external clock signal.

Use clock dividers for fast selection of an alternate frequency: A common barrier to highly dynamic frequency scaling is the latency of re-locking a PLL on a frequency change. Adding a clock divider circuit at the output of the PLL will allow instantaneous selection of a different clock frequency.
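The clock-divider decision in Table 1 can be sketched as a small frequency-selection routine; the PLL output frequency and divider taps below are hypothetical:

```python
def lowest_sufficient(f_pll_hz, f_required_hz, dividers=(1, 2, 4, 8)):
    """Pick the lowest post-PLL divided clock that still meets the requirement.
    Switching divider taps avoids a PLL re-lock and its latency entirely."""
    candidates = [f_pll_hz / d for d in dividers if f_pll_hz / d >= f_required_hz]
    return min(candidates) if candidates else None

# Hypothetical 200 MHz PLL output with /1, /2, /4, /8 taps.
assert lowest_sufficient(200e6, 60e6) == 100e6    # the /2 tap suffices
assert lowest_sufficient(200e6, 150e6) == 200e6   # must stay at full rate
```

The tradeoff is granularity: divider taps give only a fixed set of frequencies, but selecting among them is effectively instantaneous, whereas re-programming the PLL gives finer control at the cost of re-lock latency.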
TABLE 2

Gate clocks off when not needed: As described above, active power dissipation in a CMOS circuit occurs only when the circuit is clocked. By turning off clocks that are not needed, unnecessary active power consumption is eliminated. Most processors incorporate a mechanism to temporarily suspend active power consumption in the CPU while waiting for an external event. This idling of the CPU clock is typically triggered via a ‘halt’ or ‘idle’ instruction, called during application or OS idle time. Some processors partition multiple clock domains, which can be individually idled to suspend active power consumption in unused modules. For example, in the Texas Instruments TMS320C5510 DSP, six separate clock domains (CPU, cache, DMA, peripheral clocks, clock generator, and external memory interface) can be selectively idled.

Activate peripheral low-power modes: Some peripherals have built-in low-power modes that can be activated when the peripheral is not immediately needed. For example, a device driver managing a codec over a serial port can command the codec to a low-power mode when there is no audio to be played, or if the whole system is being transitioned to a low-power mode.

Leverage peripheral activity detectors: Some peripherals have built-in activity detectors that can be programmed to power down the peripheral after a period of inactivity. For example, a disk drive can be automatically spun down when the drive is not being accessed, and spun back up when needed again.

Utilize auto-refresh modes: Dynamic memories and displays will typically have a self- or auto-refresh mode where the device will efficiently manage the refresh operation on its own.

On boot, actively turn off unnecessary power consumers: Processors typically boot up fully powered, at a maximum clock rate, ready to do work. There will inevitably be resources powered that are not needed yet, or that may never be used in the course of the application. At boot time, the application or OS may traverse the system, turning off or idling unnecessary power consumers.

Gate power to subsystems only as needed: A system may include a power-hungry module that need not be powered at all times. For example, a mobile device may have a radio subsystem that only needs to be ON when in range of the device with which it communicates. By gating power OFF/ON on demand, unnecessary power dissipation can be avoided.

Benchmark application to find minimum required frequency and voltages: Typically, systems are designed with excess processing capacity built in, either for safety purposes or for future extensibility and upgrades. For the latter case, a common development technique is to fully exercise and benchmark the application to determine excess capacity, and then ‘dial down’ the operating frequency and voltage to the values that enable the application to fully meet its requirements while minimizing excess capacity. Frequency and voltage are usually not changed at runtime, but are set at boot time, based upon the benchmarking activity.

Adjust CPU frequency and voltage based upon gross activity: Another technique for addressing excess processing capacity is to periodically sample CPU utilization at runtime, and then dynamically adjust the frequency and voltage based upon the empirical utilization of the processor. This “interval-based scheduling” technique improves on the power savings of the previous static benchmarking technique because it takes advantage of the dynamic variability of the application's processing needs.

Dynamically schedule CPU frequency and voltage to match predicted workload: The “interval-based scheduling” technique enables dynamic adjustments to processing capacity based upon history data, but typically does not do well at anticipating the future needs of the application, and is therefore not acceptable for systems with hard real-time deadlines. An alternate technique is to dynamically vary the CPU frequency and voltage based upon predicted workload. Using dynamic, fine-grained comparison of work completed vs. the worst-case execution time (WCET) and deadline of the next task, the CPU frequency and voltage can be dynamically tuned to the minimum required. This technique is most applicable to specialized systems with data-dependent processing requirements that can be accurately characterized. Inability to fully characterize an application usually limits the general applicability of this technique. Study of efficient and stable scheduling algorithms in the presence of dynamic frequency and voltage scaling is a topic of much on-going research.

Optimize execution speed of code: Developers often optimize their code for execution speed. However, in many situations the speed may be good enough, and further optimizations are not considered. When considering power consumption, faster code will typically mean more time for leveraging idle or sleep modes, or a greater reduction in the CPU frequency requirements. In some situations, speed optimizations may actually increase power consumption (e.g., more parallelism and subsequent circuit activity), but in others, there may be power savings.

Use low-power code sequences and data patterns: Different processor instructions exercise different functional units and data paths, resulting in different power requirements. Additionally, because of data bus line capacitances and the inter-signal capacitances between bus lines, the amount of power required is affected by the data patterns that are transferred over the data buses. The power requirements are also affected by the signaling patterns chosen (1s vs. 0s) for external interfaces (e.g., serial ports). Analyzing the effects of individual instructions and data patterns is an extreme technique that is sometimes used to maximize power efficiency.

Scale application and OS footprint based upon minimal requirements: Architecting application and OS code bases to be scalable can reduce memory requirements and, therefore, the subsequent runtime power requirements. For example, by simply placing individual functions or APIs into individual linkable objects, the linker can link in only the code/data needed and avoid linking dead code/data.

Use code overlays to reduce fast memory requirements: For some applications, dynamically overlaying code from non-volatile memory to fast memory will reduce both the cost and power consumption of additional fast memory.

Trade off accuracy vs. power consumption: Accepting less accuracy in some calculations can drastically reduce processing requirements. For example, certain signal processing applications can tolerate more noise in the results, which enables reduced processing and reduced power consumption.

Enter a reduced-capability mode on a power change: When there is a change in the capabilities of the power source, e.g., when going from AC to battery power, a common technique is to enter a reduced-capability mode with more aggressive runtime power management. A typical example is a laptop computer, where the OS is notified on a switch to battery power, and activates a different power management policy, with a lower CPU clock rate, a shorter timeout before the screen blanks or the disk spins down, etc. The OS power policy implements a tradeoff between responsiveness and extending battery life. A similar technique can be employed in battery-only systems, where a battery monitor detects reduced capacity and activates more aggressive power management, such as slowing down the CPU, not enabling image viewing on a digital camera's LCD display, etc.
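The “interval-based scheduling” technique from Table 2 can be sketched as a small governor routine; the setpoint table and headroom target below are hypothetical:

```python
def pick_setpoint(util, f_current_hz, setpoints, headroom=0.7):
    """Interval-based scheduling sketch. `util` is the CPU utilization measured
    over the last sampling interval at the current frequency. The demand in
    cycles/s is util * f_current; choose the lowest (freq, volts) setpoint that
    keeps projected utilization under the headroom target."""
    demand = util * f_current_hz
    for f_hz, volts in sorted(setpoints):     # ascending frequency
        if demand <= headroom * f_hz:
            return (f_hz, volts)
    return max(setpoints)                     # saturate at the fastest setpoint

# Hypothetical platform setpoint table: (frequency, voltage), ascending.
SETPOINTS = [(50e6, 0.9), (100e6, 1.1), (200e6, 1.6)]

assert pick_setpoint(0.20, 200e6, SETPOINTS) == (100e6, 1.1)  # light load: slow down
assert pick_setpoint(0.90, 200e6, SETPOINTS) == (200e6, 1.6)  # heavy load: stay fast
```

Because the decision is based purely on the previous interval's utilization, a sudden burst of work can be under-served until the next sample, which is why the table notes that the technique is not acceptable for hard real-time deadlines.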
TABLE 3

Scaling CPU frequency with workload often affects peripherals: For many processors the same clock that feeds the CPU also feeds on-chip peripherals, so scaling the clock based upon CPU workload can have side effects on peripheral operation. The peripherals may need to be reprogrammed before and/or after the scaling operation, and this may be difficult if a pre-existing (non-power-aware) device driver is being used to manage the peripheral. Additionally, if the scaling operation affects the timer generating the OS system tick, this timer will need to be adapted to follow the scaling operation, which will affect the absolute accuracy of the time base.

V/F scaling latencies can be large, and platform-dependent: The latency for voltage and frequency scaling operations will vary widely across platforms. An application that runs fine on one platform may not be portable to another platform, and may not run on a revision to the same platform if the latencies change much. For example, the time for a down-voltage scaling operation is typically load-dependent, and if the load changes significantly on the revised platform the application may not run correctly.

Might not have stable operation during V/F scaling: Some processor vendors specify a non-operation sequence during voltage or clock frequency changes to avoid instabilities during the transition. In these situations, the scaling code will need to wait for the transition to complete before returning, increasing the scaling latency.

V/F scaling directly affects ability to meet deadlines: Changing CPU frequency (and voltage when possible) will alter the execution time of a given task, potentially causing the task to miss a real-time deadline. Even if the new frequency is compatible with the deadline, there may still be a problem if the latency to switch between V/F setpoints is too large.

Scaling the CPU clock can affect ability to measure CPU utilization: If the clock that feeds the CPU also feeds the OS timer, the OS timer will be scaled along with the CPU, which compromises measurement of CPU utilization.

Watchdogs still need to be kept happy: Watchdog timers are used to detect abnormal program behavior and either shut down or reboot a system. Typically the watchdog needs to be serviced within a pre-defined time interval to keep it from triggering. Power management techniques that slow down or suspend processing can therefore inadvertently trigger application failure.

Idle and sleep modes typically collide with emulation, debug, and instrumentation: Depending upon the processor and the debug tools, invoking idle and sleep modes can disrupt the transport of real-time instrumentation and debugging information from the target. In the worst case it may perturb and even crash the debug environment. Similar concerns arise with V/F scaling, which may cause difficulty for the emulation and debug circuitry. It may be the case that power management is enabled when the system is deployed, but only minimally used during development.

Context save/restore can become non-trivial: In a non-power-managed environment the OS or application framework will typically save and restore register values during a context switch. As register banks, memories, and other modules are powered OFF and back ON, the context to be saved and restored can grow dramatically. Also, if a module is powered down it may be difficult (and sometimes not possible) to fully restore the internal state of the module.

Most advanced power management techniques are still in the research stage: Many of the research papers that demonstrate significant power savings use highly specialized application examples, and do not map well to general application cases. Or, they make assumptions regarding the ability to fully characterize an application such that it can be guaranteed to be schedulable. These techniques often do not map to ‘real world’, multi-function programmable systems, and more research is needed for broader applicability.

Different types of applications call for different techniques: Different hardware platforms have varying levels of support for the above-listed techniques. Also, different applications running on the same platform may have different processing requirements. For some applications, only the low-latency techniques (e.g., clock idling) are applicable, but for others the higher-latency techniques can be used to provide significant power savings when the application switches between modes with significantly different processing requirements. For example, one mode can be run at a low V/F, and another mode, with higher processing requirements, can be run at a higher V/F. If the V/F latency is compatible with the mode switch time, the application can use the technique.
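The OS-tick challenge in Table 3 (the tick timer following the scaled clock) can be sketched as a divisor recalculation; the tick rate and clock values are illustrative:

```python
def tick_divisor(f_clk_hz, tick_hz):
    """Timer divisor that produces the OS tick from the given input clock."""
    return round(f_clk_hz / tick_hz)

def rescale_tick(old_f_hz, new_f_hz, old_divisor, tick_hz=1000):
    """After a frequency change, recompute the tick timer divisor so the time
    base stays accurate (assumes the tick timer is fed by the scaled clock)."""
    assert tick_divisor(old_f_hz, tick_hz) == old_divisor  # sanity check
    return tick_divisor(new_f_hz, tick_hz)

# Illustrative: a 1 kHz OS tick; scaling the clock from 200 MHz down to
# 100 MHz requires halving the timer divisor to keep the tick at 1 kHz.
assert rescale_tick(200e6, 100e6, 200_000) == 100_000
```

If this reprogramming is skipped, the tick slows along with the CPU, so both absolute time and any utilization figures derived from it drift, which is exactly the measurement problem the table describes.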