Power efficiency is a key requirement across a broad range of systems, from small portable devices to rack-mounted processor farms. Even in systems where high performance is the primary goal, power efficiency remains a concern. Power efficiency is determined both by hardware design and component choice and by software-based runtime power management techniques.
In wired systems, power efficiency will typically enable a reduction in power supply capacity, as well as a reduction in cooling requirements and fan noise, and ultimately product cost. Power efficiency can allow an increase in component density as well. For example, a designer may be limited in the number of processors that can be placed on a board simply because the cumulative power consumption would exceed compliance limits for the bus specification. Increased component density can result in increased capacity, a reduction in product size, or both.
In mobile devices, power efficiency means increased battery life and a longer time between recharges. It also enables selection of smaller batteries, possibly a different battery technology, and a corresponding reduction in product size.
Power efficiency is a key product differentiator. A simple example is a buyer shopping for an MP3 player at an electronics store. In a side-by-side comparison of two players with the same features, the decision will likely go to the player with the longer time between recharges. In many scenarios, the success or failure of a product in its marketplace will be determined by its power efficiency.
The total power consumption of a CMOS circuit is the sum of active and static power consumption: $P_{total} = P_{active} + P_{static}$. Active power consumption occurs when the circuit is active, switching from one logic state to another. It is caused both by switching current (the current needed to charge internal nodes) and by through current (the current that flows when the P-channel and N-channel transistors are momentarily on at the same time). Active power consumption can be approximated by the equation $P_{transient} = C_{pd} \times F \times V_{cc}^{2} \times N_{sw}$, where $C_{pd}$ is the dynamic capacitance, $F$ is the switching frequency, $V_{cc}$ is the supply voltage, and $N_{sw}$ is the number of bits switching. An additional relationship is that the supply voltage ($V_{cc}$) determines the maximum switching frequency ($F$) for stable operation. The important concepts here are: 1) the active power consumption is linearly related to the switching frequency and quadratically related to the supply voltage, and 2) the maximum switching frequency is determined by the supply voltage.
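The two scaling relationships can be checked numerically with a minimal sketch of the approximation above. The function name and the numeric values are illustrative assumptions, not figures from any particular device.

```python
def active_power(cpd_farads, freq_hz, vcc_volts, n_switching):
    """Approximate active power: Cpd x F x Vcc^2 x Nsw (in watts)."""
    return cpd_farads * freq_hz * vcc_volts ** 2 * n_switching


if __name__ == "__main__":
    # Hypothetical operating point, chosen only for illustration.
    cpd, freq, vcc, nsw = 20e-12, 200e6, 1.6, 32
    base = active_power(cpd, freq, vcc, nsw)
    half_freq = active_power(cpd, freq / 2, vcc, nsw)
    half_volt = active_power(cpd, freq, vcc / 2, nsw)
    print("halving F scales power by", half_freq / base)
    print("halving Vcc scales power by", half_volt / base)
```

Halving the frequency halves the result, while halving the voltage cuts it to one quarter, which is why voltage reduction is the larger lever when the platform permits it.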
If an application can reduce the CPU clock rate and still meet its processing requirements, it can achieve a proportional savings in power dissipation. If, in addition, the reduced frequency is compatible with a lower operating voltage available on the platform, then because of the quadratic relationship a potentially significant additional savings can be obtained by reducing the voltage. However, it is important to recognize that for a given task set, reducing the CPU clock rate also proportionally extends the execution time of that task set, requiring careful analysis of the application to ensure that it still meets its real-time requirements. The potential savings provided by dynamic voltage and frequency scaling (DVFS) have been extensively studied in academic literature, with emphasis on ways to reduce the scaling latencies, improve the voltage scaling range, and schedule tasks so that real-time deadlines can still be met. For example, see Run-time Power Control Scheme Using Software Feedback Loop for Low-Power Real-time Applications, IEEE ISBN 0-7803-5974-7, Seongsoo Lee, Takayasu Sakurai, 2000; Intra-Task Voltage Scheduling for Low-Energy Hard Real-Time Applications, IEEE Design & Test of Computers, Dongkun Shin, Jihong Kim, Seongsoo Lee, 2001; and Run-time Voltage Hopping for Low-power Real-time Systems, DAC2000, ACM 1-58113-188-7, Seongsoo Lee, Takayasu Sakurai, 2000.
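The trade-off between power savings and execution time can be illustrated with a small sketch that picks the lowest-power frequency/voltage setpoint still able to finish a task before its deadline. The setpoint table, the cycle count, and the function name are hypothetical, and the sketch deliberately ignores the latency of the scaling operation itself.

```python
def pick_setpoint(cycles, deadline_s, setpoints):
    """Return the (freq_hz, vcc_volts) pair with the lowest relative active
    power (F x Vcc^2) that still completes `cycles` within `deadline_s`.
    Returns None when no setpoint meets the deadline."""
    feasible = [(f, v) for f, v in setpoints if cycles / f <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=lambda fv: fv[0] * fv[1] ** 2)


if __name__ == "__main__":
    # Hypothetical platform setpoints: (frequency in Hz, voltage in V).
    setpoints = [(200_000_000, 1.6), (144_000_000, 1.3), (72_000_000, 1.1)]
    # A task needing one million cycles with a 10 ms deadline.
    print(pick_setpoint(1_000_000, 0.010, setpoints))
```

In this example the slowest setpoint would need about 13.9 ms and is rejected, so the middle setpoint is chosen. A real system would also need to budget for the switching latency before committing to a lower setpoint.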
Static power consumption is the other component of the total power consumption equation. Static power consumption occurs even when the circuit is not switching, due to reverse-bias leakage. Traditionally, the static power consumption of a CMOS circuit has been very small in comparison to the active power consumption. Embedded applications will typically idle the CPU clock during inactivity to eliminate active power, which dramatically reduces total power consumption. However, new higher-performance transistors are bringing significant increases in leakage currents, so the static power consumption component of the total power equation now requires new attention.
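A rough sketch of why leakage matters more once the clock is idled: if active power is drawn only while the CPU is busy and static power is drawn continuously, the static share of the average grows quickly as leakage rises. The model and all numbers below are simplifying assumptions for illustration only.

```python
def average_power(p_active, p_static, busy_fraction):
    """Average power when active power is drawn only while busy and
    static (leakage) power is drawn all the time."""
    return busy_fraction * p_active + p_static


def static_share(p_active, p_static, busy_fraction):
    """Fraction of the average power that is due to static consumption."""
    return p_static / average_power(p_active, p_static, busy_fraction)


if __name__ == "__main__":
    # Hypothetical values: 300 mW active power, CPU busy 10% of the time.
    for leak in (0.003, 0.03):
        share = static_share(0.3, leak, 0.1)
        print(f"leakage {leak} W -> static share {share:.2f}")
```

With these assumed numbers, a tenfold increase in leakage moves the static share from under a tenth of the average to half of it, even though the clock idling policy is unchanged.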
There are many known techniques utilized both in hardware design and at run-time to help reduce power dissipation. Table 1 lists some up-front hardware design decisions for reducing power dissipation. Table 2 lists common techniques employed at run-time to reduce power dissipation. Table 3 lists some fundamental challenges to utilizing these power management techniques in real-time systems.
TABLE 1
Decision | Description
Choose a low-power technology base | Choosing a power-efficient process (e.g., CMOS) is perhaps the most important up-front decision, and directly drives power efficiency.
Partition separate voltage and clock domains | By partitioning separate domains, different components can be wired to the appropriate power rail and clock line, eliminating the need for all circuitry to operate at the maximum required by any specific module.
Enable scaling of voltage and frequency | Designing in programmable clock generators allows application code a linear savings in power when it can scale down the clock frequency. A programmable voltage source allows the potential for an additional quadratic power savings when the voltage can be reduced as well, because of reduced frequency. Also, designing the hardware to minimize scaling latencies will enable broader usage of the scaling technique.
Enable gating of different voltages to modules | Some static RAMs require less voltage in retention mode vs. normal operation mode. By designing in voltage gating circuitry, power consumption can be reduced during inactivity while still retaining state.
Utilize interrupts to alleviate polling by software | Often software is required to poll an interface periodically to detect events. For example, a keypad interface routine might need to spin or periodically wake to detect and resolve a keypad input. Designing the interface to generate an interrupt on keypad input will not only simplify the software, but it will also enable event-driven processing and activation of processor idle and sleep modes while waiting for interrupts.
Reduce loading of outputs | Decreasing capacitive and DC loading on output pins will reduce total power consumption.
Use hierarchical memory model | Depending on the application, utilizing cache and instruction buffers can drastically reduce off-chip memory accesses and subsequent power draw.
Boot with resources un-powered | Many systems boot in a fully active state, meaning full power consumption. If certain sub-systems can be left un-powered on boot, and later turned on when really needed, it eliminates unnecessary wasted power.
Minimize number of active phase lock loops (PLL) | Using shared clocks can reduce the number of active clock generators, and their corresponding power draw. For example, a processor’s on-board PLL can be bypassed in favor of an external clock signal.
Use clock dividers for fast selection of an alternate frequency | A common barrier to highly dynamic frequency scaling is the latency of re-locking a PLL on a frequency change. Adding a clock divider circuit at the output of the PLL will allow instantaneous selection of a different clock frequency.
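As an illustration of the clock divider entry in Table 1, the sketch below chooses the largest divider that still delivers the frequency an application needs, so the PLL never has to re-lock. The set of available dividers and the function name are assumptions; real hardware defines its own divider options.

```python
def pick_divider(pll_hz, required_hz, dividers=(1, 2, 4, 8)):
    """Return the largest divider whose output (pll_hz / divider) is still
    at least required_hz, or None if even divide-by-1 is too slow.
    The default divider set is hypothetical."""
    usable = [d for d in dividers if pll_hz / d >= required_hz]
    return max(usable) if usable else None


if __name__ == "__main__":
    # 200 MHz PLL output, application needs at least 60 MHz.
    print(pick_divider(200_000_000, 60_000_000))
```

Here divide-by-4 would give only 50 MHz, so divide-by-2 (100 MHz) is the slowest usable choice.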
TABLE 2
Technique | Description
Gate clocks off when not needed | As described above, active power dissipation in a CMOS circuit occurs only when the circuit is clocked. By turning off clocks that are not needed, unnecessary active power consumption is eliminated. Most processors incorporate a mechanism to temporarily suspend active power consumption in the CPU while waiting for an external event. This idling of the CPU clock is typically triggered via a ‘halt’ or ‘idle’ instruction, called during application or OS idle time. Some processors partition multiple clock domains, which can be individually idled to suspend active power consumption in unused modules. For example, in the Texas Instruments TMS320C5510 DSP, six separate clock domains (CPU, cache, DMA, peripheral clocks, clock generator, and external memory interface) can be selectively idled.
Activate peripheral low-power modes | Some peripherals have built-in low-power modes that can be activated when the peripheral is not immediately needed. For example, a device driver managing a codec over a serial port can command the codec to a low-power mode when there is no audio to be played, or if the whole system is being transitioned to a low-power mode.
Leverage peripheral activity detectors | Some peripherals have built-in activity detectors that can be programmed to power down the peripheral after a period of inactivity. For example, a disk drive can be automatically spun down when the drive is not being accessed, and spun back up when needed again.
Utilize auto-refresh modes | Dynamic memories and displays will typically have a self or auto-refresh mode where the device will efficiently manage the refresh operation on its own.
On boot actively turn off unnecessary power consumers | Processors typically boot up fully powered, at a maximum clock rate, ready to do work. There will inevitably be resources powered that are not needed yet, or that may never be used in the course of the application. At boot time, the application or OS may traverse the system, turning off/idling unnecessary power consumers.
Gate power to subsystems only as needed | A system may include a power-hungry module that need not be powered at all times. For example, a mobile device may have a radio subsystem that only needs to be ON when in range of the device with which it communicates. By gating power OFF/ON on demand, unnecessary power dissipation can be avoided.
Benchmark application to find minimum required frequency and voltages | Typically, systems are designed with excess processing capacity built in, either for safety purposes, or for future extensibility and upgrades. For the latter case, a common development technique is to fully exercise and benchmark the application to determine excess capacity, and then ‘dial-down’ the operating frequency and voltage to that which enables the application to fully meet its requirements, but minimizes excess capacity. Frequency and voltage are usually not changed at runtime, but are set at boot time, based upon the benchmarking activity.
Adjust CPU frequency and voltage based upon gross activity | Another technique for addressing excess processing capacity is to periodically sample CPU utilization at runtime, and then dynamically adjust the frequency and voltage based upon the empirical utilization of the processor. This “interval-based scheduling” technique improves on the power savings of the previous static benchmarking technique because it takes advantage of the dynamic variability of the application’s processing needs.
Dynamically schedule CPU frequency and voltage to match predicted work load | The “interval-based scheduling” technique enables dynamic adjustments to processing capacity based upon history data, but typically does not do well at anticipating the future needs of the application, and is therefore not acceptable for systems with hard real-time deadlines. An alternate technique is to dynamically vary the CPU frequency and voltage based upon predicted workload. Using dynamic, fine-grained comparison of work completed vs. the worst-case execution time (WCET) and deadline of the next task, the CPU frequency and voltage can be dynamically tuned to the minimum required. This technique is most applicable to specialized systems with data-dependent processing requirements that can be accurately characterized. Inability to fully characterize an application usually limits the general applicability of this technique. Study of efficient and stable scheduling algorithms in the presence of dynamic frequency and voltage scaling is a topic of much on-going research.
Optimize execution speed of code | Developers often optimize their code for execution speed. However, in many situations the speed may be good enough, and further optimizations are not considered. When considering power consumption, faster code will typically mean more time for leveraging idle or sleep modes, or a greater reduction in the CPU frequency requirements. In some situations, speed optimizations may actually increase power consumption (e.g., more parallelism and subsequent circuit activity), but in others, there may be power savings.
Use low-power code sequences and data patterns | Different processor instructions exercise different functional units and data paths, resulting in different power requirements. Additionally, because of data bus line capacitances and the inter-signal capacitances between bus lines, the amount of power required is affected by the data patterns that are transferred over the data buses. The power requirements are also affected by the signaling patterns chosen (1s vs. 0s) for external interfaces (e.g., serial ports). Analyzing the effects of individual instructions and data patterns is an extreme technique that is sometimes used to maximize power efficiency.
Scale application and OS footprint based upon minimal requirements | Architecting application and OS code bases to be scalable can reduce memory requirements and, therefore, the subsequent runtime power requirements. For example, by simply placing individual functions or APIs into individual linkable objects, the linker can link in only the code/data needed and avoid linking dead code/data.
Use code overlays to reduce fast memory requirements | For some applications, dynamically overlaying code from non-volatile to fast memory will reduce both the cost and power consumption of additional fast memory.
Tradeoff accuracy vs. power consumption | Accepting less accuracy in some calculations can drastically reduce processing requirements. For example, certain signal processing applications can tolerate more noise in the results, which enables reduced processing and reduced power consumption.
Enter a reduced capability mode on a power change | When there is a change in the capabilities of the power source, e.g., when going from AC to battery power, a common technique is to enter a reduced capability mode with more aggressive runtime power management. A typical example is a laptop computer, where the OS is notified on a switch to battery power, and activates a different power management policy, with a lower CPU clock rate, a shorter timeout before the screen blanks or the disk spins down, etc. The OS power policy implements a tradeoff between responsiveness and extending battery life. A similar technique can be employed in battery-only systems, where a battery monitor detects reduced capacity, and activates more aggressive power management, such as slowing down the CPU, not enabling image viewing on the digital camera’s LCD display, etc.
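The interval-based scheduling technique listed in Table 2 can be sketched as a simple policy that steps the CPU frequency up or down according to the utilization measured over the last sampling interval. The threshold values, the frequency list, and the function name are hypothetical; how utilization is measured and how the frequency is actually applied are platform-specific and not shown.

```python
def next_frequency(utilization, current_hz, available_hz, high=0.9, low=0.5):
    """Choose the frequency for the next interval from past utilization.

    utilization  -- busy fraction (0.0 to 1.0) over the last interval
    current_hz   -- current frequency; must be one of available_hz
    available_hz -- frequencies the platform supports (assumed list)
    high, low    -- hypothetical thresholds for stepping up or down
    """
    levels = sorted(available_hz)
    i = levels.index(current_hz)  # raises ValueError if not a known level
    if utilization > high and i < len(levels) - 1:
        return levels[i + 1]      # nearly saturated: step up one level
    if utilization < low and i > 0:
        return levels[i - 1]      # mostly idle: step down one level
    return current_hz             # within band, or already at a limit


if __name__ == "__main__":
    levels = [72_000_000, 144_000_000, 200_000_000]
    print(next_frequency(0.95, 144_000_000, levels))
    print(next_frequency(0.30, 144_000_000, levels))
```

Because the decision looks only at history, a sudden burst of work is served at the old, lower frequency for at least one interval, which is the reason the table describes this approach as unsuitable for hard real-time deadlines.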
TABLE 3
Challenge | Description
Scaling CPU frequency with workload often affects peripherals | For many processors the same clock that feeds the CPU also feeds on-chip peripherals, so scaling the clock based upon CPU workload can have side effects on peripheral operation. The peripherals may need to be reprogrammed before and/or after the scaling operation, and this may be difficult if a pre-existing (non power-aware) device driver is being used to manage the peripheral. Additionally, if the scaling operation affects the timer generating the OS system tick, this timer will need to be adapted to follow the scaling operation, which will affect the absolute accuracy of the time base.
V/F scaling latencies can be large, and platform-dependent | The latency for voltage and frequency scaling operations will vary widely across platforms. An application that runs fine on one platform may not be portable to another platform, and may not run on a revision to the same platform if the latencies change much. For example, the time for a down-voltage scaling operation is typically load-dependent, and if the load changes significantly on the revised platform the application may not run correctly.
Might not have stable operation during V/F scaling | Some processor vendors specify a non-operation sequence during voltage or clock frequency changes to avoid instabilities during the transition. In these situations, the scaling code will need to wait for the transition to occur before returning, increasing the scaling latency.
V/F scaling directly affects ability to meet deadlines | Changing CPU frequency (and voltage when possible) will alter the execution time of a given task, potentially causing the task to miss a real-time deadline. Even if the new frequency is compatible with the deadline, there may still be a problem if the latency to switch between V/F setpoints is too big.
Scaling the CPU clock can affect ability to measure CPU utilization | If the clock that feeds the CPU also feeds the OS timer, the OS timer will be scaled along with the CPU, which compromises measurement of CPU utilization.
Watchdogs still need to be kept happy | Watchdog timers are used to detect abnormal program behavior and either shut down or reboot a system. Typically the watchdog needs to be serviced within a pre-defined time interval to keep it from triggering. Power management techniques that slow down or suspend processing can therefore inadvertently trigger application failure.
Idle and sleep modes typically collide with emulation, debug, and instrumentation | Depending upon the processor and the debug tools, invoking idle and sleep modes can disrupt the transport of real-time instrumentation and debugging information from the target. In the worst case it may perturb and even crash the debug environment. Similar concerns arise with V/F scaling, which may cause difficulty for the emulation and debug circuitry. It may be the case that power management is enabled when the system is deployed, but only minimally used during development.
Context save/restore can become non-trivial | In a non-power managed environment the OS or application framework will typically save and restore register values during a context switch. As register banks, memories, and other modules are powered OFF and back ON, the context to be saved and restored can grow dramatically. Also, if a module is powered down it may be difficult (and sometimes not possible) to fully restore the internal state of the module.
Most advanced power management techniques are still in the research stage | Many of the research papers that demonstrate significant power savings use highly specialized application examples, and do not map well to general application cases. Or, they make assumptions regarding the ability to fully characterize an application such that it can be guaranteed to be schedulable. These techniques often do not map to ‘real world’, multi-function programmable systems, and more research is needed for broader applicability.
Different types of applications call for different techniques | Different hardware platforms have varying levels of support for the above listed techniques. Also, different applications running on the same platform may have different processing requirements. For some applications, only the low-latency techniques (e.g., clock idling) are applicable, but for others the higher-latency techniques can be used to provide significant power savings when the application switches between modes with significantly different processing requirements. For example, one mode can be run at low V/F, and another mode, with higher processing requirements, can be run at a higher V/F. If the V/F latency is compatible with the mode switch time, the application can use the technique.
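The first challenge in Table 3 notes that a timer driving the OS system tick must be adapted when its input clock is scaled, and that this affects the accuracy of the time base. A minimal sketch, assuming a simple timer that counts a reload value of input clocks per tick (the function names and clock values are hypothetical):

```python
def tick_reload(timer_hz, tick_hz):
    """Reload count giving the tick rate closest to tick_hz for a timer
    clocked at timer_hz (integer round-to-nearest)."""
    return (timer_hz + tick_hz // 2) // tick_hz


def actual_tick_hz(timer_hz, reload):
    """Tick rate actually produced by an integer reload count."""
    return timer_hz / reload


if __name__ == "__main__":
    # Recompute the reload value after each hypothetical scaling operation.
    for clk in (200_000_000, 72_000_000, 33_333_333):
        reload = tick_reload(clk, 1000)
        print(clk, reload, actual_tick_hz(clk, reload))
```

When the scaled clock is not an integer multiple of the tick rate, as with the last value, the reload count can only approximate the intended period, which is one way the absolute accuracy of the time base degrades after scaling.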