1. Field of the Invention
The present invention relates to systems and methods for predicting current swings that can cause the voltage in a microprocessor to fluctuate beyond safe levels and for avoiding such swings.
2. Brief Description of the Related Art
Power-constrained CMOS designs are making it increasingly difficult for microprocessor designers to cope with power supply noise. As current draw increases and operating voltage decreases, inductive noise threatens the robustness and limits the clock frequency of high-performance processors. Large current swings over small time scales cause large voltage swings in the power-delivery subsystem due to parasitic inductance. A significant drop in supply voltage can cause timing margin violations by slowing logic circuits. For reliable and correct operation of the processor, voltage emergencies, i.e., large voltage swings that violate noise margins, must be avoided.
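To first order, and assuming a lumped model in which the power-delivery subsystem is represented by a single parasitic resistance R and a single parasitic inductance L (symbols introduced here for illustration only, not taken from the cited work), the on-die supply voltage can be written as:

```latex
\[
V_{\mathrm{die}}(t) \;=\; V_{\mathrm{DD}} \;-\; R\, i(t) \;-\; L\, \frac{di(t)}{dt}
\]
```

Under this simplified model, a current change of \(\Delta I\) over an interval \(\Delta t\) contributes an inductive droop of roughly \(L\,\Delta I/\Delta t\), which is why larger swings over shorter time scales produce larger voltage excursions.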
The traditional way to deal with inductive noise is to over-design the processor to allow for worst-case fluctuations. Unfortunately, the gap between nominal and worst-case operating conditions in modern microprocessor designs is growing. A recent paper on supply-noise analysis for a POWER6 processor shows the need for timing margins that accommodate voltage fluctuations of more than 18% of nominal voltage (200 mV dips at a nominal voltage of 1.1 V). See N. James, P. Restle, J. Friedrich, B. Huott, and B. McCredie, “Comparison of Split-Versus Connected-Core Supplies in the POWER6 Microprocessor,” ISSCC 2007 (2007). Such conservative operating voltage margins ensure robust operation of the system, but the lower operating frequencies they force can severely degrade performance.
The power ceiling in modern microprocessors presents a major challenge to continued performance scaling. Power reduction techniques such as clock gating, when aggressively applied to constrain power consumption, can lead to large current swings in the microprocessor. When coupled with the non-zero impedance of the power delivery subsystem, these current swings can cause the voltage to fluctuate beyond safe operating margins. Such events, called “voltage emergencies,” have traditionally been dealt with by allocating sufficiently large timing margins. Unfortunately, on-chip voltage fluctuations are getting worse, and the margins they require are growing with them. Because voltage directly affects circuit delay, intermittent voltage droops below the lower operating margin can slow logic paths and lead to timing violations. Voltage spikes that exceed the upper margin can cause long-term reliability issues. Hence, modern designs impose conservative operating voltage margins to avoid these voltage emergencies and guarantee correct operation of the microprocessor. However, large margins translate to inefficient energy consumption and lower performance.
A number of throttling mechanisms have been proposed to dampen sudden current swings, including frequency throttling, pipeline freezing, pipeline firing, issue ramping, and changing the number of available memory ports. However, such mechanisms require a tight feedback loop that detects an imminent violation and then activates a throttling mechanism to avoid the violation. The detectors are either current sensors or voltage sensors that trigger when a soft threshold is crossed, indicating that a violation is likely to occur. Unfortunately, the delay inherent in such feedback loops limits their effectiveness and necessitates margins large enough to allow time for the loop to respond.
A typical sensor-based proposal uses a tight feedback loop like that shown in FIG. 1(a). The loop includes a sensor that tries to detect impending emergencies and a throttling actuator that tries to avoid them. The sensor relies on a soft current or voltage threshold as a “canary.” Crossing that threshold means that voltage is approaching its lower margin, so the actuator turns on throttling until the crisis has passed. Proposed throttling schemes range from frequency throttling, to pipeline freezing/firing, to issue ramping, to altering the number of accessible memory ports. The behavior of the feedback loop is determined by two parameters: the setting of the soft threshold level and the delay around the feedback loop. Unfortunately, no choice of those parameters accommodates reduced operating margins without incurring correctness failures, performance penalties, or both.
FIG. 1(b) illustrates the use of a soft threshold to throttle execution and prevent an emergency. The graph shows voltage waveforms with and without sensor-based throttling (Throttled Execution and Uncorrected Execution, respectively). The solid horizontal line marked Aggressive Soft Threshold indicates the threshold at which a voltage sensor starts to take action to prevent an emergency. Setting the soft threshold aggressively (i.e., close to the lower operating margin) requires a very fast reaction by the sensor and actuation system. Failure to respond quickly enough results in a voltage emergency. In FIG. 1(b), the voltage starts to recover under throttling, but not in time to avoid crossing the lower operating margin.
FIG. 2(a) shows the sensitivity of sensor-based mechanisms to feedback loop delays by plotting the number of emergencies that go unsuppressed in our benchmark suite as a function of sensor-loop delay times. The graph assumes the soft threshold to be 3% below the nominal voltage and the lower operating margin to be 4% below nominal. Feedback loop delays ranging between 0 and 5 cycles would require a nearly perfect sensor. Yet even a 2-cycle delay causes 50% of all soft threshold crossings to violate the simulated microprocessor's minimum operating margin specification. In other words, fail-safe execution is not possible at this margin using sensor-based schemes, as they cannot operate in a timely manner.
To accommodate slow sensor response times and ensure that throttling effectively prevents emergencies, sensor-based schemes can use conservative soft thresholds. Lifting the soft threshold away from the lower operating margin, as illustrated by the Conservative Soft Threshold in FIG. 1(c), gives the throttling system more time to prevent an emergency. But as the Uncorrected Execution waveform in FIG. 1(c) shows, even in the absence of throttling, a soft threshold crossing may not be followed by an emergency. Throttling execution in such cases decreases performance without any compensating benefit. The more conservative the soft threshold setting, the greater the performance penalty. FIG. 2(b) shows that this penalty can be quite large. Assuming an ideal sensor with no feedback loop delay (i.e., 0-cycle sensor delay), the percentage of benign soft threshold crossings is between 77% and 58% for soft thresholds ranging from 2% to 3%. So even if it were possible to design a feedback loop with no delay, the large performance penalties would deter architects from reducing operating margins.
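The trade-off described above, between unsuppressed emergencies under loop delay and benign soft-threshold crossings, can be illustrated with a minimal Python sketch. It is not the simulation behind FIG. 2. The function name `classify_crossings`, the example trace, and the idealized rule that throttling always prevents an emergency once it takes effect are all assumptions made for illustration.

```python
from typing import Dict, List


def classify_crossings(trace: List[float], soft: float,
                       margin: float, delay: int) -> Dict[str, int]:
    """Classify each soft-threshold crossing in an uncorrected voltage trace.

    trace  : per-cycle voltage as a fraction of nominal (1.0 == nominal)
    soft   : soft threshold, e.g. 0.97 for 3% below nominal
    margin : lower operating margin, e.g. 0.96 for 4% below nominal
    delay  : sensor-to-actuator feedback loop delay, in cycles

    Assumption: throttling takes effect `delay` cycles after the crossing
    and, once in effect, always prevents the emergency.
    """
    counts = {"benign": 0, "suppressed": 0, "unsuppressed": 0}
    i, n = 0, len(trace)
    while i < n:
        if trace[i] >= soft:
            i += 1
            continue
        # Cycle i is the first cycle below the soft threshold (a crossing).
        start = i
        first_emergency = None
        # Walk the excursion until the voltage recovers above the threshold.
        while i < n and trace[i] < soft:
            if first_emergency is None and trace[i] < margin:
                first_emergency = i
            i += 1
        if first_emergency is None:
            # Throttling here would cost performance with no benefit.
            counts["benign"] += 1
        elif first_emergency - start < delay:
            # The margin is violated before throttling can take effect.
            counts["unsuppressed"] += 1
        else:
            counts["suppressed"] += 1
    return counts


# Hypothetical trace: one benign dip, one slow droop, one fast droop.
example_trace = [1.0, 0.965, 0.98, 0.965, 0.962, 0.955, 0.98, 0.965, 0.95, 1.0]
for d in (0, 2, 3):
    print(d, classify_crossings(example_trace, soft=0.97, margin=0.96, delay=d))
```

In this toy trace, increasing the loop delay moves crossings from the suppressed category into the unsuppressed one, while the benign crossing is counted regardless of delay, mirroring the two failure modes discussed above.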
A sensor-based scheme proposed by Powell and Vijaykumar reduces sensitivity to feedback loop delay by focusing on voltage emergencies that are the result of resonating patterns. See M. Powell and T. N. Vijaykumar, “Exploiting Resonant Behavior to Reduce Inductive Noise,” ISCA, 2004. While resonance-induced emergencies are dominant for some packages, recent work by Gupta et al. illustrates that non-resonant (pulse) events are also a major source of emergencies across a range of packages. See M. S. Gupta, K. Rangan, M. D. Smith, G.-Y. Wei, and D. M. Brooks, “DeCoR: A Delayed Commit and Rollback Mechanism for Handling Inductive Noise in Processors,” HPCA '08 (2008). James et al. have observed isolated (non-resonant) pulses in a POWER6 chip implementation. See N. James, P. Restle, J. Friedrich, B. Huott, and B. McCredie, “Comparison of Split-Versus Connected-Core Supplies in the POWER6 Microprocessor,” ISSCC 2007 (2007). And Kim et al. show that resonant emergencies are likely to become less important than isolated pulses in future chip multi-processors, because on-chip voltage regulators decouple package inductance effects from the power grid. See W. Kim, M. S. Gupta, G.-Y. Wei, and D. Brooks, “System level analysis of fast, per-core DVFS using on-chip switching regulators,” HPCA (2007). Therefore, to realize the improvements in energy efficiency or performance that reduced margins can enable, new solutions are needed that cope with both resonant and non-resonant voltage emergencies in future systems.
Another way to handle inductive noise is to design the processor for typical-case operating conditions and add a fail-safe mechanism that guarantees correctness despite noise margin violations. This strategy can improve performance, but only if the cost of using the fail-safe mechanism is not too high. However, the coarse-grained checkpointing intervals of traditional checkpoint-recovery schemes (between 100 and 1000 cycles) translate to unacceptable performance penalties. Gupta et al. have proposed a low-overhead implicit checkpointing scheme to handle voltage emergencies by buffering commits until it is confirmed that no voltage emergencies have occurred while the buffered sequence was in flight. M. S. Gupta, K. Rangan, M. D. Smith, G.-Y. Wei, and D. M. Brooks, “DeCoR: A Delayed Commit and Rollback Mechanism for Handling Inductive Noise in Processors,” HPCA '08 (2008). While shown to be effective, implicit checkpointing is specialized and requires modifications to traditional microarchitectural structures.
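The general idea of buffering commits until the absence of an emergency is confirmed can be sketched as follows. This is a simplified illustration only and does not reproduce the DeCoR design. The class name `DelayedCommitBuffer`, the `hold_cycles` parameter (standing in for the time needed to confirm that no emergency occurred), and the per-cycle `emergency` flag are hypothetical.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class DelayedCommitBuffer:
    """Hold retired results for `hold_cycles` before committing them.

    Assumption: an emergency reported in any cycle invalidates every
    result still waiting in the buffer, which must then be re-executed.
    """

    def __init__(self, hold_cycles: int) -> None:
        self.hold_cycles = hold_cycles
        self.pending: Deque[Tuple[int, Any]] = deque()  # (retire_cycle, result)
        self.committed: List[Any] = []

    def retire(self, cycle: int, result: Any) -> None:
        """Record a result that finished execution but is not yet committed."""
        self.pending.append((cycle, result))

    def tick(self, cycle: int, emergency: bool) -> List[Any]:
        """Advance one cycle; return the results squashed by a rollback, if any."""
        if emergency:
            squashed = [result for _, result in self.pending]
            self.pending.clear()
            return squashed
        # Commit, in order, everything that has waited long enough.
        while self.pending and cycle - self.pending[0][0] >= self.hold_cycles:
            self.committed.append(self.pending.popleft()[1])
        return []


buf = DelayedCommitBuffer(hold_cycles=2)
buf.retire(0, "a")
buf.tick(0, emergency=False)
buf.retire(1, "b")
buf.tick(1, emergency=False)
buf.tick(2, emergency=False)            # "a" has waited 2 cycles and commits
rolled_back = buf.tick(3, emergency=True)  # "b" is squashed for re-execution
print(buf.committed, rolled_back)
```

The sketch commits in order and discards only uncommitted work on an emergency, which is the property that keeps the recovery cost far below that of coarse-grained checkpointing.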