1. Field of the Invention
The present invention relates to a data processing apparatus and method using monitoring circuitry to control operating parameters of the data processing apparatus.
2. Description of the Prior Art
Within a data processing system, for example an integrated circuit, it is known to employ adaptive power management techniques in order to reduce the power dissipation within the system. In accordance with adaptive power management techniques, one or more operating parameters (for example voltage or frequency) of the data processing system are modified during operation to seek to reduce power dissipation. Adaptive power management is becoming ever more important as process geometries decrease, due to the increase in leakage current drawn by components as they shrink in size.
Considering voltage supply as one example of an operating parameter, when a data processing system is designed, a nominal operating voltage can be associated with the design. During post-manufacturing tuning, that operating voltage may be modified slightly having regard to variations introduced at the time of manufacture. Such voltage levels are, by their nature, always set conservatively, to ensure that the circuit will operate correctly under all expected operating conditions. However, running a system at a voltage higher than necessary has a significant impact on power consumption. Adaptive power management techniques hence aim to reduce the power consumption by allowing operating parameters such as system clock frequency and supply voltage to be dynamically adjusted to meet the application throughput requirements.
With the aim of allowing margins in the setting of operating parameters to be reduced, it is known to provide functional circuitry within a data processing apparatus with error correction circuitry that is able to detect errors in operation of the functional circuitry and repair those errors in operation. Such an error correction circuit can be embodied in a variety of ways, but in one embodiment may take the form of a single event upset (SEU) tolerant flip-flop such as discussed in commonly owned U.S. Pat. No. 7,278,080, the entire contents of which are hereby incorporated by reference. That patent describes a design technique sometimes referred to as “Razor”. In accordance with the basic Razor technique, a delay-error tolerant flip-flop is used on critical paths to allow the supply voltage to be scaled to the point of first failure (PoFF) of a die for a given frequency. Thus, all margins due to process-voltage-temperature (PVT) variations are eliminated, resulting in significant energy savings. In addition, the supply voltage can be scaled even lower than the first failure point into the sub-critical region, deliberately tolerating a targeted error rate, thereby providing additional energy savings. A further paper that describes the Razor technique is “Razor II: In-Situ Error Detection and Correction for PVT and SER Tolerance”, IEEE Journal of Solid-State Circuits (JSSC), Volume 44, No. 1, January 2009.
Efficient and robust control of functional circuits, such as processors, that include in-situ error detection and correction mechanisms such as Razor is a non-trivial task. The conventional approach is to set the operating point (for example voltage and frequency of operation) in proportional response to the observed Razor error rate. However, there are a number of scenarios where this control scheme might result in significantly sub-optimal performance. The main reason for this is that there may be a significant delay before a change in environmental conditions is reflected in the Razor error rate, due to path activation. In other words, for a Razor flip-flop to detect a timing violation on its associated critical path, it is necessary to first sensitise that critical path, which in turn depends on the nature of the program phase being executed. As a particular example, if a processor is currently running only low intensity tasks, it may be that changes in environmental conditions that would be problematic were the processor busy do not initially cause any timing violations due to critical paths not being sensitised. As a result, interactions of program phase limiting critical path sensitisation, and fast changes in environmental conditions (for example local heating, IR drop, etc), can lead to underestimation of the actual operating point.
Underestimation of the operating point can in due course give rise to performance issues. If the program phase subsequently changes significantly, and critical path activation accordingly increases, there will be a sharp increase in the Razor error rate. This results in no forward progress in the pipeline, due to the stall and flush mechanisms being used to replay and correct the Razor errors. This situation persists until the operating point is increased to an appropriate level, which can take hundreds of processor cycles owing to phase locked loop (PLL) lock time or the time for off-chip voltage regulation to settle.
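By way of illustration only, the conventional proportional control scheme and the underestimation problem described above can be sketched as a simple behavioural model. The function names, the gain, the target error rate, the voltage limits and the toy error model (in which errors are observed only in proportion to critical path activation) are all assumptions made for this sketch and are not taken from any cited reference.

```python
def proportional_step(voltage, observed_rate, target_rate=0.001,
                      gain=0.5, v_min=0.7, v_max=1.2):
    """Return the supply voltage for the next control interval.

    The voltage is raised when the observed error rate exceeds the target
    and lowered when it falls below it, in proportion to the difference.
    All parameter values are illustrative assumptions.
    """
    next_voltage = voltage + gain * (observed_rate - target_rate)
    return min(v_max, max(v_min, next_voltage))


def observed_error_rate(voltage, safe_voltage, path_activation):
    """Toy model (assumption): errors appear only on activated critical paths."""
    shortfall = max(0.0, safe_voltage - voltage)
    return path_activation * min(1.0, 10.0 * shortfall)


def run_phase(voltage, safe_voltage, path_activation, intervals):
    """Apply the controller for a number of intervals; return the final voltage."""
    for _ in range(intervals):
        rate = observed_error_rate(voltage, safe_voltage, path_activation)
        voltage = proportional_step(voltage, rate)
    return voltage


# Low-intensity phase: critical paths are not sensitised, so no errors are
# observed and the controller keeps lowering the voltage below the safe level.
v_after_idle = run_phase(voltage=1.0, safe_voltage=1.0,
                         path_activation=0.0, intervals=100)

# The program phase changes: the same voltage now yields a high error rate.
rate_after_phase_change = observed_error_rate(v_after_idle, safe_voltage=1.0,
                                              path_activation=1.0)
```

In this model the controller sees an error rate of zero throughout the low-intensity phase and so drifts below the safe voltage. When the critical paths are subsequently activated, the error rate jumps far above the target in a single interval, which corresponds to the stall and replay behaviour discussed above.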
A known approach to measure fast changes in environmental conditions is a delay monitoring circuit, which involves sending alternate rising and falling clock edges along a calibrated delay line (made up for example of a chain of buffers or inverters) and then checking that the captured logic value is as expected.
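As a purely illustrative behavioural model of such a delay monitoring circuit, the following sketch launches alternating logic values into a delay line and checks whether each value reaches the capture point within one clock period. The function name, the per-stage delay and the assumption that the line is non-inverting overall (for example a chain of buffers or an even number of inverters) are hypothetical choices made for this sketch.

```python
def delay_monitor_ok(stage_delay_ps, n_stages, clock_period_ps, cycles=4):
    """Return True if every launched edge is captured correctly.

    Assumes a non-inverting delay line whose total delay is the sum of
    identical per-stage delays (an illustrative simplification).
    """
    line_delay_ps = stage_delay_ps * n_stages
    launched = 0
    for _ in range(cycles):
        previous = launched
        launched ^= 1  # alternate rising and falling edges
        # The new value is seen at the capture point only if it has
        # propagated through the whole line within one clock period.
        captured = launched if line_delay_ps <= clock_period_ps else previous
        if captured != launched:
            return False  # timing violation detected by the monitor
    return True
```

For example, a 90-stage line with 10 ps per stage fits within a 1000 ps clock period, whereas the same line slowed to 12 ps per stage (for instance by local heating or IR drop) does not, and the monitor flags a violation irrespective of which paths the functional circuitry is currently exercising.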
The article “A 45 nm Resilient and Adaptive Microprocessor Core for Dynamic Variation Tolerance” by J Tschanz et al, 2010 IEEE International Solid-State Circuits Conference, pages 282-284, describes a technique for adaptive power management which incorporates a delay monitoring circuit (referred to in the article as a tuneable replica circuit (TRC)) in a system using Razor-style error correction circuits (referred to in the article as error-detection sequentials (EDS)). The TRCs described in the article consist of configurable inverter paths that are tuned at test time via scan to track critical path delays per pipeline stage of the processor. As a result, such TRCs can detect timing errors caused by environmental conditions even if the associated critical path in the processor is not sensitised at the time.
However, the approach described in the article requires calibration of the TRC (delay monitoring circuit) at test time over a variety of PVT conditions. In practice, significant tester calibration time is too expensive for lower-margin ASIC products. Further, calibrating the delay monitoring circuit at test time does not allow the delay monitoring circuit to take account of changes that occur over time through use of the processor, for example longer term effects such as wear-out (electromigration, Negative Bias Temperature Instability (NBTI), etc). Accordingly, some margin would need to be included in order to allow for such long term effects.
It would be desirable to provide an improved technique for performing adaptive power management within a data processing apparatus employing in-situ error correction circuits.