In modern integrated circuits (ICs), aggressive voltage and frequency scaling is widely used to exploit the design margin introduced by Process-Voltage-Temperature (PVT) variations. PVT variations result in random variability of transistor parameters, e.g. gate width and length, channel mobility and threshold voltage Vth. In addition, the supply voltage Vdd fluctuates due to noise or IR drop. As a consequence, the propagation delays of two theoretically identical transistors differ. This phenomenon becomes more significant as CMOS technology advances. Moreover, scaling the supply voltage and/or frequency beyond the critical voltage or frequency of a transistor results in numerous timing errors and, hence, unacceptable output quality.
To cope with transistor variability, ICs are conventionally designed at the worst PVT corner to ensure that the transistors always operate correctly in synchronized circuits. Nevertheless, ICs rarely operate at the worst corner. Therefore, this worst-case approach introduces a design margin, which wastes performance and power.
Recently, on-chip monitor techniques have been proposed to reduce this design margin. On-chip monitors, e.g. voltage and/or timing monitors, are embedded in the ICs to estimate the timing slacks during operation. The “slack” is the difference between the required time (in this case the clock period) and the arrival time. If the slack is positive or zero, the logic is fast enough for the computation. If the slack is negative, the logic is too slow. If the timing slack is too large, the system can reduce the voltage (to save power) or increase the clock frequency (to increase speed). Reducing the voltage lengthens the time the computations need, while increasing the clock frequency shortens the clock period available to them; in both cases the timing slack is reduced. In the end, when the timing slack is close to zero, the voltage and clock frequency are kept constant to ensure that the computations are completed within the time budget. The combination of a specific voltage and a clock frequency for which the timing slack is zero is called a critical operating point. Scaling the supply voltage and/or frequency beyond that critical point results in a negative timing slack, which leads to timing errors. Other parameters, e.g. temperature, CMOS body bias voltage and transistor aging, also affect the timing slack and, hence, the critical point.
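By way of illustration only, the slack definition and the scaling towards the critical operating point can be sketched in Python as follows. The delay model `path_delay_ns` (an alpha-power-law form) and all numerical values such as `vth`, `k`, `alpha`, `step` and `v_min` are hypothetical assumptions chosen for the sketch; they are not taken from any particular circuit or monitor.

```python
def timing_slack(clock_period_ns, arrival_time_ns):
    """Slack = required time (here the clock period) minus arrival time."""
    return clock_period_ns - arrival_time_ns


def path_delay_ns(vdd, vth=0.3, k=0.4, alpha=1.3):
    """Hypothetical alpha-power-law delay model (assumption):
    the path delay grows as the supply voltage Vdd is lowered."""
    return k * vdd / (vdd - vth) ** alpha


def scale_to_critical_point(clock_period_ns, vdd=1.0, step=0.01, v_min=0.4):
    """Lower Vdd in small steps while the slack stays non-negative.

    Assumes the starting voltage has non-negative slack and that
    v_min is above vth so the delay model stays well defined.
    Returns the last voltage with non-negative slack and that slack.
    """
    while vdd - step >= v_min:
        next_slack = timing_slack(clock_period_ns, path_delay_ns(vdd - step))
        if next_slack < 0:
            break  # one more step would cross the critical point
        vdd -= step
    return vdd, timing_slack(clock_period_ns, path_delay_ns(vdd))


if __name__ == "__main__":
    vdd, slack = scale_to_critical_point(clock_period_ns=1.0)
    print(f"Vdd near critical point: {vdd:.2f} V, remaining slack: {slack:.3f} ns")
```

In this sketch only the voltage is scaled at a fixed clock period; scaling the clock frequency at a fixed voltage could be modelled analogously by shortening `clock_period_ns` until the slack approaches zero.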
In-situ schemes based on a timing-error detection scheme (EDS) and an error correction scheme (ECS) have been proposed. For the EDS, on-chip timing monitors, such as Razor-based monitors (Razor) and Double Sampling with Time Borrowing (DSTB), are widely used. They are proposed to replace the flip-flops (FFs) in the circuit. The EDS detects whether a path violates the set-up timing constraint (i.e. has a negative timing slack).
A Razor-based monitor detects a timing error in a main flip-flop (FF) by means of a shadow latch, as described in the paper “Razor: a low-power pipeline based on circuit-level timing speculation” (Ernst et al., 36th Annual IEEE/ACM Int'l Symp. on Micro-Architecture, pp. 7-18, 2003). However, the Razor monitor exhibits a meta-stability problem in the data path. Meta-stability occurs in FF circuits when the input signal changes at the same time as the rising edge of the clock signal. In this case, one or a few transistors of the FF circuit are pulled to ‘1’ and ‘0’ simultaneously. The circuit may then require an unbounded time to resolve to a final state (‘1’ or ‘0’), depending on the environmental noise. In the Razor circuit, this happens to the FF located in the data path. Thus, the system can enter a meta-stable state, which is difficult to resolve.
The DSTB monitor swaps the positions of the FF and the latch to eliminate the disadvantages of the Razor monitor in the data path. If, in a DSTB circuit, the data signal arrives later than the required timing constraint (i.e. time margin), e.g. after the rising edge of the clock signal, the data signal is still captured by the latch, because a latch, in contrast to a flip-flop, remains sensitive after the rising edge of the clock signal. The DSTB circuit detects a timing violation by comparing the results from the latch and the FF. As the signal from the latch can be used in the next cycle as the “correct input from the previous cycle”, the DSTB can find the exact timing slack and hence utilize it to reduce the design margin. However, if the data signal arrives late, the time left to perform the computation of the next cycle is insufficient and hence that computation cannot be completed. To compensate for the lack of time, error correction schemes (ECS) have been proposed.
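The double-sampling principle described above can be illustrated with a purely behavioural Python sketch. The function `dstb_sample`, its parameters and the transparency window `window_ns` are hypothetical names introduced for this illustration; set-up and hold times, meta-stability and the actual circuit implementation are deliberately not modelled.

```python
def dstb_sample(old_value, new_value, arrival_ns, clock_period_ns, window_ns):
    """Behavioural model of one double-sampling stage (illustrative only).

    The FF samples at the clock edge (t = clock_period_ns). The latch is
    assumed to stay transparent for window_ns after that edge, so it also
    captures data that arrives late but within the window.
    Returns (latch_value, ff_value, error_flag).
    """
    # The FF only sees the new data if it arrived by the clock edge.
    ff_value = new_value if arrival_ns <= clock_period_ns else old_value
    # The latch still sees data arriving within the transparency window.
    latch_value = new_value if arrival_ns <= clock_period_ns + window_ns else old_value
    # A mismatch between latch and FF flags a timing violation.
    error_flag = ff_value != latch_value
    return latch_value, ff_value, error_flag


if __name__ == "__main__":
    for arrival in (0.9, 1.2, 1.5):
        print(arrival, dstb_sample(0, 1, arrival, clock_period_ns=1.0, window_ns=0.3))
```

In this simplified model, data arriving on time raises no flag, data arriving within the window is captured by the latch and flagged, and data arriving after the window is missed by both storage elements without being flagged.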
Conventional ECSs correct a timing error, for example, by issuing extra cycles (counter flow) as proposed in the above-mentioned paper by Ernst et al. or by re-issuing the instruction (instruction replay) as proposed by Bowman et al. in “Energy-Efficient and Metastability-Immune Resilient Circuits for Dynamic Variation Tolerance” (IEEE Journal of Solid-State Circuits, vol. 44, no. 1, pp. 49-63, Jan. 2009). In both solutions extra cycles are issued to avoid computational errors, resulting in a multiple-cycle penalty. When scaling beyond the critical point, errors are generated, which are detected by the EDS and corrected by the ECS. A sub-critical point is defined as the point indicating the maximum negative timing slack the system can tolerate with the proposed EDS and ECS. When scaling beyond the sub-critical point, the system fails completely. This situation should be avoided by the user.
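The effect of such a multiple-cycle penalty on the average number of cycles per instruction can be illustrated with a simple first-order estimate in Python. The function names and the example figures (an error in 10% of the instructions, 5 penalty cycles per error) are hypothetical assumptions for illustration and are not taken from the cited papers.

```python
def effective_cpi(base_cpi, error_rate, penalty_cycles):
    """First-order estimate: every detected timing error adds
    penalty_cycles extra cycles to the instruction that caused it."""
    return base_cpi + error_rate * penalty_cycles


def relative_throughput(base_cpi, error_rate, penalty_cycles):
    """Throughput relative to error-free operation at the same clock."""
    return base_cpi / effective_cpi(base_cpi, error_rate, penalty_cycles)


if __name__ == "__main__":
    # Hypothetical figures: 10% of instructions err, 5 extra cycles each.
    print(effective_cpi(1.0, 0.10, 5))
    print(relative_throughput(1.0, 0.10, 5))
```

Under these assumed figures the average number of cycles per instruction rises from 1.0 to 1.5, so the throughput at a given clock frequency drops to about two thirds of the error-free value.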
The ECS schemes mentioned above, i.e. counter flow and instruction replay, can correct timing errors at the cost of extra computation cycles. Those extra cycles result in a throughput penalty. However, for real-time streaming applications, e.g. in a communication system, a constant throughput is required of the signal processing circuits. It is therefore critical to maintain the cycles per instruction (CPI), i.e. to avoid any throughput penalty, at operating points between the critical point and the sub-critical point.
Hence, there is a need for an approach to deal with timing errors wherein the above-mentioned limitations are avoided or overcome.