This invention relates to the control of clock signals used by a processor, and in particular to selectively implementing cycle stealing when it is required.
Design and fabrication technologies have succeeded in scaling down transistor dimensions, allowing ever more transistors to be integrated into a single Integrated Circuit (IC) such as a System-on-Chip (SoC). However, technology scaling also introduces major challenges, such as high defect rates and device parameter variations. These variations change the propagation delay in CMOS circuits, which may lead to functional failures of the IC.
A traditional SoC design methodology attempts to meet the timing requirements by considering “worst case” (WC) operating conditions, which results in a reduced maximum operating frequency and increased area and power overheads.
The alternative “typical case” design approach offers a better trade-off between area, performance and energy. However, the drawback of typical case design is the timing errors that can occur because of reduced timing margins in critical paths when dynamic operating conditions (e.g. temperature or voltage) change, or when aging makes circuits too slow for the typical operating frequency.
There is therefore a need for a variation-resilient architectural solution that enables better-than-worst-case designed ICs, in order to improve design specifications (speed, area, power) without incurring functional failures.
Many techniques have been implemented in Central Processing Unit (CPU) architectures to overcome the timing problem explained above, so that processors operate correctly and are resilient to variations.
One of the earlier techniques is referred to as Razor, which is based on error detection and recovery in the CPU, for example by pausing all pipeline stages (or time borrowing) while waiting either for the slow stage to finish its computation or for the instruction to be re-executed. This approach is disclosed in Dan Ernst, “Razor: Circuit-Level Correction of Timing Errors for Low-Power Operation,” Proceedings of the 36th International Symposium on Microarchitecture (MICRO-36), 2003.
The pausing action ensures that later instructions do not continue to their next pipeline stage until the faulty instruction is recovered.
FIG. 1 shows this known architecture, which uses Razor flip-flops to detect errors and clock gating to recover from them.
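As a rough illustration of the detect-and-recover idea behind Razor, the following Python sketch models one Razor-style flip-flop over a single clock cycle, together with the pipeline-wide stall decision. It is a simplified behavioural model under stated assumptions, not the circuit of FIG. 1: setup and hold times are ignored, the shadow element is assumed to sample `shadow_delay_ns` after the main clock edge, and the names `razor_capture`, `pipeline_cycle` and `RazorSample` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RazorSample:
    main: int      # value captured by the main flip-flop at the clock edge
    shadow: int    # value captured by the shadow element on the delayed clock
    error: bool    # True when the two captures disagree


def razor_capture(old_value: int, new_value: int, arrival_ns: float,
                  period_ns: float, shadow_delay_ns: float) -> RazorSample:
    """Hypothetical behavioural model of one Razor-style flip-flop.

    Assumes the new value is stable `arrival_ns` after the launching edge,
    the main flip-flop samples at `period_ns`, the shadow element samples at
    `period_ns + shadow_delay_ns`, and setup/hold times are ignored.
    """
    main = new_value if arrival_ns <= period_ns else old_value
    shadow = new_value if arrival_ns <= period_ns + shadow_delay_ns else old_value
    return RazorSample(main, shadow, error=(main != shadow))


def pipeline_cycle(samples: List[RazorSample]) -> Tuple[bool, List[int]]:
    """Pipeline-wide decision: any error stalls every stage for one cycle.

    On a stall, each flagged stage is restored from its shadow value. The
    `stall` flag stands for the global feedback signal that must reach
    all pipeline stages.
    """
    stall = any(s.error for s in samples)
    restored = [s.shadow if s.error else s.main for s in samples]
    return stall, restored


if __name__ == "__main__":
    period, window = 1.0, 0.5  # illustrative values in ns (assumed)
    stages = [
        razor_capture(0, 1, 0.8, period, window),  # arrives on time
        razor_capture(0, 1, 1.2, period, window),  # late, inside the window
    ]
    print(pipeline_cycle(stages))  # prints (True, [1, 1])
```

In this model, data arriving after the shadow sampling point (for example `arrival_ns = 1.7` with the values above) is missed by both elements and is not flagged. The sketch also ignores the short-path (hold) constraint that the delayed shadow clock introduces.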
A drawback of this technique is the feedback signal, which must propagate to all pipeline stages within a very short time (half of one clock cycle when Razor circuits are used).
This can be difficult to achieve across large CMOS dies where pipeline stages are several millimeters apart. Furthermore, it is impractical in complex microprocessors, where it may take several clock cycles just to propagate the clock signal through the clock distribution network, so that the network cannot be halted within a single cycle.
Applying Razor-like techniques also increases CPU area by 20 to 30%, because a Razor flip-flop is typically more than twice the size of a regular master-slave flip-flop.
Another technique is error prediction. Toshinori Sato, in “A Simple Flip-Flop Circuit for Typical-Case Designs for DFM,” ISQED 2007, proposed an architectural modification to the Razor approach to simplify the design. The idea is to use two flip-flops driven by the same clock (which removes the short-path problem that Razor suffers because of its delayed clock), with a delay buffer inserted in the data path of the shadow flip-flop. The circuit thus predicts that the data path will fail if the supply voltage is scaled down further or the frequency is increased further.
This technique cannot detect errors that fall outside the checking window set by the delay buffer. In addition, if the technique is used for process compensation, metastability problems will appear in the main flip-flop.
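The prediction scheme and its checking-window limitation can be sketched behaviourally in Python. This is a minimal model under stated assumptions, not Sato's circuit: both flip-flops sample at the same clock edge, the shadow flip-flop sees the data `buffer_ns` later because of its delay buffer, setup and hold times are ignored, and the function name `predictive_capture` is hypothetical.

```python
from typing import Tuple


def predictive_capture(old_value: int, new_value: int, arrival_ns: float,
                       period_ns: float, buffer_ns: float) -> Tuple[int, int, bool]:
    """Hypothetical model of a predictive flip-flop pair on a shared clock.

    Both flip-flops sample at `period_ns`; the shadow flip-flop sees the data
    `buffer_ns` later because of the delay buffer in its data path. A mismatch
    is a warning that the path is close to failing, not an error.
    Setup/hold times are ignored.
    """
    main = new_value if arrival_ns <= period_ns else old_value
    shadow = new_value if arrival_ns + buffer_ns <= period_ns else old_value
    warning = main != shadow
    return main, shadow, warning


if __name__ == "__main__":
    period, buffer = 1.0, 0.2  # illustrative values in ns (assumed)
    for arrival in (0.5, 0.9, 1.1):
        print(arrival, predictive_capture(0, 1, arrival, period, buffer))
    # 0.5 (1, 1, False)  ample slack: no warning
    # 0.9 (1, 0, True)   inside the checking window: failure predicted
    # 1.1 (0, 0, False)  main flip-flop has failed, but nothing is flagged
```

The last case shows the limitation described above: once the data arrives after the clock edge, both flip-flops hold the old value, so the actual timing error lies beyond the checking window and goes undetected.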
Traditionally, error detection and the resulting decision must be completed within one clock cycle or half a clock cycle, which is difficult to achieve in a large processor; none of the approaches outlined above resolves this problem.