1. Field of the Invention
The present invention generally relates to techniques and systems for detecting and correcting errors in a circuit. More specifically, the present invention relates to techniques and systems for augmenting a circuit design to detect and correct timing errors.
2. Related Art
Advances in semiconductor fabrication technology have given rise to dramatic increases in the number of transistors per semiconductor device by fabricating smaller transistors and packing these transistors closer together. However, these practices can make a fabricated integrated circuit (IC) more vulnerable to physical faults or parasitic effects that can influence the performance of the IC. In response to these potential dangers, ICs are often designed with redundancy, error detection, and error correction whenever possible to make these circuits more robust. Nevertheless, the performance and layout overhead associated with such error detection and correction circuits can make them prohibitive in a new circuit design.
Timing delay errors for a manufactured IC, in particular, are difficult to estimate and prevent during the initial design cycle for an IC. These errors occur when the computation for a given input vector cannot be completed before registers are sampled to capture a produced result. On some occasions, a timing delay error can occur in a manufactured IC as a result of a random manufacturing error, at which point the faulty IC can be discarded. However, when a timing delay error is encountered across a batch of manufactured ICs, the clock frequency for the ICs can be lowered to allow the ICs to operate correctly, but at the cost of sacrificing the performance level that the ICs were designed to operate under.
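As a rough illustration of the condition described above, the following sketch flags a timing delay error when the combinational delay for an input vector, plus a register setup time, exceeds the clock period. The function name, the setup-time parameter, and the numeric values are assumptions chosen for illustration and are not taken from any particular design.

```python
def has_timing_error(path_delay_ns: float,
                     clock_period_ns: float,
                     setup_time_ns: float = 0.0) -> bool:
    """Return True if the result is not stable before the register samples it.

    Hypothetical model: the data must arrive at least setup_time_ns
    before the sampling clock edge, which occurs one clock period
    after the input vector is launched.
    """
    return path_delay_ns + setup_time_ns > clock_period_ns


# A 1.2 ns path cannot complete within a 1.0 ns clock period.
too_fast = has_timing_error(path_delay_ns=1.2, clock_period_ns=1.0)

# Lowering the clock frequency (lengthening the period to 1.5 ns)
# removes the error at the cost of performance.
slowed_down = has_timing_error(path_delay_ns=1.2, clock_period_ns=1.5)
```

In this model, lowering the clock frequency corresponds to increasing `clock_period_ns` until the check no longer fails.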
On the other hand, a manufactured IC may be capable of functioning at a higher frequency than was predicted for its circuit design. When a given circuit is designed and implemented, the circuit is generally designed to operate within a conservative set of values for process and environmental variables (i.e., a design corner). Moreover, the timing predictions for the circuit generated by analysis algorithms and models are usually padded to account for their inability to model some physical, electrical, and/or logic effects. These padded predictions produce a design margin, which is the difference in timing between the manufactured circuit and the analysis result, and which measures how conservative the performance predictions are for a design process. When the assumed worst-case scenarios fail to materialize in silicon, a large design margin results in an overdesigned circuit, which is undesirable. It is common practice for a fabricated IC to be tested under a wide range of power and clock frequency parameters to determine the actual functioning parameters of the IC. The breaking point of the IC is usually identified as the point at which the IC begins to experience timing errors.
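The design margin and the search for a breaking point can be sketched as follows. This is a simplified model that sweeps only the clock frequency (not power) and treats the circuit as a single measured delay; the function names and values are hypothetical.

```python
from typing import Iterable, Optional


def design_margin_ns(predicted_delay_ns: float, measured_delay_ns: float) -> float:
    """Difference between the analysis result and the manufactured circuit."""
    return predicted_delay_ns - measured_delay_ns


def find_breaking_frequency_mhz(measured_delay_ns: float,
                                frequencies_mhz: Iterable[float]) -> Optional[float]:
    """Return the lowest tested frequency at which a timing error appears.

    Returns None if the circuit passes at every tested frequency.
    Assumes a timing error occurs when the measured delay exceeds
    the clock period (setup time is ignored in this sketch).
    """
    for freq in sorted(frequencies_mhz):
        period_ns = 1000.0 / freq  # 1 MHz corresponds to a 1000 ns period
        if measured_delay_ns > period_ns:
            return freq
    return None


margin = design_margin_ns(predicted_delay_ns=1.25, measured_delay_ns=1.0)
breaking = find_breaking_frequency_mhz(1.0, [800, 900, 1000, 1100, 1200])
```

With these example numbers, the analysis predicted a 1.25 ns delay (an 800 MHz limit), while the modeled silicon still passes at 1000 MHz and first fails at the 1100 MHz test point.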
A number of solutions have been developed for detecting and correcting timing errors, and a number of these solutions are based on double data sampling registers (DDSRs). A DDSR is a modified flip-flop (FF) that is capable of detecting delay errors by using an extra “shadow” latch that samples the data later than a standard register, and then comparing the two sampled values for differences. If the “shadow” latch of the DDSR samples a value that is different from the value sampled by the regular latch of the DDSR, then the DDSR determines that it has detected an error. Once an error has been detected by a DDSR, an error signal is propagated to logic in the design that can correct the error.
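The detection behavior described above can be modeled in software as a minimal sketch. The class name, the `shadow_delay_ns` parameter, and the `settling_signal` helper are assumptions introduced for illustration; they do not describe any specific DDSR circuit.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DDSR:
    """Behavioral model of a double data sampling register (illustrative only)."""
    shadow_delay_ns: float  # assumed extra delay before the shadow latch samples
    main: int = 0           # value captured by the regular latch
    shadow: int = 0         # value captured by the "shadow" latch
    error: bool = False     # error signal raised when the two values differ

    def sample(self, data_at: Callable[[float], int], clock_edge_ns: float) -> bool:
        """Sample the data input twice and compare the two sampled values."""
        self.main = data_at(clock_edge_ns)
        self.shadow = data_at(clock_edge_ns + self.shadow_delay_ns)
        self.error = self.main != self.shadow
        return self.error


def settling_signal(old: int, new: int, arrival_ns: float) -> Callable[[float], int]:
    """Hypothetical data input that switches from old to new at arrival_ns."""
    return lambda t: new if t >= arrival_ns else old


reg = DDSR(shadow_delay_ns=0.3)

# Data arrives at 0.9 ns, before the 1.0 ns clock edge: no error.
on_time = reg.sample(settling_signal(0, 1, arrival_ns=0.9), clock_edge_ns=1.0)

# Data arrives at 1.2 ns: the regular latch captures the stale value,
# the shadow latch captures the correct one, and an error is flagged.
late = reg.sample(settling_signal(0, 1, arrival_ns=1.2), clock_edge_ns=1.0)
```

Note that in this model a value arriving after the shadow latch has also sampled would go undetected, so the shadow delay bounds the lateness that can be caught.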
However, existing systems that utilize DDSRs to implement an error detection circuit often require the presence of a specialized pipeline organization of the circuit, and perform error correction by flushing the pipeline and replaying the instructions/data. Two common DDSR solutions for performing error detection and error correction include:
    Correcting an error in situ by either stalling the clock to allow time for replacing the incorrect value with the correct value from the “shadow” latch, or by stalling the data in a modified pipeline design.
    Flushing a circuit pipeline after detecting an error in the pipeline, and allowing the circuit pipeline to recompute the results.
These implementations can impose a large physical overhead on a given circuit design, and can introduce a significant performance loss when flushing the pipeline to perform error correction. The first solution can impose strict limitations on what class of circuits can be augmented with the error detection and correction capabilities, as it requires simple pipelines. Furthermore, it also imposes a large circuit overhead due to its error correction mechanism and the modified circuit pipelines. The first solution is typically used for custom circuit designs, where error detection and error correction are applied to specific portions of the design. The second solution is not widely used to augment general ASIC designs with error detection and correction capabilities because it can impose a large performance overhead whenever an error is detected.