Soft errors are transient faults caused by various types of radiation. Radiation-induced transient faults can abruptly flip the stored state of a system and, if undetected, cause a system crash or, even worse, silent data corruption (SDC).
Atmospheric radiation, such as cosmic rays, has long been regarded as the major source of soft errors, especially in memories. Chips used in space applications therefore typically rely on parity or error-correcting code (ECC) for soft-error protection. As process geometries continue to scale down, the amount of energy required to cause an error decreases. Reduced feature sizes, higher logic densities, shrinking node capacitances, lower supply voltages, and shorter pipeline depths have significantly increased the susceptibility of integrated circuits (ICs) to single event upsets (SEUs) in memories and sequential elements (including scan cells) and to single event transients (SETs) in combinational logic. Terrestrial radiation, such as alpha particles emitted by the chip's packaging materials, is also causing soft errors with increasing frequency. These trends have raised system reliability concerns, especially for chips used in the automotive, healthcare, and networking industries.
Recent studies reveal that in ICs designed with feature sizes below 65 nm, memories, combinational logic, and sequential elements all become increasingly susceptible to soft errors. Because memories are often protected by parity or ECC circuits, the remaining challenge is to identify, and then harden or otherwise protect, the scan cells and combinational logic that are most susceptible to soft errors.
Prior art approaches have centered on designing new robust scan cells that use a basic scan flip-flop [1] or a scanout flip-flop [2] as the basic scan cell. The basic scan flip-flop consists of a system flip-flop and a scan portion for test purposes. The scanout flip-flop consists of a system flip-flop and a scanout portion for debug purposes. Alternatively, the system flip-flop can be a latch or a pulse latch [3]. The data input signal and the system clock that control the system flip-flop are reconfigured to drive the scan portion and the scanout portion. For instance, researchers at Intel have designed several robust scan cells that use the built-in soft error resilience (BISER) technique to protect these basic scan cells from SEUs (see U.S. Pat. Nos. 7,278,074; 7,188,284; 7,278,076; and 7,373,572 issued to Mitra, Zhang, Mak, et al.), and have further used time redundancy to protect combinational logic from SETs (U.S. Pat. No. 7,523,371 issued to Mitra et al.). A typical BISER cell may consist of a basic scan flip-flop and an output joining circuit for both test and soft-error resilience. Alternatively, a typical BISER cell may consist of a scanout flip-flop and an output joining circuit for both debug and soft-error resilience. The output joining circuit may be a transmission gate, a C-element, an XOR gate, or an error detection circuit (see FIGS. 1-5).

Other research has used time redundancy, triple modular redundancy (TMR) with a majority voter (see FIG. 6), or a combination of both for soft-error correction [4,5,6,7]. For instance, U.S. Pat. No. 7,482,831 issued to Chakraborty et al. (January 2009) uses a special type of TMR and a majority voter to correct soft errors caused by SEUs in scan cells. Its TMR comprises a system flip-flop, a scan portion, and a hold flip-flop; the hold flip-flop takes its input from the output of the system flip-flop and is also used for enhanced-scan testing, which eases test generation and test application for delay faults [8].
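Two of the output joining circuits named above, the C-element and the TMR majority voter, can be illustrated with a small behavioral model. The sketch below is a minimal logic-level illustration only, assuming one bit per storage element and ignoring timing, keeper circuits, and transistor-level behavior; the names `CElement`, `majority`, and `flip` are hypothetical. It shows why a C-element fed by a system latch and its redundant copy masks an SEU in either copy (its output holds the previous value while the inputs disagree), and why a 2-of-3 majority voter outvotes a single upset copy.

```python
from dataclasses import dataclass


@dataclass
class CElement:
    """Behavioral Muller C-element (hypothetical model): the output switches
    only when both inputs agree; otherwise it holds its previous value."""
    out: int = 0

    def step(self, a: int, b: int) -> int:
        if a == b:
            self.out = a
        return self.out


def majority(a: int, b: int, c: int) -> int:
    """2-of-3 majority voter as used with triple modular redundancy (TMR)."""
    return (a & b) | (a & c) | (b & c)


def flip(bit: int) -> int:
    """Model a single event upset (SEU) as a bit flip in one storage element."""
    return bit ^ 1


if __name__ == "__main__":
    # Duplication with a C-element: system latch and redundant copy both store 1.
    c = CElement()
    print(c.step(1, 1))        # copies agree, so the output becomes 1
    print(c.step(flip(1), 1))  # SEU in one copy: inputs disagree, output holds 1

    # TMR: one of three copies is upset; the voter still outputs the stored value.
    print(majority(1, flip(1), 1))
```

Note that the C-element masks an upset rather than correcting it: if both copies were upset to the same wrong value, the wrong value would pass through, which is why such duplication schemes assume at most one copy is upset at a time.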
These prior art approaches, however, do not address the need for robust scan cells capable of performing (1) functional testing using a slow-speed snapshot, which allows the system, after capturing the data input signal, to shift out the contents of the robust scan cells at a reduced shift clock frequency while the system clock is still running; (2) functional testing using at-speed or slow-speed signature analysis, which allows the system to generate the XOR value (called a signature) of the data input signal and the previous scan-in data signal every system clock cycle or every two or more system clock cycles; (3) defect tolerance, which allows the system to continue operating when the system flip-flop in the robust scan cell has a permanent defect; and (4) manufacturing test, which allows designers to capture the output of the system flip-flop for analysis. Moreover, no effective robust scan designs are available that can tolerate both SEUs in latches and SETs in combinational logic while also supporting test, debug, and defect tolerance. Nor are there effective defect-tolerance schemes that protect redundant modules (e.g., when using duplication or TMR) against permanent defects while protecting synchronous or asynchronous designs against soft errors [9].
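Capability (2) can be pictured as a chain of cells in which each cell captures the XOR of its data input and the value shifted in from the previous cell. The following is a minimal sketch under stated assumptions: the first cell's scan input is tied to 0, there is no feedback polynomial (so this is a plain XOR shift chain rather than a full multiple-input signature register), and at-speed versus slow-speed operation, being a timing property, is not modeled. The functions `signature_step` and `run_signature` and the `every` parameter (compact only on every k-th system clock cycle) are hypothetical.

```python
from typing import List, Sequence


def signature_step(state: Sequence[int], data: Sequence[int], scan_in: int = 0) -> List[int]:
    """One compaction cycle (hypothetical model): cell i captures
    data[i] XOR the previous-cycle value of cell i-1 (its scan-in)."""
    prev = [scan_in] + list(state[:-1])
    return [d ^ p for d, p in zip(data, prev)]


def run_signature(responses: Sequence[Sequence[int]], every: int = 1) -> List[int]:
    """Compact per-cycle response vectors into a signature, updating only
    on every `every`-th system clock cycle (assumed parameter)."""
    state = [0] * len(responses[0])
    for cycle, data in enumerate(responses):
        if cycle % every == 0:
            state = signature_step(state, data)
    return state


if __name__ == "__main__":
    good = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
    bad = [[1, 0, 1], [0, 0, 1], [1, 1, 0]]  # one flipped bit in cycle 1
    print(run_signature(good), run_signature(bad))
    print(run_signature(good) != run_signature(bad))  # mismatch flags the error
```

Comparing the final signature with one from a fault-free run exposes the error; as with any response compaction, some error patterns can cancel out (aliasing) and go undetected.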
Therefore, there is a need to develop more robust scan cells for test, debug, soft-error protection (either soft-error resilience or soft-error correction), and defect tolerance. There is also a need for a robust scan synthesis flow that allows designers to synthesize soft-error protection logic along with the required test, debug, and defect-tolerance features, and to generate the testbenches needed to verify the correctness of the robust scan design. Finally, there is a need for a robust defect-tolerance scheme that protects redundant modules against permanent defects while protecting synchronous and asynchronous designs against soft errors.