Field of the Invention
The present invention relates to a method of deriving an error correction means suitable for an entire system, in order to improve the reliability of a System-on-Chip (SoC) through fault tolerance verification. When an error is injected at the input of a system block or inside the block, the method considers how the input error rate of a gate influences the output error rate of that gate and how the error rate propagates between gates.
Description of Related Art
As semiconductor manufacturing process technology has improved and highly integrated circuits have become feasible, the system-on-chip (SoC) has been proposed, in which various semiconductor components such as a processor, a memory, and peripherals are implemented on a single chip.
The “SoC” is a semiconductor integrated circuit that integrates the components of an entire system into a single chip, so that main semiconductor elements such as a computation element, a storage element, and a data conversion element are implemented on one chip. That is, a single chip can operate as a complete system by integrating main components such as a central processing unit (CPU), a digital signal processor (DSP), and a microcontroller (MCU) in a single semiconductor package. Integrating various functionalities into a single semiconductor chip dramatically reduces the space occupied on a circuit board and the overall system size. As a result, various electronic systems can be miniaturized. Furthermore, compared to a case where several semiconductor chips are manufactured separately, the manufacturing cost is remarkably reduced, which lowers the price of the entire system.
Therefore, SoC technology, which integrates the functionalities of all components on a single chip, has emerged as a core component technology of the high-technology digital era, which is characterized by high performance, low cost, and miniaturization. Continuous performance improvement in SoC technology is gradually increasing the number of semiconductor components integrated on a single chip. Accordingly, testing for defects in the SoC is emerging as an important issue.
As process technology advances, reliability problems that may cause a functional failure of a digital circuit become more important. Although several error correction techniques have been implemented to address such problems, they are expensive. Therefore, system semiconductor designers have begun to consider fault tolerance as an important design factor in addition to performance and low power consumption.
A failure can arise from many factors. In many cases, a failure in system functionality originates from the manufacturing process or from environmental factors after manufacturing. As a semiconductor ages with increasing service time, a problem may occur in switching timing. A data error may also be generated by energetic particles, such as alpha particles or the particles produced when cosmic rays from space react with the atmosphere. In addition, an erroneous operation of the entire system may occur due to crosstalk, which is electrical interference between wiring lines caused by the narrowing gap between them, radiation emitted by radioactive decay, thermal noise that influences the threshold voltage of a semiconductor, and the like.
The SoC may suffer a failure due to environmental factors such as cosmic ray particles, power noise, and crosstalk. A temporary error that hinders normal operation without damaging the digital circuit is called a soft error.
In 1962, it was predicted that a circuit might fail due to cosmic ray particles. In 1975, a circuit failure caused by cosmic rays was reported for the first time. In 1978, a soft error was observed in an SRAM at ground level, and research on overcoming such errors began along with error modeling.
In the SoC, power noise (voltage drop) and radiation contribute significantly to an increase in the error rate. In addition, thermal noise also generates errors that hinder normal operation of a circuit. Various other changes in the external environment likewise hinder normal operation of the SoC and generate errors.
Errors caused by these various factors may hinder normal operation of the SoC and may cause an actual circuit failure. In some cases, however, the errors are blocked internally, and the circuit operates normally.
Specifically, an error generated in a circuit of the SoC may be blocked internally by masking effects such as logic masking, temporal masking, or electrical masking, in which case the circuit operates normally.
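As an illustration of logic masking only, the following minimal sketch enumerates every input pattern of a two-input gate, flips one input at a time, and counts how often the output is unchanged. The gate set, the function name `masking_fraction`, and the single-input-flip fault model are assumptions made for this example and are not taken from the described method.

```python
from itertools import product

# Hypothetical two-input gate library used only for this illustration.
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}

def masking_fraction(gate):
    """Fraction of (input pattern, flipped input) cases whose output is unchanged."""
    masked = 0
    total = 0
    for a, b in product((0, 1), repeat=2):
        good = gate(a, b)                      # fault-free output
        for faulty in ((a ^ 1, b), (a, b ^ 1)):  # flip exactly one input
            total += 1
            if gate(*faulty) == good:          # the flip did not reach the output
                masked += 1
    return masked / total

fractions = {name: masking_fraction(fn) for name, fn in GATES.items()}
for name, frac in fractions.items():
    print(f"{name}: {frac:.2f} of single-input flips are logically masked")
```

Under these assumptions, half of the single-input flips are masked by an AND or OR gate, while an XOR gate masks none, because an XOR output always changes when exactly one input changes.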
As described above, various types of faults occur in a circuit due to various factors. Hardware faults may be classified into permanent faults, intermittent faults, and transient faults depending on how frequently they occur in a circuit. Such faults adversely affect a circuit and may cause a data error, in which data changes unintentionally. The soft error is an error in which surrounding environmental factors change only data without harming the hardware. In order to recognize and cope with such circuit errors, it is necessary to model when, where, and how frequently errors occur, and how the system responds to them.
It is also desirable to model whether a failure occurs in the SoC due to a single error or multiple errors, as well as the frequency, location, and time of the errors in the SoC.
Circuit-level error modeling is easy to perform using a commercial simulation solution. In addition, node values inside a circuit are easy to change or monitor in a simulation environment. However, validating the efficiency of a fault-tolerant design method for a relatively complicated SoC based on a circuit-level error model takes a long time.
Gate-level error modeling builds an error model for each type of gate used in the SoC and simulates the errors of each gate accordingly. The gate-level error model is also easy to use for deriving an analytical method. In addition, a fault-tolerant platform can be developed relatively easily by applying the netlist of the SoC. However, validating the efficiency of a fault-tolerant design method at the SoC level takes a long time.
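The way a gate-level analytical model can relate the input error rates of a gate to its output error rate, and pass that rate on to the next gate, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method of the invention: inputs are taken to be independent and uniformly distributed, each input is flipped independently with its own error rate, reconvergent fan-out is ignored, and the function name `output_error_rate`, the parameter `gate_error`, and the two-gate netlist y = (a AND b) XOR c are hypothetical.

```python
from itertools import product

def output_error_rate(gate, input_error_rates, gate_error=0.0):
    """Probability that the gate output differs from the fault-free output.

    Assumptions (illustrative only): independent, uniformly distributed
    inputs; independent input flips; independent intrinsic gate error.
    """
    n = len(input_error_rates)
    p_wrong = 0.0
    for bits in product((0, 1), repeat=n):        # fault-free input pattern
        good = gate(*bits)
        for flips in product((0, 1), repeat=n):   # which inputs are flipped
            p_flips = 1.0
            for f, e in zip(flips, input_error_rates):
                p_flips *= e if f else (1.0 - e)
            faulty = gate(*(b ^ f for b, f in zip(bits, flips)))
            if faulty != good:
                p_wrong += p_flips / 2 ** n       # each pattern has weight 1/2^n
    # An intrinsic gate error flips the output, which can also cancel an
    # error that arrived through the inputs.
    return p_wrong * (1.0 - gate_error) + (1.0 - p_wrong) * gate_error

def and_gate(a, b):
    return a & b

def xor_gate(a, b):
    return a ^ b

# Hypothetical netlist y = (a AND b) XOR c with a 1% error rate on each input.
and_err = output_error_rate(and_gate, [0.01, 0.01])
out_err = output_error_rate(xor_gate, [and_err, 0.01])
print(f"AND output error rate: {and_err:.6f}")
print(f"Final output error rate: {out_err:.6f}")
```

In this example the AND gate attenuates the input error rate through logic masking, whereas the XOR gate passes on errors from both of its inputs, so the error rate at the final output is higher than at either XOR input.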
Chip-level error modeling enables a designer to perform a high-level simulation based on an analytical method by modeling at a high level of abstraction. In addition, the resulting model is relatively simple, so that less time is needed to develop an error verification platform and to validate a fault-tolerant design method. However, because errors are determined at the chip level, it is difficult to analyze how the internal blocks of the SoC behave in response to errors.
Modeling techniques can be classified into system-level modeling, register-transfer-level (RTL) modeling, gate-level modeling, switch-level modeling, physical-level modeling, and the like, depending on the level at which the circuit is approached. The techniques differ in accuracy and complexity. Reliability verification takes more time as the modeling level becomes finer, from the system level down to the semiconductor level. Therefore, a designer should select an analysis level suited to the design specification and conditions in order to verify the fault resilience of a circuit. Gate-level modeling is faster than switch-level modeling, which requires consideration of semiconductor device operation, and is more accurate than system-level modeling and RTL modeling.