This invention relates, in general, to soft error rate (SER) estimation, and, in particular, to enabling accurate estimation and validation for integrated circuits (IC or chip) and Application Specific Integrated Circuits (ASIC) or other circuits used in digital system design.
Various subcircuits in integrated circuits and ASICs, such as register files, latches, and memory buffers, store data that is subject to corruption by soft errors (SE). An SE occurs, for example, when ionizing radiation causes a latch or a node in a memory array to invert its logical value. Most hardware has embedded logic to detect, correct, and log such errors and to notify the software of such an event through exceptions. Instruction set simulators (ISS) are often used to verify the proper functioning of the chip, in conjunction with a virtual IC that models the physical implementation of the chip. However, because SE events are inherently random and are not always coupled with a specific instruction, the ISS or reference architecture cannot model them. This presents a challenge in verifying the hardware functionality for detecting, correcting, and logging such errors, referred to collectively as SE handling.
Typically, the hardware logic associated with SE handling is verified with short, directed, self-checking tests. Such a directed test exercises one very specific error type in a diagnostic program, e.g., just one instruction cache error, and compares the expected results with the actual error log generated by the SE handling logic. This approach is inadequate for multithreaded processors because multiple concurrent threads may be executing completely independent programs. Here, proper error handling by the error-encountering thread could be hampered by events on other threads. Furthermore, an error on one thread could leak to another thread, causing spurious logging or functional incorrectness. For example, if a thread sees an error that is then improperly reported to a different thread, the second thread will behave as if the error occurred during execution of its own program, potentially resulting in data corruption.
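By way of illustration only, a directed self-checking test of the kind described above may be sketched as follows. The sketch assumes a hypothetical software model of a parity-protected array; the names `ParityProtectedArray`, `inject_soft_error`, and `directed_test`, the use of one even-parity bit per word, and the error log format are all assumptions made for this example and are not taken from any particular hardware design.

```python
from dataclasses import dataclass, field


@dataclass
class ParityProtectedArray:
    """Hypothetical model of a memory array with one even-parity bit per word."""
    words: list = field(default_factory=list)
    parity: list = field(default_factory=list)
    error_log: list = field(default_factory=list)

    def write(self, value):
        # Store the word together with its parity (number of 1 bits, modulo 2).
        self.words.append(value)
        self.parity.append(bin(value).count("1") % 2)

    def inject_soft_error(self, index, bit):
        # Flip one stored bit; the stored parity is deliberately left stale.
        self.words[index] ^= 1 << bit

    def read(self, index, thread_id):
        # Recompute parity on read and log a mismatch against the reading thread.
        value = self.words[index]
        if bin(value).count("1") % 2 != self.parity[index]:
            self.error_log.append(
                {"thread": thread_id, "index": index, "type": "parity"}
            )
        return value


def directed_test():
    """Inject exactly one error, then compare the expected log with the actual log."""
    array = ParityProtectedArray()
    for value in (0x3C, 0xA5):  # arbitrary test data
        array.write(value)
    array.inject_soft_error(index=1, bit=2)
    array.read(0, thread_id=0)  # clean word: nothing should be logged
    array.read(1, thread_id=0)  # corrupted word: one entry expected
    expected = [{"thread": 0, "index": 1, "type": "parity"}]
    return array.error_log == expected
```

Because the sketch exercises a single error on a single thread, it shares the limitation noted above: it does not show whether an error seen by one thread could be logged against another.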
An article entitled “A Systematic Approach to SER Estimation and Solutions” by Hang T. Nguyen and Yoad Yagil, IEEE 41st Annual International Reliability Physics Symposium, 2003, which describes soft errors, processor architecture, and failure in time (FIT) rate calculation, is hereby incorporated herein by reference in its entirety.
Typically, the SE FIT rate is estimated early in the design phase using spreadsheets or similar tools, based on fairly coarse granularity estimates at the macro or chip level rather than at the node level. This approach enables early estimation of SE FIT, but it has a large margin of error, is labor intensive, and is not regressible, in the sense that if the design changes, the work cannot easily be modified to account for the changes. A more accurate estimate is available late in the design phase using a fault grading (FG) tool, but by that point any defects related to soft error upset (SEU) either cannot be corrected or can be corrected only at relatively high cost. For convenience, the terms “soft error” and “soft error upset”, and their respective acronyms, are used interchangeably. Cost is measured in money, time, or engineering effort, and it is generally recognized that defects found late in a design cycle cost more to correct than those found early. As circuit densities increase, the probability that an SEU will affect the correct operation of a given circuit generally increases.
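By way of illustration only, the coarse, macro-level style of estimate described above may be sketched as a simple sum over macros. One FIT corresponds to one failure per 10^9 device-hours. The per-macro model used here (bit count multiplied by a raw per-bit FIT rate and by a derating factor), the function names, and every numeric value are assumptions chosen for the example and do not represent measured data or any particular tool.

```python
HOURS_PER_FIT_UNIT = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def macro_fit(bits, raw_fit_per_bit, derating):
    """Assumed model: contribution of one macro to the chip-level FIT rate."""
    return bits * raw_fit_per_bit * derating


def chip_fit(macros):
    """Sum the per-macro contributions, as a spreadsheet-style estimate would."""
    return sum(macro_fit(*params) for params in macros.values())


def mean_hours_between_failures(fit):
    """Convert a FIT rate into the mean number of device-hours between failures."""
    return HOURS_PER_FIT_UNIT / fit


# Hypothetical macros: (storage bits, raw FIT per bit, derating factor).
example_macros = {
    "register_file": (16_384, 0.001, 0.50),
    "instruction_cache": (262_144, 0.001, 0.10),
    "pipeline_latches": (8_192, 0.002, 0.25),
}

if __name__ == "__main__":
    total = chip_fit(example_macros)
    print(f"Estimated chip FIT: {total:.3f}")
    print(f"Mean hours between failures: {mean_hours_between_failures(total):.3e}")
```

Because each macro is represented by a single derating factor rather than by per-node behavior, an estimate of this form carries the large margin of error noted above, and the table of macros must be revised by hand whenever the design changes.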