Computers utilize a large number of semiconductor components. In a space environment, the operational reliability of computers utilizing small-geometry semiconductor parts is severely compromised. In this environment, the chips, especially those with geometries smaller than three microns, suffer random errors due to cosmic rays or high energy particles or ions. The geometries of the newer semiconductor parts are so small that the passage of a high energy particle/ion or cosmic ray through the p-n junction of a semiconductor device sometimes upsets the operation of the device. In memory systems, the problem occurs when a cosmic ray or high energy particle/ion passes through a sensitive junction in a storage element (e.g., a stage of a shift register) internal to a circuit and causes an arbitrary change of state in that storage element. The result in a memory is that a state of "zero" becomes a state of "one" or vice versa. A single change of state of this kind is called a Single Event Upset (SEU). An SEU is sometimes also referred to as a bit flip or a soft error. The soft error is temporary in nature and disappears when the memory is reused for storing a new bit.
Therefore, unless there is a means to determine the logic state in which the storage element is supposed to be, the system has no means of determining that an upset has occurred, and will operate as if the erroneous logic state were correct. Further, the failure mode resulting from an SEU in a storage element is random and therefore unpredictable.
Single Error Correction Double Error Detection (SECDED) coding techniques have long been used in semiconductor memory systems to increase their reliability. In a memory system with Error Detection and Correction (EDAC), a SECDED code word is stored with each word of memory. Each time a memory word is accessed, both the data word and the code word are output, and the code word is used to check the validity of the data word. Any single error detected is corrected before the data is used, and the corrected word is stored back in memory. This type of correction technique will handle SEUs occurring within a memory, provided that the entire memory is checked at sufficient intervals so that no memory word accumulates more than one SEU between checks.
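The SECDED check and write-back described above can be sketched as follows. This is a minimal illustration only, assuming an extended Hamming (8,4) code over 4-bit data words; the code layout and the names `encode`, `decode`, and `scrub` are hypothetical and are not taken from any particular EDAC design.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit SECDED word (extended Hamming, assumed layout).

    Index 0 holds the overall parity bit; indices 1..7 are Hamming positions,
    with check bits at 1, 2, 4 and data bits at 3, 5, 6, 7.
    """
    word = [0] * 8
    word[3], word[5], word[6], word[7] = [(nibble >> i) & 1 for i in range(4)]
    word[1] = word[3] ^ word[5] ^ word[7]   # covers positions 1, 3, 5, 7
    word[2] = word[3] ^ word[6] ^ word[7]   # covers positions 2, 3, 6, 7
    word[4] = word[5] ^ word[6] ^ word[7]   # covers positions 4, 5, 6, 7
    word[0] = sum(word[1:]) & 1             # makes total parity even
    return word


def decode(word):
    """Return (status, nibble, word_after_correction).

    status is "ok", "corrected" (single error fixed), or "double_error"
    (detected but not correctable; nibble is None in that case).
    """
    w = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos                 # XOR of positions of set bits
    overall_odd = sum(w) & 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        w[syndrome] ^= 1                    # syndrome 0 means the parity bit itself
        status = "corrected"
    else:
        return "double_error", None, w      # nonzero syndrome, even overall parity
    nibble = w[3] | (w[5] << 1) | (w[6] << 2) | (w[7] << 3)
    return status, nibble, w


def scrub(memory):
    """Check every word once and store corrected words back in memory."""
    counts = {"ok": 0, "corrected": 0, "double_error": 0}
    for addr, word in enumerate(memory):
        status, _, fixed = decode(word)
        counts[status] += 1
        if status == "corrected":
            memory[addr] = fixed
    return counts
```

In this sketch, a single flipped bit in any stored word is repaired on the next pass of `scrub`, whereas two flips accumulated in the same word between passes are only detected, which is why the interval between checks matters.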
The storage elements or the internal memory within a bit-parallel architecture microprocessor or Central Processing Unit (CPU) present a more complex problem because they are typically in a less ordered structure, and with many operations going on in parallel it is very difficult to maintain their relationship and determine in real time how far an error has propagated. Most present techniques used to solve the SEU problem in a random logic circuit such as a CPU involve making the storage elements immune to upset by adding resistance in the feedback paths of the latches. The resistance, in conjunction with the device capacitance, functions as a filter which reduces or eliminates the effect of an impulse (from a cosmic ray or high energy particle/ion) that may cause an upset. The principal drawbacks of these techniques are that the additional components require additional area on the chip and reduce operating speed. It is anticipated that present methods of SEU prevention will become less effective as device geometries continue to become smaller, thereby requiring higher resistance values or the addition of more capacitance to the nodes.
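The scaling concern can be illustrated with a simple first-order model. The sketch below assumes the feedback resistance R and the node capacitance C behave as an ideal RC low-pass filter with time constant tau = R * C; the function name and the numeric values are hypothetical and chosen only for illustration, not taken from any real device.

```python
def required_resistance(tau_seconds, capacitance_farads):
    """Resistance needed to hold a first-order RC time constant tau = R * C."""
    return tau_seconds / capacitance_farads


# Assumed, illustrative values only: a fixed filter time constant and
# node capacitances that shrink as device geometry shrinks.
TAU = 1e-9                                   # seconds (assumption)
for c in (100e-15, 50e-15, 25e-15):          # farads (assumption)
    r = required_resistance(TAU, c)
    print(f"C = {c:.0e} F -> R = {r:.0e} ohm")
```

Under this model, each halving of the node capacitance doubles the resistance needed to hold the same filter time constant, and the same time constant that filters the impulse also slows the latch.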