In any electronic system that performs critical functions, it is useful to be able to detect errors with a known degree of probability. Such errors can arise from various sources, including random errors caused by radiation effects on electronic parts, particularly large banks of random access memory (RAM). RAM devices are susceptible to radiation-induced bit changes, which can lead to erroneous system behavior if undetected. The ability to correct bit errors is also desirable because it allows the system to continue operating in spite of radiation-induced bit errors.
Conventional designs have incorporated Hamming-code-based error detection and correction (EDC) schemes. These schemes are adequate for correcting single-bit errors and detecting double-bit errors. However, as RAM device geometries have become smaller, individual radiation events have become more likely to affect multiple bits in the same device. The error rate for such memory devices is around 1.5E-10 errors per bit-hour. For a given memory size of 128 Mbits, the memory system error rate is 1.5E-10*1.28E8=0.0192 failures per hour, or an uncorrectable event roughly every 52 hours. EDC cannot reliably detect failures that affect an even number of bits greater than two. Hence, some errors are not detectable by EDC, making the undetected error rate too high for a single-threaded memory system to host critical functions.
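The limitation of Hamming-based EDC described above can be illustrated with a minimal sketch. It assumes an extended Hamming(8,4) code (four data bits, three Hamming parity bits, and one overall parity bit), chosen only because it is small enough to follow by hand; the word size, bit layout, and the function names encode, decode, and flip are illustrative assumptions and are not taken from any particular memory design.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..7 hold a Hamming(7,4)
    word with parity bits at positions 1, 2, 4 and data at 3, 5, 6, 7.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    for p in (1, 2, 4):
        # Parity bit p makes the positions whose index has bit p set even.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, nibble) for a possibly corrupted codeword."""
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i          # XOR of the positions of all set bits
    overall = sum(code) % 2        # 0 when the overall parity still holds
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "clean"
    elif overall == 1:
        # Odd number of flips: treated as a single-bit error and corrected.
        # A syndrome of 0 here points at the overall parity bit itself.
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: double-bit error.
        status = "uncorrectable"
    nibble = fixed[3] | fixed[5] << 1 | fixed[6] << 2 | fixed[7] << 3
    return status, nibble


def flip(code, positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    out = list(code)
    for p in positions:
        out[p] ^= 1
    return out


if __name__ == "__main__":
    word = encode(0b1011)
    for label, positions in [("no error", []),
                             ("1-bit", [5]),
                             ("2-bit", [3, 6]),
                             ("4-bit", [1, 2, 4, 7])]:
        status, data = decode(flip(word, positions))
        print(f"{label:8s} -> {status:13s} data ok: {data == 0b1011}")
```

In this sketch the single-bit upset is corrected and the double-bit upset is flagged as uncorrectable, but the chosen four-bit upset turns one valid codeword into another, so the decoder reports a clean word while returning wrong data. Wider codes used in real memories differ in size but share the same minimum-distance limitation.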
Accordingly, it would be desirable to detect and correct all errors so that the restart rate due to soft errors would go to zero.