1. Field
This disclosure relates generally to memory systems, and more specifically, to memory systems having error correction and methods of operating those systems.
2. Related Art
Error correction code (ECC) is commonly used to correct single-bit errors, which typically occur as soft errors that are caused randomly, often by alpha particles or other high-energy particles. The ECC is thus generally designed to correct single-bit failures. The layout of the memory is often interleaved so as to further reduce the likelihood that a single soft error event causes a double-bit failure within one word. Errors caused in this way therefore very rarely affect more than one bit per word. A much more complex and costly ECC is required to correct two-bit errors, so a word that has two bits in error is nearly always uncorrectable. Uncorrectable errors create a significant problem in system operation, so they should be very infrequent and preferably never occur.
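By way of illustration only, the difference between a correctable single-bit error and an uncorrectable two-bit error can be sketched with a toy extended Hamming code that protects four data bits with four check bits. The code below is an assumption made for this example and is not the ECC of any particular memory; the names encode and decode are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit codeword (list of bits).

    Positions 1, 2 and 4 hold Hamming parity bits, positions 3, 5, 6 and 7
    hold data bits, and position 0 holds an overall parity bit.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    for p in (1, 2, 4):
        for i in range(1, 8):
            if i & p and i != p:
                code[p] ^= code[i]
    code[0] = sum(code[1:]) % 2  # makes the total number of 1s even
    return code


def decode(code):
    """Return (status, nibble); nibble is None when the word is uncorrectable."""
    code = list(code)
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i  # XOR of the positions of all set bits
    overall = sum(code) % 2  # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Single-bit error: the syndrome is the failing position
        # (a syndrome of 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two bits are wrong.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
word[5] ^= 1                 # one soft error
print(decode(word))          # the data is recovered
word[2] ^= 1                 # a second error in the same word
print(decode(word))          # detected, but the data cannot be recovered
```

In this sketch a second flipped bit in the same word is detected but cannot be repaired, which corresponds to the uncorrectable case described above.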
Two-bit errors become significantly more likely when a single-bit error in a particular word is recurring. If a word has a bit that fails on a continuing basis, then when a random error occurs in another bit of that word, two bits in the word need correcting, which is not likely to be possible. When such an event occurs, there is a significant cost to system operation.
One ECC technique addresses this issue by writing the data back to the memory location whenever an error has been detected and then reading the memory location again to see if the error is repeated. If it is repeated, then the error is corrected by redundancy, i.e., the data from that memory location is stored elsewhere in spare memory. This can be effective to some extent, but some bit failures do not appear immediately and instead are delayed. Thus, the bit may fail some time after having been written, so it passes the test but fails soon thereafter. This can be particularly true in a high temperature environment where leakage causes a failure. The high temperature raises the leakage so that a failure soon occurs, but not soon enough to be found by the test performed after re-writing the data. Also, this re-writing of the data and subsequent testing is likely to be disruptive to system operation.
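The limitation of the write-back and re-read test can be illustrated with a minimal sketch. The fault model here is an assumption for the example: a hypothetical ToyWord whose faulty bit loses a stored 1 after a given number of time ticks, with zero ticks modeling a bit that is stuck at 0. The function error_repeats is likewise hypothetical and stands in for the test described above.

```python
class ToyWord:
    """One memory word with an optional faulty bit (hypothetical fault model)."""

    def __init__(self, width=8, weak_bit=None, retention=0):
        self.mask = (1 << width) - 1
        self.weak_bit = weak_bit    # index of the faulty bit, or None
        self.retention = retention  # ticks a 1 survives in the faulty bit; 0 = stuck at 0
        self.value = 0
        self.age = 0                # ticks elapsed since the last write

    def write(self, data):
        self.value = data & self.mask
        self.age = 0

    def tick(self, n=1):
        self.age += n

    def read(self):
        # Once the retention time has elapsed, the faulty bit reads as 0.
        if self.weak_bit is not None and self.age >= self.retention:
            return self.value & ~(1 << self.weak_bit)
        return self.value


def error_repeats(word, corrected_data):
    """Write the corrected data back, re-read at once, and report a repeat error."""
    word.write(corrected_data)
    return word.read() != corrected_data


stuck = ToyWord(weak_bit=3, retention=0)   # fails immediately
leaky = ToyWord(weak_bit=3, retention=5)   # fails only after a delay

print(error_repeats(stuck, 0xFF))  # True: the test catches the stuck bit
print(error_repeats(leaky, 0xFF))  # False: the leaky bit passes the test
leaky.tick(5)
print(leaky.read() != 0xFF)        # True: the bit fails after the test is over
```

In this sketch the stuck bit is found and could be replaced by redundancy, whereas the leaky bit passes the immediate re-read and fails only afterward, which is the delayed failure described above.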
Accordingly, there is a need for a memory system that overcomes or improves upon the problems described above.