1. Field of the Invention
The present invention is directed in general to memory devices and methods for operating same. In one aspect, the present invention relates to memory systems having error correction and methods of operating those systems.
2. Description of the Related Art
Error correction code (ECC) can be used to correct bit errors that are randomly caused by soft error events, such as those arising from the impact of alpha particles or other high energy particles on a memory. In this area, the soft error rate (SER) is the rate at which a device or system encounters or is predicted to encounter soft errors. ECC memory designs provide a type of computer data storage that can detect and correct single bit failures. As technology continues to scale, bit cell degradation increases over time due to latent defects, resulting in a higher SER and reduced memory reliability. In addition, conventional ECC memory designs using single bit correction techniques cannot correct a soft error that hits a data element which already has a hard failure. To correct such combinations of hard and soft errors, more complex and costly multi-bit error correction is required, at the expense of increased die size and operational latency. When multi-bit error correction is not available, a word that has two bits with errors is nearly always uncorrectable. Uncorrectable errors create a significant problem in system operation, and they should therefore be very infrequent and preferably never occur. Two bit errors become significantly more likely where a single bit error in a particular word is recurring. If a word has a bit that fails on a continuous basis, then when a random error occurs in that word, there are two bits in the word that need correcting, which is not likely to be possible. When such an event occurs, there is a significant cost to system operation.
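By way of illustration only, the limitation described above can be sketched with a small single-error-correcting, double-error-detecting code. The sketch below assumes an extended Hamming code over 4 data bits; the function names `encode` and `decode`, the word width, and the bit positions chosen for the faults are hypothetical and are not taken from any particular memory design.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit word (extended Hamming code)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8  # index 0 = overall parity, indices 1..7 = Hamming positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2            # overall parity for double-error detection
    return code


def decode(code):
    """Return (status, nibble); nibble is None when the word is uncorrectable."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2  # 0 when the overall parity is consistent
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flipped bits: treat as a single error and correct it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flipped bits with a nonzero syndrome: two bit errors.
        return "uncorrectable", None
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return status, nibble


word = encode(0b1011)

hard = list(word)
hard[5] ^= 1            # hypothetical hard failure: one bit always reads wrong
print(decode(hard))     # single bit error is corrected

both = list(hard)
both[2] ^= 1            # a soft error now hits the same word
print(decode(both))     # two bit errors are detected but cannot be corrected
```

In this sketch the recurring hard failure alone is corrected on every read, but it consumes the entire correction capability of the word, so any additional soft error in that word is reported as uncorrectable.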
One ECC technique addresses this issue by writing back the data to the memory location whenever an error has been detected, and then reading the memory location again to see if the error is repeated. If it is repeated, then the error is corrected by redundancy, i.e., the data from that memory location is stored elsewhere in spare memory. This can be effective to some extent, but some bit errors arise from weak bits that are leaky, pattern sensitive, or power supply sensitive. Such bit errors are difficult to detect because, when corrected data is written back, the bit cells can hold data for a period of time and test good, but still fail over time. This can be particularly true in a high temperature environment where leakage causes a failure. The high temperature raises the leakage so that a failure soon occurs, but not soon enough to be found by the test after re-writing the data. Also, this re-writing of the data and subsequent testing is likely to be disruptive to system operation. As a result, detecting and correcting a combination of hard failures, leaky bits, and soft errors without resorting to multi-bit error correction is extremely difficult at a practical level with existing solutions.
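The write-back and re-read technique, and the reason it can miss leaky bits, can be sketched as follows. This is a toy model under stated assumptions: the `SimulatedMemory` class, its stuck-at-zero and leaky fault models, the retention value in abstract "ticks", and the `write_back_and_check` helper are all hypothetical and are used only to illustrate the behavior described above.

```python
class SimulatedMemory:
    """Toy memory: one integer word per address, with hypothetical fault models."""

    def __init__(self, size):
        self.cells = [0] * size
        self.age = [0] * size      # ticks elapsed since each address was written
        self.stuck_at_zero = {}    # addr -> mask of bits that always read 0
        self.leaky = {}            # addr -> (mask, retention in ticks)

    def write(self, addr, value):
        self.cells[addr] = value
        self.age[addr] = 0

    def tick(self, n=1):
        self.age = [a + n for a in self.age]

    def read(self, addr):
        value = self.cells[addr] & ~self.stuck_at_zero.get(addr, 0)
        mask, retention = self.leaky.get(addr, (0, 0))
        if self.age[addr] > retention:
            value &= ~mask         # leaky bits lose their charge after the retention time
        return value


def write_back_and_check(mem, spare, addr, corrected):
    """Rewrite corrected data, re-read at once, and remap to spare on a repeated error."""
    mem.write(addr, corrected)
    if mem.read(addr) != corrected:
        spare[addr] = corrected    # repeated error: treat as a hard failure
        return "remapped"
    return "passed"


mem = SimulatedMemory(4)
mem.stuck_at_zero[1] = 0b0100      # hard failure at address 1
mem.leaky[2] = (0b0001, 5)         # weak bit at address 2 holds data for 5 ticks
spare = {}

print(write_back_and_check(mem, spare, 1, 0b1111))  # hard failure is caught
print(write_back_and_check(mem, spare, 2, 0b1111))  # leaky bit tests good
mem.tick(10)
print(bin(mem.read(2)))                             # the leaky bit has since failed
```

In this model the immediate re-read catches the stuck bit, while the leaky bit holds its value long enough to pass the check and only fails later.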
It will be appreciated that for simplicity and clarity of illustration, elements illustrated in the drawings have not necessarily been drawn to scale. For example, the dimensions of some of the elements are exaggerated relative to other elements for purposes of promoting and improving clarity and understanding. Further, where considered appropriate, reference numerals have been repeated among the drawings to represent corresponding or analogous elements.