1. Field of the Invention
Embodiments of the present invention relate, in general, to integrated circuit dynamic memories and more particularly to methods and architectures for detecting and correcting errors in dynamic memories.
2. Relevant Background
Semiconductor memories are typically laid out in rows and columns. Thus, a memory address can be thought of as a means by which to select a cell located at a particular row and column at which bits of information are maintained.
Memories are typically sub-divided into banks. Each bank includes an array of memory cells arranged in multiple rows and columns, and typically also includes the driver, amplifier, and pre-charge circuitry required for reading from and writing to the memory. A memory can therefore use lower total power by confining an individual read or write operation to one bank or a limited number of banks, which allows the memory to turn on only a small number of driver, amplifier, or pre-charge circuits at one time.
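By way of illustration only, the selection of a bank, row, and column from a memory address can be sketched as follows. The field widths (2 bank bits, 10 row bits, 8 column bits), the ordering of the fields within the address, and the names CellLocation and decode_address are assumptions chosen for this example and do not describe any particular memory device.

```python
from typing import NamedTuple

# Hypothetical geometry, chosen only for illustration.
BANK_BITS = 2   # 4 banks
ROW_BITS = 10   # 1024 rows per bank
COL_BITS = 8    # 256 columns per row


class CellLocation(NamedTuple):
    bank: int
    row: int
    column: int


def decode_address(address: int) -> CellLocation:
    """Split a flat address into bank, row, and column fields.

    Assumed layout, most significant field first: bank | row | column.
    """
    total_bits = BANK_BITS + ROW_BITS + COL_BITS
    if not 0 <= address < (1 << total_bits):
        raise ValueError("address out of range for the assumed geometry")
    column = address & ((1 << COL_BITS) - 1)
    row = (address >> COL_BITS) & ((1 << ROW_BITS) - 1)
    bank = address >> (COL_BITS + ROW_BITS)
    return CellLocation(bank, row, column)
```

Under this assumed layout, only the bank named by the upper address bits would need its driver, amplifier, and pre-charge circuitry enabled for a given access.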
As the density of integrated circuit memories increases, the area of each individual bit storage element in the memory correspondingly decreases. These smaller bit cells are more vulnerable to upset. In addition, the vast number of bits increases the statistical probability that the array will contain individual memory cells that, due to manufacturing variations and environmental effects, do not retain their memory values as well as the average memory cell in the array. It is therefore possible that, during the normal course of operation, these weak array elements will flip to the opposite memory state and corrupt the data stored in the memory.
It is also a well-documented fact that bits can be flipped by the rare occurrence of a terrestrial cosmic ray that interacts with a memory cell. When the memory is used in a harsh environment such as high altitude or even space environments, the problem is greatly exacerbated as the number of cosmic ray interactions increases dramatically. Therefore, it is a common practice within the semiconductor memory industry to include Error Detection And Correction (EDAC) circuitry within a memory. Multiple EDAC techniques are known in the art. These include repetition codes, parity bits, checksums, cyclic redundancy checks, cryptographic hash functions, error correcting codes, automatic repeat request, and hybrid schemes. Error correcting codes in turn include BCH code, constant-weight code, convolutional code, group codes, Golay codes, Goppa code, Hagelbarger code, Hamming code, lexicographic code, low-density parity check code, LT code, Raptor code, Reed-Solomon code, Reed-Muller code, Tornado code, turbo code, and the like. Generally such techniques use additional check bits. These check bits hold redundant information that can be used to correct the memory when one or more bits in a data word are corrupted.
According to one method of EDAC known in the art, data to be stored is provided to the EDAC circuitry. The EDAC circuitry then generates check bits based on the data value. The check bits are then either stored along with the data or, as in this example, combined with the data to form a code word, which is then stored in the memory. To check the data, the EDAC circuitry reads the code word from the memory and recalculates the check bits based on the data portion of the code word. The recalculated check bits are then compared to the check bits in the code word to determine if there is a match. If a match exists, then the data in the code word is taken to be correct; if not, an error exists.
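The generate, store, recalculate, and compare sequence described above can be sketched as follows. This is a minimal illustration only: the 4-bit data width, the three Hamming-style parity equations, the placement of the check bits in the low-order bits of the code word, and the function names are all assumptions made for the example, and the sketch performs detection only, not correction.

```python
def generate_check_bits(data: int) -> int:
    """Compute three parity check bits over a 4-bit data value.

    The parity groupings follow the familiar Hamming(7,4) pattern; they are
    an assumption for this sketch, not a description of a specific circuit.
    """
    d = [(data >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    p0 = d[0] ^ d[1] ^ d[3]
    p1 = d[0] ^ d[2] ^ d[3]
    p2 = d[1] ^ d[2] ^ d[3]
    return p0 | (p1 << 1) | (p2 << 2)


def encode(data: int) -> int:
    """Form a 7-bit code word: data in the upper 4 bits, check bits in the lower 3."""
    if not 0 <= data < 16:
        raise ValueError("this sketch assumes 4-bit data")
    return (data << 3) | generate_check_bits(data)


def check(code_word: int) -> bool:
    """Recalculate the check bits from the data portion and compare them
    with the check bits held in the code word. True means they match."""
    data = code_word >> 3
    stored_check_bits = code_word & 0b111
    return generate_check_bits(data) == stored_check_bits


# Usage: store a code word, corrupt one bit, and observe the mismatch.
stored = encode(0b1011)
assert check(stored)                  # no error: check bits match
assert not check(stored ^ (1 << 4))   # single flipped bit: mismatch detected
```

In this sketch every data bit participates in at least two parity equations, so any single flipped bit in the code word produces a mismatch. Locating and correcting the flipped bit would require the further step of decoding the pattern of mismatched check bits.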
In other methods, the check bits themselves are stored along with the data and are used to determine whether an error exists and, if possible, to correct the error. However, the effectiveness of these EDAC circuits is limited by the fact that when more than one bit in a given data word is corrupted (a condition known as a multi-bit error (MBE) or multi-bit upset (MBU)), an increasingly large number of check bits is required in order to contain enough redundant information to correct the errors. For example, a single-error-correcting Hamming code requires m parity bits for every 2^m - m - 1 data bits, and its Single Error Correcting and Double Error Detecting (SECDED) extension adds one further parity bit.
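As a rough illustration of check-bit overhead, the following sketch finds the smallest number of Hamming parity bits m for which 2^m - m - 1 is at least the data word width, and then adds one overall parity bit for double-error detection. The function names are hypothetical, and the extra overall parity bit reflects the conventional extended Hamming construction, which is assumed here.

```python
def hamming_parity_bits(data_bits: int) -> int:
    """Smallest m such that 2**m - m - 1 >= data_bits (single-error correction)."""
    if data_bits < 1:
        raise ValueError("data_bits must be positive")
    m = 2
    while (1 << m) - m - 1 < data_bits:
        m += 1
    return m


def secded_check_bits(data_bits: int) -> int:
    """Check bits for SECDED, assuming one added overall parity bit."""
    return hamming_parity_bits(data_bits) + 1


for width in (8, 16, 32, 64):
    check_bits = secded_check_bits(width)
    overhead = 100.0 * check_bits / width
    print(f"{width}-bit word: {check_bits} check bits ({overhead:.1f}% overhead)")
```

For a 64-bit data word this gives 8 check bits, the familiar (72,64) arrangement. Codes that correct more than one error per word need considerably more check bits than this.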
When MBEs are common, the number of redundant check bits included on the chip may need to be equal to or even greater than the number of actual memory bits, and this overhead will limit the achievable density of the memory device. Moreover, when a cosmic ray interacts with a dense memory device, a single interaction will likely flip multiple memory bits. Since memories are constructed in ordered arrays of rows and columns, it is likely that even a single cosmic ray or heavy ion strike could cause an MBE.
A variety of EDAC techniques and circuits are available, but to correct more than one error in a word, a large overhead of check bits is required. Thus, the memory must grow substantially to provide storage for the increasing number of check bits. Eventually, such growth is prohibitive, thus limiting the memory's ability to self-correct. Moreover, there is a performance penalty for correcting multiple bits in a word. An EDAC designed to correct multiple errors is more complex and introduces more latency into the memory access time than one that corrects fewer errors. In some applications such as telecommunications, this latency does not affect performance (for example, in a radio communication, it is not significant that a transmission is delayed by a fraction of a second to correct errors); in other applications, performance on a cycle-by-cycle basis is critical. Thus, the latency of error correction during a single cycle can be a significant hit to performance.
Therefore, a need exists for an efficient method to detect and correct multiple errors in a semiconductor memory. Further, a need exists to develop an integrated method of error detection and correction that minimizes overhead yet improves memory reliability. These and other limitations of the prior art are resolved by one or more embodiments of the present invention, described hereafter by way of example.