This invention relates to error detection and correction, and more particularly to address error detection merged with data error detection and correction.
Digital memories are susceptible to errors caused by a variety of sources. Cosmic radiation can flip the state of individual memory cells. Pattern-sensitive capacitive coupling, noise, and hardware failures such as shorts can occur, causing multiple bits to be read incorrectly. Sometimes entire memory chips can fail. When a memory contains several memory chips, such as on a memory module, a one-chip failure may produce a multi-bit error, such as a 4-bit error in a 72-bit memory word.
Additional bits are often included in the memory for storing an error-correction code (ECC). These additional ECC bits can be used to detect an error in the data bits being read, and can sometimes be used to correct those errors. Typically, a code is selected such that the data is stored unmodified. All error detection and correction is then done by comparing the check bits read against the check bits regenerated from the data read. Such a code is said to be in “systematic form”.
Various codes can be used for the ECC bits, such as the well-known Hamming codes. A class of codes known as Single-byte Error-Correcting/Double-byte Error-Detecting (SbEC/DbED) codes can correct any number of errors within a single “byte” and detect errors confined to a pair of such bytes. The “byte” may be a length other than 8 bits. For example, a S4EC/D4ED code can correct 4-bit (nibble) errors, and detect but not correct 8-bit (2-nibble) errors. These codes are especially useful since they can detect double-chip errors where all 4 bits output by each of two different memory chips are faulty. Single-chip errors can be corrected.
A SbEC/DbED code with 3*b check bits can be used with up to b*(2**b+2) total bits (data+check). These are known as Reed-Solomon SbEC-DbED codes. When b=4, the total is limited to 72 bits, so only a relatively small number of data bits (60) can be used. To increase the allowed number of data bits, 4*b check bits are typically used, such as 128 data bits with 16 check bits.
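The length bound quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch; the function names are hypothetical and are not taken from any particular ECC implementation.

```python
def max_total_bits(b):
    """Largest code length (data + check bits) for a SbEC/DbED code
    with 3*b check bits, using the bound b*(2**b + 2)."""
    return b * (2**b + 2)


def max_data_bits(b):
    """Data bits left after subtracting the 3*b check bits."""
    return max_total_bits(b) - 3 * b


# For 4-bit "bytes" (nibbles): 72 total bits, of which 60 are data bits.
total_b4 = max_total_bits(4)
data_b4 = max_data_bits(4)
```

With b=4 the bound gives 72 total bits and 60 data bits, which is why a 128-bit data word needs the larger 4*b = 16 check bits.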
While such S4EC/D4ED codes are useful for protecting against failures in whole memory chips, and in the wires to and from the memory chips, failures can also occur in the address lines to one or more of the memory chips. For example, a solder connection to an address pin of one of the memory chips might start failing after some time. Many memory chips use multiplexed addresses, where the address is applied over the same address lines in two parts, a row address part and a column address part. A single failing solder connection can thus cause two bits of the address to be faulty. It is desirable to protect against such 2-bit address errors. Address errors may also be caused by cosmic radiation, which can cause a wrong address to be used within the memory chip. Such an address may be wrong in an unknown number of bits.
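How a single bad address pin becomes a 2-bit address error can be sketched as follows. This is a simplified model under stated assumptions: the address is split into equal row and column halves, the row half is sent first, and the faulty pin is stuck at a fixed level. All names are hypothetical.

```python
def apply_stuck_pin(addr, half_bits, pin, stuck_value):
    """Return the address the chip actually sees when one multiplexed
    address pin is stuck at 0 or 1 during both the row and column phases."""
    row = addr >> half_bits                  # upper half, sent first
    col = addr & ((1 << half_bits) - 1)      # lower half, sent second
    mask = 1 << pin
    if stuck_value:
        row |= mask
        col |= mask
    else:
        row &= ~mask
        col &= ~mask
    return (row << half_bits) | col


def flipped_bits(a, b):
    """Number of bit positions in which two addresses differ."""
    return bin(a ^ b).count("1")


# An 8-bit address split into a 4-bit row and a 4-bit column, pin 2 stuck at 1.
seen = apply_stuck_pin(0b0000_0000, half_bits=4, pin=2, stuck_value=1)
errors = flipped_bits(0b0000_0000, seen)     # both the row bit and the column bit flip
```

Whether one or two address bits end up wrong depends on the intended address: a bit that already matches the stuck level is unaffected.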
As memory sizes increase, more and more address bits are used. Protecting these larger addresses against errors becomes more important.
FIG. 1 shows a prior-art memory with data ECC and address parity. Write data is stored in data RAM 10, while ECC generator 16 calculates the ECC bits that correspond to the value of the data bits being written into data RAM 10. These data ECC bits are written into data ECC RAM 12 at the same write-address W_ADR as the data.
During reading, the read address R_ADR is applied to read out data from data RAM 10 and data ECC bits from data ECC RAM 12. Read ECC generator 20 re-generates an ECC value from the data being read from data RAM 10. The new ECC value from read ECC generator 20 is compared to the stored ECC bits from data ECC RAM 12 by ECC checker 24 to determine if any errors occurred in the read data. A data error can be signaled when the stored ECC does not match the re-generated ECC. Some of these data errors may be corrected by an ECC corrector (not shown).
To protect against errors in the address, the write address W_ADR is applied to parity generator 18, which generates the parity of the write address. The generated address parity is then stored in address parity RAM 14 at the write address.
During reading, the stored address parity is read from address parity RAM 14, while the parity of the read address R_ADR is generated by read parity generator 22. The generated read-address parity is compared to the stored parity from address parity RAM 14 by parity comparator 26. When the parity values mismatch, an address error is signaled. The memory read can be re-tried several times before a failure is signaled.
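The address-parity path of FIG. 1 can be modeled in a few lines. The sketch below assumes a single even-parity bit per address and omits the data ECC path entirely; the class name and the `fault_mask` parameter (the address bits corrupted on the way to the RAMs) are hypothetical and exist only for illustration.

```python
def parity(x):
    """Even parity of an integer: 1 if it has an odd number of 1 bits."""
    return bin(x).count("1") & 1


class ParityProtectedMemory:
    def __init__(self):
        self.data_ram = {}          # stands in for data RAM 10
        self.addr_parity_ram = {}   # stands in for address parity RAM 14

    def write(self, w_adr, value):
        self.data_ram[w_adr] = value
        self.addr_parity_ram[w_adr] = parity(w_adr)

    def read(self, r_adr, fault_mask=0):
        """Read r_adr; fault_mask flips address bits seen by the RAMs.
        Returns (value, address_error_signaled)."""
        actual = r_adr ^ fault_mask                     # location really accessed
        stored = self.addr_parity_ram[actual]
        address_error = stored != parity(r_adr)         # parity comparator 26
        return self.data_ram[actual], address_error


mem = ParityProtectedMemory()
for adr in range(16):
    mem.write(adr, adr * 10)

ok = mem.read(5)                            # no fault
one_bit = mem.read(5, fault_mask=0b001)     # 1-bit address error
two_bit = mem.read(5, fault_mask=0b011)     # 2-bit address error
```

Under this single-bit assumption the 1-bit address error is flagged, but the 2-bit error returns data from the wrong location with no error signaled, since flipping two address bits leaves the parity unchanged.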
FIG. 2 shows address parity concatenated with data ECC bits. The address parity and data ECC bits can be stored in separate RAMs, or can be concatenated and stored in the same RAM. A data word of 128 bits may need 16 data ECC bits to correct errors up to 4 bits in a nibble and to detect pairs of such errors in separate nibbles. A 32-bit address protected with a standard Hamming code would need 6 bits, allowing detection of all 1-bit and 2-bit errors in the address. Thus a total of 22 check bits are needed to protect against both address and data errors.
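The number of Hamming check bits for the address follows from the standard condition that r check bits can cover k information bits when 2**r >= k + r + 1. A minimal sketch, with a hypothetical function name:

```python
def hamming_check_bits(k):
    """Smallest r such that 2**r >= k + r + 1, i.e. the number of check
    bits a distance-3 Hamming code needs for k information bits."""
    r = 1
    while 2**r < k + r + 1:
        r += 1
    return r


address_check_bits = hamming_check_bits(32)   # check bits for a 32-bit address
```

For a 32-bit address this gives 6 check bits. A distance-3 code used purely for detection catches every 1-bit and 2-bit error.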
Some memories may lack a sufficient width to store all of the check bits. For example, there may only be space for 16 check bits. It may be undesirable to reduce the number of data ECC bits to fit in some address parity bits. There are trade-offs among the number of check bits and expense of the memory system, the largest multi-bit data error that can be corrected and detected, and the degree of detection of address errors. Adding check bits for the address parity is often undesirable, while reducing the number of address check bits can reduce detection of multi-bit address errors. The use of multiplexed address bits causes 2-bit address errors to be as likely as 1-bit address errors in a real system.
The address parity bits could be exclusive-OR'ed (XOR'ed) into the data ECC bits. This has the advantage of not requiring additional check bits. However, if the address has a parity error, the extracted data ECC bits may not be able to correct an otherwise correctable data error. Thus some data correction ability may be lost. This happens if the address error causes an error syndrome to be created that matches the error syndrome for an otherwise correctable data error.
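The syndrome aliasing described above can be reproduced with a toy code. The sketch below is not the S4EC/D4ED code discussed earlier: it swaps in a systematic Hamming code with 4 data bits and 3 check bits, and folds the address into 3 bits by XOR'ing 3-bit groups. Both choices, and all names, are assumptions made only to keep the example small.

```python
# Parity-check columns for the 4 data bits (the check bits use 0b001, 0b010, 0b100).
DATA_COLUMNS = [0b011, 0b101, 0b110, 0b111]


def check_bits(data):
    """Systematic Hamming check bits for a 4-bit data value."""
    c = 0
    for i, col in enumerate(DATA_COLUMNS):
        if (data >> i) & 1:
            c ^= col
    return c


def address_bits(addr):
    """Fold an address into 3 bits by XOR'ing its 3-bit groups (assumed scheme)."""
    folded = 0
    while addr:
        folded ^= addr & 0b111
        addr >>= 3
    return folded


def store(data, w_adr):
    """Return (data, merged check bits) as they would be written."""
    return data, check_bits(data) ^ address_bits(w_adr)


def load(data, merged, r_adr):
    """Remove the read-address bits, form the syndrome, and apply
    single-bit data correction. Returns (data, syndrome)."""
    syndrome = merged ^ address_bits(r_adr) ^ check_bits(data)
    if syndrome in DATA_COLUMNS:
        data ^= 1 << DATA_COLUMNS.index(syndrome)   # "correct" the indicated bit
    return data, syndrome


word, merged = store(0b1010, w_adr=0b000011)
good = load(word, merged, r_adr=0b000011)   # matching address
bad = load(word, merged, r_adr=0b000000)    # address differs in 2 bits
```

With the matching address the syndrome is zero and the data is returned unchanged. With the 2-bit address mismatch the syndrome equals the column for data bit 0, so the decoder flips a bit of error-free data instead of reporting an address error.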
What is desired is a memory with data error correction and detection and address error detection. It is desirable to combine address check bits with data ECC bits.