1. Field of the Invention
This invention generally relates to computer memory systems, and more specifically, to content protection of computer memory using redundancy.
2. Background Art
The small size of computer transistors and capacitors, combined with transient electrical and electromagnetic phenomena, causes occasional errors in stored information in computer memory systems. Therefore, even well designed and generally reliable memory systems are susceptible to memory device failures.
In an effort to minimize the effects of these memory device failures, various error-checking schemes have been developed to detect, and in some cases correct, errors in messages read from memory. Many of these checking schemes use redundant information, stored in the computer memory, to ensure data integrity. The simplest error detection scheme is the parity bit. A parity bit is an extra bit included with a binary data message or data word to make the total number of 1's in the message either odd or even. For “even parity” systems, the parity bit is set to make the total number of 1's in the message even. For “odd parity” systems, the parity bit is set to make the total number of 1's in the message odd. For example, in a system utilizing odd parity, a message having two 1's would have its parity bit set to 1, thereby making the total number of 1's odd. Then, the message including the parity bit is transmitted and subsequently checked at the receiving end for errors. An error results if the parity of the data bits in the message does not correspond to the parity bit transmitted. As a result, single bit errors can be detected. However, since there is no way to detect which particular bit is in error, correction is not possible. Furthermore, if two or any even number of bits are in error, the parity will be correct and no error will be detected. Parity therefore is capable of detecting only odd numbers of errors and is not capable of correcting any bits determined to be in error.
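By way of illustration only, the parity scheme described above may be sketched as follows. The function names parity_bit and parity_ok are hypothetical and are chosen here for the example; a word is assumed to be a list of 0/1 values with the parity bit appended last.

```python
def parity_bit(data_bits, odd=False):
    """Return the parity bit for a list of 0/1 data bits."""
    bit = sum(data_bits) % 2      # even parity: total number of 1's becomes even
    return bit ^ 1 if odd else bit  # odd parity: total number of 1's becomes odd


def parity_ok(word, odd=False):
    """word is the data bits followed by the parity bit; True if consistent."""
    expected = 1 if odd else 0
    return sum(word) % 2 == expected


data = [1, 1, 0, 0]                        # a message having two 1's
word = data + [parity_bit(data, odd=True)]  # odd parity sets the parity bit to 1

one_error = list(word)
one_error[0] ^= 1                          # a single bit error is detected

two_errors = list(one_error)
two_errors[1] ^= 1                         # a second error hides the first
```

In this sketch, parity_ok reports a failure for one_error but not for two_errors, and in neither case does the check indicate which bit changed.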
Error correction codes (ECCs) have thus been developed to not only detect but also correct bits determined to be in error. ECCs utilize multiple parity check bits stored with the data message in memory. Each check bit is a parity bit for a group of bits in the data message. When the message is read from memory, the parity of each group, including the check bit, is evaluated. If the parity is correct for all of the groups, it signifies that no detectable error has occurred. If one or more of the newly generated parity values are incorrect, a unique pattern called a syndrome results, which may be used to identify the bit in error. Upon detection of the particular bit in error, the error may be corrected by complementing the erroneous bit.
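As a minimal sketch of the check-bit and syndrome mechanism just described, the following uses the well-known (7,4) Hamming code with even parity, in which check bits occupy positions 1, 2 and 4 and each check bit covers the positions whose index has the corresponding binary digit set. The function names are hypothetical, and the code size is chosen for brevity rather than taken from any particular memory system.

```python
def hamming74_encode(data):
    """data: list of 4 bits -> list of 7 code bits (positions 1..7)."""
    code = [0] * 8                      # index 0 is unused
    code[3], code[5], code[6], code[7] = data
    for p in (1, 2, 4):
        # Check bit p gives even parity over every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    return code[1:]


def hamming74_syndrome(word):
    """Re-evaluate each parity group; the failing groups form the syndrome."""
    code = [0] + list(word)
    syndrome = 0
    for p in (1, 2, 4):
        if sum(code[i] for i in range(1, 8) if i & p) % 2:
            syndrome |= p
    return syndrome


def hamming74_correct(word):
    """Return (corrected word, syndrome); assumes at most one bit is in error."""
    syndrome = hamming74_syndrome(word)
    fixed = list(word)
    if syndrome:
        fixed[syndrome - 1] ^= 1        # complement the bit the syndrome points at
    return fixed, syndrome
```

For a single bit error, the syndrome in this code equals the position (1 through 7) of the erroneous bit, so complementing that position restores the stored word. A syndrome of zero indicates that no detectable error has occurred.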
A widely used type of ECC utilized in error control in digital systems is based on the codes devised by R. W. Hamming, and thus takes the name “Hamming codes”. One particular subclass of Hamming codes includes the single error correcting and double error detecting (SEC-DED) codes. As their name suggests, these codes may be utilized not only to correct any single bit error but also to detect double bit errors.
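One common way to obtain SEC-DED behavior, shown here only as a sketch, is to extend a Hamming code with one overall parity bit. The example below assumes a (7,4) Hamming code extended to 8 bits with even parity; the function names and the returned status strings are hypothetical.

```python
def secded_encode(data):
    """data: list of 4 bits -> 8 code bits (7 Hamming bits + overall parity)."""
    code = [0] * 8                      # index 0 is unused
    code[3], code[5], code[6], code[7] = data
    for p in (1, 2, 4):
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    word = code[1:]
    return word + [sum(word) % 2]       # overall bit makes the 8-bit total even


def secded_decode(word):
    """Return (status, word) where status is 'ok', 'corrected' or 'double'."""
    code = [0] + list(word[:7])
    syndrome = 0
    for p in (1, 2, 4):
        if sum(code[i] for i in range(1, 8) if i & p) % 2:
            syndrome |= p
    overall = sum(word) % 2             # 1 means an odd number of bits flipped
    fixed = list(word)
    if syndrome == 0 and overall == 0:
        return "ok", fixed
    if overall == 1:
        # Treated as a single error: a zero syndrome means the overall bit itself.
        fixed[(syndrome - 1) if syndrome else 7] ^= 1
        return "corrected", fixed
    return "double", fixed              # nonzero syndrome, even overall parity
```

In this sketch a double bit error leaves the overall parity even while producing a nonzero syndrome, which is how it is distinguished from a correctable single bit error.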
Another well-known type of ECC is the single symbol correction and double symbol detection (SSC-DSD) code, which is used to correct single symbol errors and detect double symbol errors. In systems implementing these types of codes, the symbol represents a multiple bit package or chip. Hence, as the name implies, an SSC-DSD code in a system utilizing n bit symbols would be capable of correcting up to n bits in a single symbol and detecting errors occurring in two symbols.
Error detecting codes have a low overhead, e.g., 12.5% parity overhead for a single byte. Error correcting codes are very inefficient for small data items and are usually used for groups of 8 bytes and larger, e.g., 12.5% overhead for a single error correcting, double error detecting code on 8 bytes. If a fraction of the group is changed, the unchanged data needs to be retrieved to generate the ECC for the entire group, causing expensive Read-Modify-Write cycles.
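The overhead figures above can be checked with a short calculation. The sketch below assumes a SEC-DED code built as a Hamming code plus one overall parity bit, for which the number of Hamming check bits r is the smallest value satisfying 2^r >= m + r + 1 for m data bits. The function names are hypothetical.

```python
def secded_check_bits(data_bits):
    """Check bits for a Hamming-based SEC-DED code over data_bits data bits."""
    r = 1
    while 2 ** r < data_bits + r + 1:   # Hamming bound for single error correction
        r += 1
    return r + 1                        # one extra bit for double error detection


def overhead_percent(check_bits, data_bits):
    """Redundancy expressed as a percentage of the data size."""
    return 100.0 * check_bits / data_bits


parity_on_byte = overhead_percent(1, 8)                        # 12.5
secded_on_8_bytes = overhead_percent(secded_check_bits(64), 64)  # 12.5
secded_on_1_byte = overhead_percent(secded_check_bits(8), 8)     # 62.5
```

Under these assumptions, protecting a single byte with SEC-DED requires 5 check bits, whereas protecting 8 bytes requires 8 check bits, which illustrates why error correcting codes are usually applied to larger groups.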
For example, in a 32-bit ECC scheme, the check bits that are stored with the data are generated based on the entire thirty-two bits. This makes it necessary to regenerate all of the check bits if even one data bit has changed. Thus, if one byte of data needs to be written to memory, the entire 4-byte double word must first be read, checked and corrected, the new eight bits substituted, and then all four bytes must be rewritten to memory with the appropriate new check bits. The same is true if two or three bytes of data need to be written to memory. This is called a partial write or a read/modify/write operation.
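The partial write sequence described above may be sketched as follows. The class EccMemory and the function check_bits are hypothetical; in particular, check_bits is only a placeholder that depends on all thirty-two data bits and is not a real error correction code, so the "check and correct" step is reduced here to a consistency check.

```python
def check_bits(word32):
    """Placeholder for ECC generation (NOT a real ECC): XOR of the four bytes."""
    b = word32.to_bytes(4, "little")
    return b[0] ^ b[1] ^ b[2] ^ b[3]


class EccMemory:
    """Toy memory of 32-bit words, each stored with check bits over the whole word."""

    def __init__(self, n_words):
        self.data = [0] * n_words
        self.check = [check_bits(0)] * n_words
        self.reads = 0
        self.writes = 0

    def read_word(self, addr):
        self.reads += 1
        word = self.data[addr]
        if check_bits(word) != self.check[addr]:
            raise ValueError("check-bit mismatch")  # a real ECC would attempt correction
        return word

    def write_word(self, addr, word):
        self.writes += 1
        self.data[addr] = word & 0xFFFFFFFF
        self.check[addr] = check_bits(self.data[addr])  # regenerate all check bits

    def write_byte(self, addr, byte_index, value):
        word = self.read_word(addr)                      # read the entire word
        shift = 8 * byte_index
        word = (word & ~(0xFF << shift)) | ((value & 0xFF) << shift)  # modify one byte
        self.write_word(addr, word)                      # write back with new check bits


mem = EccMemory(1)
mem.write_word(0, 0x11223344)
mem.write_byte(0, 1, 0xAB)   # costs one read and one write for a single byte
```

In this model a full word write needs no prior read, whereas every one, two or three byte write incurs an additional read of the word.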
A large number of these Read-Modify-Write cycles can cause significant delays in the operation of the memory system. This problem is usually mitigated by implementing write-combine buffers. These buffers collect multiple update requests and combine them, if possible, into larger updates, possibly changing the entire ECC protected group at once.
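A write-combine buffer of the kind described above may be sketched as follows, assuming a 4-byte ECC protected group and byte-granular update requests. The class WriteCombineBuffer and its methods are hypothetical, memory is modeled as a dictionary from word address to 32-bit value, and regeneration of the check bits is omitted for brevity.

```python
class WriteCombineBuffer:
    """Collects byte updates and merges those that fall in the same 4-byte group."""

    WORD_BYTES = 4

    def __init__(self):
        self.pending = {}   # word address -> {byte index: value}

    def write_byte(self, byte_addr, value):
        word_addr, index = divmod(byte_addr, self.WORD_BYTES)
        self.pending.setdefault(word_addr, {})[index] = value & 0xFF

    def flush(self, memory):
        """Apply pending updates; return how many groups needed a read-modify-write."""
        rmw_count = 0
        for word_addr, updates in self.pending.items():
            if len(updates) == self.WORD_BYTES:
                word = 0                          # whole group replaced, no read needed
            else:
                word = memory.get(word_addr, 0)   # partial update: old data must be read
                rmw_count += 1
            for index, value in updates.items():
                shift = 8 * index
                word = (word & ~(0xFF << shift)) | (value << shift)
            memory[word_addr] = word
        self.pending.clear()
        return rmw_count


memory = {}
buf = WriteCombineBuffer()
for offset, value in enumerate([0x44, 0x33, 0x22, 0x11]):
    buf.write_byte(offset, value)   # four byte updates covering word 0 entirely
full_group_rmw = buf.flush(memory)

buf.write_byte(5, 0xAB)             # a lone byte update to word 1
partial_group_rmw = buf.flush(memory)
```

In this sketch the four combined byte updates are written as one full group without a read, while the isolated byte update still requires a read-modify-write cycle.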