1. Field of the Invention
This invention relates generally to error detection and correction techniques for use with a memory module which includes a memory buffer that serves as an interface between a host controller and the RAM chips residing on the module.
2. Description of the Related Art
Data stored in memory storage devices may be subjected to electronic, electromagnetic or other forms of interference that can corrupt or change the stored value. Consequently, memory systems designed for reliable operation have been constructed with error detection and correction capabilities to detect and correct for errors that occur in the storage devices. However, these capabilities typically require a considerable amount of overhead, in terms of access latency and the calculation and storage of parity bits, for example. In such memory systems, when memory is accessed, the stored parity bits are returned along with the data; the host memory controller then uses the parity bits to detect and correct errors in the retrieved data.
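The storage portion of this overhead can be estimated with a short calculation. The following is a minimal sketch, assuming an extended Hamming code that corrects one bit error and detects two; the choice of code and the function names `secded_check_bits` and `storage_overhead` are illustrative assumptions and are not taken from the text above.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed by an extended Hamming code over data_bits bits."""
    r = 1
    # Hamming condition: r check bits must be able to name every bit
    # position in the codeword (data_bits + r) plus the "no error" case.
    while (1 << r) < data_bits + r + 1:
        r += 1
    # One additional overall-parity bit provides double-error detection.
    return r + 1


def storage_overhead(data_bits: int) -> float:
    """Fraction of extra storage spent on check bits."""
    return secded_check_bits(data_bits) / data_bits


if __name__ == "__main__":
    for width in (8, 16, 32, 64):
        print(width, secded_check_bits(width), storage_overhead(width))
```

Under these assumptions, a 64-bit data word needs 8 check bits, giving a 72-bit stored word and a storage overhead of 12.5%.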
Host memory controllers designed to support error correction for low latency applications typically limit the error correction capability to the detection and correction of a “single error”, and the detection of two or more errors within a data word. A “single error” may be in the form of a single bit, or in certain cases, the failure of a single memory device that results in the failure of multiple bits in adjacent locations within a data word. Host controllers of this sort typically do not support the correction of “multi-errors” (i.e., errors in the form of multiple bits or the failure of multiple memory devices). This limitation rests on the assumption that the memory storage devices are highly reliable and that storage errors occur only as a result of low probability interference that corrupts the values stored in the memory device.
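The behavior described above, correcting a single error while only detecting a double error, can be illustrated on a small scale. The following is a minimal sketch, assuming an extended Hamming code over 4 data bits and treating a “single error” as a single flipped bit only (the single-device-failure case is not modeled). The functions `encode` and `decode` and the bit layout are hypothetical and are not drawn from any particular host controller.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..7 use the classic
    Hamming(7,4) layout with check bits at positions 1, 2 and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    word = [0] * 8
    word[3], word[5], word[6], word[7] = d
    for p in (1, 2, 4):
        # Check bit p covers every position whose index has bit p set.
        word[p] = sum(word[i] for i in range(1, 8) if i & p) % 2
    word[0] = sum(word[1:]) % 2  # makes the parity of all 8 bits even
    return word


def decode(word: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, data); data is None when the word is uncorrectable."""
    syndrome = 0
    for p in (1, 2, 4):
        if sum(word[i] for i in range(1, 8) if i & p) % 2:
            syndrome |= p
    overall = sum(word) % 2
    word = list(word)  # work on a copy so the caller's list is unchanged
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd overall parity: one bit flipped, and the syndrome is its
        # index (0 means the overall parity bit itself was hit).
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return "uncorrectable", None
    data = word[3] | word[5] << 1 | word[6] << 2 | word[7] << 3
    return status, data


if __name__ == "__main__":
    stored = encode(0b1011)
    stored[5] ^= 1                 # one corrupted bit
    print(decode(stored))          # single error is corrected
    stored[2] ^= 1                 # a second corrupted bit
    print(decode(stored))          # double error is only detected
```

In this sketch a second flipped bit leaves the decoder able to report the fault but unable to recover the data, which is the limitation the paragraph above attributes to controllers that do not correct multi-errors.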
One trend in the development of memory storage devices is that as the storage cells continue to shrink due to advancements in process technology, the storage cells may become more susceptible to interference that corrupts the stored values. Consequently, multi-errors within a single data word may occur with higher probability in future memory systems. However, conventional methods of correcting such errors typically require significant changes to the host controller and the system infrastructure, and negatively impact the storage overhead and access latency characteristics of the memory system.
Contemporary memory storage devices such as SDRAM, DDR SDRAM, DDR2 SDRAM, and DDR3 SDRAM devices are often organized and used in existing computing systems as main memory. Computing systems that utilize these memory storage devices as main memory may be broadly classified into three categories according to their error detection and correction requirements:
        Systems that do not detect or correct for data errors.
        Systems that detect and correct for single errors and detect but do not correct multi-errors.
        Systems that detect and correct multi-errors.
In general, systems that detect and correct for multi-errors have high reliability requirements, and can tolerate the longer access latencies associated with encoding and storing the parity bits required for multi-error detection and correction algorithms. Systems that implement multi-error detection and correction are better able to handle errors that result from memory storage devices that have a higher probability of returning erroneous data.