Contained herein is material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction of the patent disclosure by any person as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all rights to the copyright whatsoever.
The present invention relates to memory systems; more particularly, the present invention relates to correcting single device failures and detecting two device failures in a memory system.
A Direct Rambus Dynamic Random Access Memory (Direct RDRAM) developed by Rambus, Inc., of Mountain View, Calif., is a type of memory that permits data transfer operations at high speeds over a narrow channel, e.g. up to 1.2-1.6 gigabytes per second over an 18-bit wide channel. RDRAM devices are typically housed in Rambus in-line memory modules (RIMMs) that are coupled to one or more Rambus channels. Typically, the channels couple each RDRAM device to a memory controller. The memory controller enables other devices, such as a Central Processing Unit (CPU), to access the RDRAMs.
In order to assure the correctness of data stored in the RDRAM devices, error-correcting codes may be stored with the data. The most commonly used codes are single error correction, double error detection (SEC-DED) codes. As the name implies, such codes enable the correction of a single-bit error and the detection of double-bit errors.
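To illustrate what SEC-DED means in practice, the following is a minimal sketch of a small extended Hamming code (4 data bits, 8-bit code word). It is an illustrative assumption, not the code used in any RDRAM memory controller, which would typically protect wider words. The functions `encode` and `decode` are hypothetical names chosen for this example.

```python
def encode(data):
    """Encode 4 data bits into an 8-bit extended Hamming (SEC-DED) code word.

    Index 0 holds an overall parity bit; indices 1..7 are classic Hamming
    positions, with check bits at 1, 2 and 4 and data bits at 3, 5, 6 and 7.
    """
    code = [0] * 8
    for pos, bit in zip((3, 5, 6, 7), data):
        code[pos] = bit
    for p in (1, 2, 4):
        # Each check bit makes the parity over the positions it covers even.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity enables double-error detection
    return code


def decode(code):
    """Return (status, data); status is 'ok', 'corrected' or 'double'."""
    c = list(code)
    syndrome = 0
    for i in range(1, 8):
        if c[i]:
            syndrome ^= i  # XOR of the positions of all set bits
    overall = sum(c) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flipped bits: treat it as a single error at position
        # 'syndrome' (syndrome 0 means the overall parity bit itself flipped).
        c[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: detectable, not correctable.
        return "double", None
    return status, [c[i] for i in (3, 5, 6, 7)]
```

Flipping any one of the eight bits of an encoded word yields `("corrected", original_data)`, while flipping any two bits yields `("double", None)`.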
For some systems, error coverage beyond that provided by a SEC-DED code may be required. For example, a system may require that the failure of an entire memory device be detected and corrected, and that the failure of two memory devices be detected. To correct the failure of a single device, a code word is spread over several devices. For example, if the code word is 72 × 18 bits long (e.g., 72 memory devices, 18 bits each), 18 interleaved SEC-DED codes provide correction of a single device failure and detection of a double device failure, because each code takes one bit from every device, so a failed device corrupts at most one bit of each code. However, this approach is not practical because of the large number of devices that must be accessed in parallel and the large amount of data per access (72 devices × 16 bytes per device per access = 1152 bytes).
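The interleaving argument can be checked with a short sketch. It assumes, purely for illustration, that interleaved code word `k` is formed from bit `k` of each of the 72 devices and that a failed device corrupts every one of its 18 bits (the worst case). The names `codeword_bits` and `errors_per_codeword` are hypothetical.

```python
NUM_DEVICES = 72       # devices accessed in parallel (from the example above)
BITS_PER_DEVICE = 18   # bits per device; also the number of interleaved codes
BYTES_PER_DEVICE_PER_ACCESS = 16


def codeword_bits(k):
    """(device, bit) pairs that make up interleaved code word k.

    Assumption: code word k takes bit k from every device, so each of the
    18 SEC-DED code words is 72 bits wide.
    """
    return [(dev, k) for dev in range(NUM_DEVICES)]


def errors_per_codeword(failed_devices):
    """Count corrupted bits in each code word when whole devices fail."""
    failed = set(failed_devices)
    return [
        sum(1 for dev, _ in codeword_bits(k) if dev in failed)
        for k in range(BITS_PER_DEVICE)
    ]


# Data moved per access when all 72 devices are read in parallel.
bytes_per_access = NUM_DEVICES * BYTES_PER_DEVICE_PER_ACCESS
```

With one failed device, every code word sees exactly one bad bit, which SEC-DED corrects; with two failed devices, every code word sees exactly two bad bits, which SEC-DED detects. The same arithmetic gives `bytes_per_access == 1152`, the access size the text identifies as impractical.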
Normally, the number of devices that can be accessed in parallel in a computer system utilizing an RDRAM memory system is relatively small (e.g., 2, 4 or 8). As a result, there are not enough check bits in an RDRAM to enable device failure detection and correction using previously available techniques. Therefore, a new method that enables detecting and correcting device failures in an RDRAM memory system is desired.