This invention relates generally to an apparatus and method for performing error detection and correction in a memory system and, more particularly, to an apparatus and method for performing single symbol correction and double symbol detection (SSC-DSD) of data errors and single symbol detection and double symbol detection (SSD-DSD) of addressing errors using a modified Reed Solomon code.
When fault tolerance is required in a memory system architecture, some type of error coding must be included in the stored information. The error code provides a means of detecting, or of detecting and correcting, errors in the information that arise from faults, including both memory data errors and addressing errors. The primary responsibility of the code is to protect against soft, or transient, errors, which in memory typically occur orders of magnitude more frequently than errors due to hard, or permanent, faults. Although the geometries of the memory cells in semiconductor memory devices are designed to minimize the chance of multiple upsets from a single transient event, increasing memory densities and advanced packaging techniques tend to counteract this precaution. This factor, combined with requirements to operate in environments with progressively higher Single Event Upset (SEU) rates, makes the chance that a single event causes multiple bit errors within one symbol significant, where a symbol comprises multiple bits. Permanent failures of memory devices may take many forms, but the dominant failure mode is the whole-chip failure, in which the entire device is lost.
For memory architectures that use "by-one" device organizations (e.g., 256K × 1 chips), the requirements on the code for protecting against multiple failure events are relaxed. Because an error code covers only one word of information at a time, and each device contributes only one bit to that word, even multiple events in a single device affect only one bit of the codeword. Similarly, a whole-chip failure appears to the codeword as a single bit error. These requirements can be satisfied by conventional Hamming codes such as Single Error Correcting-Double Error Detecting (SEC-DED) codes. When device organizations wider than "by-one" are used, however, conventional Hamming codes can easily be foiled by multiple bit errors within a symbol and by whole-chip failures. In these cases, a code is needed that detects burst errors up to the width of the device, as well as whole-chip failures. A Reed Solomon code may better satisfy this requirement.
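To illustrate how a wider device can foil an SEC-DED code, the following minimal Python sketch uses an extended Hamming (8,4) code whose eight bits are assumed, purely hypothetically, to be stored in two "by-four" chips (chip 0 holds positions 0 to 3, chip 1 holds positions 4 to 7). The bit layout, the function names, and the chip assignment are illustrative assumptions, not the arrangement of the present invention.

```python
# Extended Hamming (8,4) SEC-DED code; bits are indexed by position 0..7.
# Position 0: overall parity. Positions 1, 2, 4: Hamming parity. 3, 5, 6, 7: data.
DATA_POSITIONS = (3, 5, 6, 7)


def encode(data: int) -> list[int]:
    """Encode a 4-bit value into an 8-bit SEC-DED codeword (list of bits)."""
    word = [0] * 8
    for k, pos in enumerate(DATA_POSITIONS):
        word[pos] = (data >> k) & 1
    for p in (1, 2, 4):
        # Parity bit p makes the XOR of all positions whose index has bit p set zero.
        word[p] = sum(word[i] for i in range(1, 8) if i & p and i != p) % 2
    word[0] = sum(word[1:]) % 2  # overall parity bit
    return word


def decode(word: list[int]) -> tuple[str, int]:
    """Standard SEC-DED decoding; returns (status, data), data -1 if uncorrectable."""
    word = list(word)
    syndrome = 0
    for i in range(1, 8):
        if word[i]:
            syndrome ^= i
    overall = sum(word) % 2
    if overall == 1:
        # Odd overall parity: assume a single-bit error at position `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        word[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        return "uncorrectable", -1  # even parity, nonzero syndrome: double error
    else:
        status = "no error"
    data = sum(word[pos] << k for k, pos in enumerate(DATA_POSITIONS))
    return status, data


def fail_chip(word: list[int], chip: int, flip_mask: int) -> list[int]:
    """Flip the bits selected by flip_mask within one hypothetical by-four chip."""
    bad = list(word)
    for j in range(4):
        if (flip_mask >> j) & 1:
            bad[4 * chip + j] ^= 1
    return bad


stored = encode(0b0000)
for mask in (0b0001, 0b0011, 0b0111, 0b1111):
    print(bin(mask), decode(fail_chip(stored, 1, mask)))
```

With the stored word for data 0, flipping one bit of chip 1 is corrected and flipping two is detected, but flipping three is miscorrected to data 14 with a "corrected" status, and flipping all four produces another valid codeword that passes as "no error". A code whose symbols match the 4-bit chip width would see each of these cases as a single symbol error.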
In U.S. Pat. No. 4,928,280, issued May 22, 1990, to Marlin A. Nielson et al. and assigned to IBM Corporation, it is noted that error correcting cyclic codes have the advantage that they can be implemented simply with shift registers, exclusive-OR gates, and feedback connections. They are based on an underlying algebraic structure that simplifies analysis and aids the design of encoders and decoders. Cyclic codes can be used to correct both random errors and burst errors and are considered the most popular and most useful codes. Cyclic codes that have proved very efficient are the BCH codes, named for their discoverers, Bose, Chaudhuri, and Hocquenghem. The Reed Solomon code is one of the BCH codes.
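The shift-register structure mentioned above can be sketched in Python as a bit-serial encoder. This is a minimal sketch, assuming a binary generator polynomial g(x) = x^3 + x + 1 chosen only for illustration (it is not a code taken from the cited patent), and the function name is hypothetical. The register holds deg g(x) bits; an exclusive-OR combines each incoming data bit with the register's top stage, and that feedback bit is XORed into the stages given by the lower-order terms of g(x). After all data bits are shifted in, the register holds the check bits.

```python
def lfsr_check_bits(data_bits: list[int], gen: int = 0b1011) -> int:
    """Bit-serial division by gen(x), as a feedback shift register would do it.

    data_bits are fed highest-order coefficient first; the final register
    contents are the check bits, i.e. x^deg * d(x) mod gen(x).
    Default gen = 0b1011 is the illustrative g(x) = x^3 + x + 1.
    """
    deg = gen.bit_length() - 1
    mask = (1 << deg) - 1    # register is deg bits wide
    taps = gen & mask        # feedback taps: gen(x) without its x^deg term
    reg = 0
    for bit in data_bits:
        # Exclusive-OR gate: incoming bit combined with the register's top stage.
        feedback = bit ^ ((reg >> (deg - 1)) & 1)
        reg = (reg << 1) & mask  # shift one stage
        if feedback:
            reg ^= taps          # feedback connections into the tapped stages
    return reg


print(lfsr_check_bits([1, 1, 0, 1]))  # check bits for d(x) = x^3 + x^2 + 1; prints 1
```

In hardware, each iteration of the loop corresponds to one clock of the register, which is why cyclic codes need only shift registers, XOR gates, and fixed feedback wiring.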
Nielson et al. further point out that cyclic codes are produced by multiplying the symbols of the source data by a generator polynomial, and that received words are checked by dividing them by the same polynomial. Here each code vector is represented as a polynomial whose coefficients are its symbols. If the remainder after division is zero, the received words of the destination data contain no detectable errors. If the remainder is not zero, an error has occurred, and in some cases the remainder can be used to correct it. If the source data is sent without modification (a systematic code), check symbols derived from the generator polynomial are appended to the end of the source data; in the usual systematic construction, these are the remainder obtained by dividing the source data polynomial, shifted by the number of check symbols, by the generator polynomial. The remainder symbols resulting from division of the destination data by the generator polynomial are called syndromes. If the syndromes are zero-valued, no detectable error occurred between source and destination.
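The divide-and-check procedure described above can be sketched in Python for a small binary cyclic code. This illustration rests on stated assumptions: polynomials over GF(2) are stored as integers (bit i holds the coefficient of x^i), the generator g(x) = x^3 + x + 1 is chosen only as an example, and the function names are hypothetical. A Reed Solomon code follows the same pattern with GF(2^m) symbols in place of bits.

```python
# Hypothetical example generator: g(x) = x^3 + x + 1 (a binary (7,4) cyclic code).
GEN = 0b1011
N_CHECK = GEN.bit_length() - 1  # number of check bits = degree of g(x)


def poly_mod(dividend: int, divisor: int) -> int:
    """Remainder of GF(2) polynomial division; bit i is the coefficient of x^i."""
    deg = divisor.bit_length() - 1
    while dividend.bit_length() - 1 >= deg:
        # Cancel the leading term by subtracting (XOR-ing) a shifted divisor.
        dividend ^= divisor << (dividend.bit_length() - 1 - deg)
    return dividend


def encode(data: int) -> int:
    """Systematic encoding: data bits followed by the remainder as check bits."""
    shifted = data << N_CHECK  # x^(n-k) * d(x)
    return shifted | poly_mod(shifted, GEN)


def syndrome(received: int) -> int:
    """Remainder of the received word divided by g(x); zero means no detected error."""
    return poly_mod(received, GEN)


word = encode(0b1101)
print(syndrome(word))             # prints 0: a valid codeword
print(syndrome(word ^ (1 << 4)))  # prints 6: one bit flipped, error detected
```

Because the appended remainder cancels the remainder of the shifted data, every codeword is an exact multiple of g(x), so any nonzero syndrome signals an error.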
The elements of a Galois field with $2^m$ elements, denoted $GF(2^m)$, are used as the symbols (coefficients) of a Reed Solomon code. In general, the field is constructed from the two binary coefficients 0 and 1 and a degree-$m$ polynomial $P(\alpha)$. For $m = 5$, $P(\alpha)$ is chosen so that the successive powers of $\alpha$ produce all $2^m - 1 = 31$ nonzero symbols, each one different. Thus, $P(\alpha) = \alpha^5 + \alpha^2 + 1$.
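The role of $P(\alpha)$ can be checked numerically. This minimal Python sketch assumes the common polynomial-basis representation, in which each element of $GF(2^5)$ is a 5-bit integer (bit i holds the coefficient of $\alpha^i$); the function name is hypothetical. Repeatedly multiplying by $\alpha$ and reducing with $\alpha^5 = \alpha^2 + 1$ visits all 31 nonzero elements exactly once.

```python
M = 5
PRIM_POLY = 0b100101  # P(alpha) = alpha^5 + alpha^2 + 1


def power_table(prim_poly: int, m: int) -> list[int]:
    """Return [alpha^0, alpha^1, ..., alpha^(2^m - 2)] as m-bit integers."""
    elements = []
    x = 1  # alpha^0
    for _ in range(2 ** m - 1):
        elements.append(x)
        x <<= 1            # multiply by alpha
        if x & (1 << m):   # an alpha^m term appeared: replace it using P(alpha) = 0
            x ^= prim_poly
    return elements


table = power_table(PRIM_POLY, M)
print(len(set(table)))  # 31: every nonzero element appears exactly once
print(table[5])         # 5, i.e. alpha^5 = alpha^2 + 1
```

By contrast, a non-primitive choice such as $\alpha^5 + 1$ (`0b100001`) makes $\alpha^5 = 1$, so `power_table(0b100001, 5)` cycles through only five distinct elements.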
In U.S. Pat. No. 4,861,193, issued Oct. 2, 1990, to Pierre Debord et al. and assigned to IBM Corporation, an apparatus and method are disclosed for correcting data words read from a memory in which the coded data is divided into a plurality of multi-bit packages. The error correcting code employed can correct at least one error in a package that has suffered at least one hard failure and can also correct a single soft error in a different package. In U.S. Pat. No. 4,661,955, issued Apr. 28, 1987, to David L. Arlington et al. and assigned to IBM Corporation, other examples of package codes are cited along with their hardware implementation.
Prior work on the detection and correction of hard and soft errors in memories has been concerned with errors in the data words retrieved from those memories; there has been no attempt to use the error code to detect addressing errors in addition to data errors.
In U.S. Pat. No. 4,142,174, issued Feb. 27, 1979, to Chin L. Chen et al. and assigned to IBM Corporation, a high-speed decoding scheme for Reed Solomon codes is disclosed that is capable of correcting up to three symbol errors in codewords made up of data and check symbols; however, it does not anticipate the present invention.
In the prior art of error detection and correction, it has been necessary to overcome a mismatch between the number of bits in each Reed Solomon code symbol (for both data and parity symbols) and the bit width of a memory chip. In U.S. Pat. No. 4,862,463, issued Aug. 29, 1990, to Chin-Long Chen and assigned to IBM Corporation, an approach to matching the number of bits in a code symbol to the width of the memory chip is disclosed. In particular, Chen modifies the code to require fewer bits per symbol. However, as shown in FIG. 3 of each patent, this approach results in a mapping of bit positions to chips in which adjacent bits of a memory word do not map into the same memory chip, unlike the present invention, in which they do.
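The effect of the bit-to-chip mapping can be illustrated with a small Python sketch. The 16-bit word, the four hypothetical "by-four" chips, the 4-bit symbols, and both mapping functions are assumptions chosen for illustration; the interleaved mapping is a generic example of adjacent bits landing on different chips, not a reproduction of Chen's FIG. 3. The sketch counts how many code symbols a whole-chip failure touches under each mapping.

```python
WORD_BITS = 16
CHIP_WIDTH = 4    # hypothetical "by-four" devices
SYMBOL_WIDTH = 4  # hypothetical 4-bit code symbols
N_CHIPS = WORD_BITS // CHIP_WIDTH


def contiguous_chip(bit: int) -> int:
    """Adjacent bits share a chip: bits 0-3 on chip 0, bits 4-7 on chip 1, ..."""
    return bit // CHIP_WIDTH


def interleaved_chip(bit: int) -> int:
    """Adjacent bits go to different chips: bit 0 on chip 0, bit 1 on chip 1, ..."""
    return bit % N_CHIPS


def symbols_hit(chip_of_bit, failed_chip: int) -> set[int]:
    """Indices of the code symbols containing at least one bit of failed_chip."""
    return {
        bit // SYMBOL_WIDTH
        for bit in range(WORD_BITS)
        if chip_of_bit(bit) == failed_chip
    }


print(symbols_hit(contiguous_chip, 2))   # {2}: one symbol error
print(symbols_hit(interleaved_chip, 2))  # {0, 1, 2, 3}: four symbol errors
```

Under the contiguous mapping, a whole-chip failure is a single symbol error, which a single-symbol-correcting code can handle; under the interleaved mapping, the same failure spreads across four symbols.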