1. Field of the Invention
The present invention generally relates to error detection and correction in data read from memory or array chips and, more particularly, to the cooperative use of two facilities, for example, one on the chip to detect multiple bit errors and one off the chip to correct multiple bit errors. More generally, the invention contemplates two-level error detection/correction, with the levels being chip level, board level or any other packaging level.
2. Description of the Prior Art
Error-correcting codes (ECCs) are used to enhance system reliability and data integrity of computer semiconductor memory systems. ECCs have proved to be a cost-effective means of maintaining a high level of system reliability. Early ECCs used in computer memory systems were single-error-correcting and double-error-detecting (SEC-DED) codes invented by R. W. Hamming as described in "Error Detecting and Error Correcting Codes", Bell System Technical Journal, April 1950, pp. 147-160. U.S. Pat. No. 3,755,779 to Price teaches a basic SEC-DED method of error correction/detection. While only one error can be corrected with these codes, the double-error-detecting capability guards against data loss. M. Y. Hsiao in "A Class of Optimal Minimum Odd-Weight-Column SEC-DED Codes", IBM Journal of Research and Development, July 1970, pp. 395-401, disclosed a new class of SEC-DED codes which provided an improvement over Hamming codes in speed, cost and reliability of the decoding logic. Such logic is offered by several semiconductor manufacturers, for example, in the AM2960 and AMZ8160 of Advanced Micro Devices, the MC68540 of Motorola and the SN54/74 LS630 and LS631 of Texas Instruments.
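By way of illustration only, the SEC-DED behavior described above can be sketched with a textbook extended Hamming code. This is a minimal sketch under stated assumptions: it is not the Hsiao odd-weight-column construction, nor the logic of any of the named parts, and the function names encode_sec_ded and decode_sec_ded are hypothetical.

```python
def encode_sec_ded(data_bits):
    """Return an extended Hamming codeword for a list of 0/1 data bits.

    Index 0 holds the overall parity bit; indices 1..n are the usual
    Hamming positions, with check bits at the powers of two.
    """
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:      # smallest r with 2**r >= m + r + 1
        r += 1
    n = m + r
    code = [0] * (n + 1)
    data = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):          # not a power of two: data position
            code[pos] = next(data)
    for i in range(r):
        p = 1 << i
        # Check bit p makes the parity of all positions containing bit p even.
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
    code[0] = sum(code[1:]) % 2      # overall parity enables double detection
    return code


def decode_sec_ded(code):
    """Return (status, word): 'ok', 'corrected', or 'uncorrectable'."""
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos          # XOR of the positions of all set bits
    parity_ok = sum(code) % 2 == 0
    if syndrome == 0 and parity_ok:
        return "ok", list(code)
    if not parity_ok and syndrome <= n:
        # Odd overall parity: treat as a single error at index `syndrome`
        # (index 0 means the overall parity bit itself was hit).
        fixed = list(code)
        fixed[syndrome] ^= 1
        return "corrected", fixed
    # Nonzero syndrome with even overall parity: double error, detect only.
    return "uncorrectable", None


word = encode_sec_ded([1, 0, 1, 1, 0, 0, 1, 0])
one_error = list(word)
one_error[6] ^= 1
two_errors = list(one_error)
two_errors[11] ^= 1
```

In this sketch, decoding one_error recovers word, while decoding two_errors reports the word as uncorrectable rather than miscorrecting it, which is the sense in which the double-error-detecting capability guards against data loss.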
An improvement on the basic SEC-DED code is disclosed by Bossen in U.S. Pat. No. 4,319,357. Bossen uses a SEC-DED code in a memory system to correct one fixed error and one transitory error in a data word. The erroneous data word and syndrome generated by the error correcting code circuitry are saved while the memory location of the flawed word is checked to determine the location of the one fixed error. A "syndrome" is then generated for the word assuming only a single fixed error in the location and, thereafter, the generated and saved syndromes are Exclusive ORed together to obtain another syndrome locating the position of the transitory error. With both errors located, the word is corrected by inverting the erroneous bits.
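The syndrome arithmetic of this approach can be sketched as follows. This is an illustrative sketch only, not the circuitry of the Bossen patent: it assumes a Hamming-style code in which the syndrome of a single error equals its bit position, it assumes both errors are actually present in the word, and it assumes the fixed error is located by writing and reading back all-zeros and all-ones test patterns. The names StuckLocation, find_fixed_error and correct_fixed_plus_transient are hypothetical.

```python
def syndrome(word):
    """XOR of the positions (1..n) of all set bits; zero for a valid word."""
    s = 0
    for pos in range(1, len(word)):
        if word[pos]:
            s ^= pos
    return s


class StuckLocation:
    """Hypothetical memory location with one bit stuck at a fixed value."""

    def __init__(self, n, stuck_pos, stuck_val):
        self.n = n
        self.stuck_pos = stuck_pos
        self.stuck_val = stuck_val
        self.bits = [0] * (n + 1)    # index 0 is unused in this sketch

    def write(self, word):
        self.bits = list(word)
        self.bits[self.stuck_pos] = self.stuck_val

    def read(self):
        return list(self.bits)


def find_fixed_error(location):
    """Assumed test: write all-0 and all-1 patterns, look for a mismatch."""
    for value in (0, 1):
        location.write([value] * (location.n + 1))
        readback = location.read()
        for pos in range(1, location.n + 1):
            if readback[pos] != value:
                return pos
    return None


def correct_fixed_plus_transient(bad_word, location):
    saved = syndrome(bad_word)             # syndrome saved with the bad word
    fixed_pos = find_fixed_error(location)
    transient_pos = saved ^ fixed_pos      # XOR out the fixed error's syndrome
    corrected = list(bad_word)
    corrected[fixed_pos] ^= 1              # invert both erroneous bits
    corrected[transient_pos] ^= 1
    return corrected, fixed_pos, transient_pos


n = 12
codeword = [0] * (n + 1)
for pos in (3, 5, 6):                      # 3 ^ 5 ^ 6 == 0: zero syndrome
    codeword[pos] = 1
location = StuckLocation(n, stuck_pos=5, stuck_val=0)
location.write(codeword)
bad_word = location.read()                 # bit 5 already wrong (fixed error)
bad_word[9] ^= 1                           # transitory error on this read
corrected, fixed_pos, transient_pos = correct_fixed_plus_transient(bad_word, location)
```

Note that the erroneous word is read and saved before the location is tested, since the test patterns overwrite the stored contents.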
To increase the level of reliability of memory systems of increasing size and density, double-error-correcting, triple-error-detecting (DEC-TED) codes are used. Such codes can be constructed based on well known BCH coding theory as described, for example, by W. W. Peterson and E. J. Weldon, Jr., in Error Correcting Codes, MIT Press (1972). U.S. Pat. No. 4,464,753 to Chen discloses a modularized error correction apparatus for correcting package errors by expanding an N-bit SEC-DED code to cover N packages of M bits each such that the Exclusive OR of all M-bit single bit error syndromes in any given package results in a composite "syndrome" which is unique for each package. In U.S. Pat. No. 4,509,172, Chen expands on this by disclosing a DEC-TED code that uses syndromes developed by a parity check matrix H to perform error correction. Chen also teaches how to detect errors without applying miscorrection. U.S. Pat. No. 4,775,979 to Oka builds on the Chen approach and, like Chen, uses a parity check matrix H, but in addition to correcting random errors, Oka corrects a block error by adding a plurality of unit matrices to the parity check matrix.
DEC-TED codes, however, require a larger number of check bits than a SEC-DED code and, correspondingly, more complex hardware to implement the functions of error correction and error detection. C. L. Chen and M. Y. Hsiao in "Error-Correcting Codes for Semiconductor Memory Applications: A State-of-the-Art Review", IBM Journal of Research and Development, vol. 28, no. 2, March 1984, pp. 124-134, describe four classes of error-correcting codes appropriate for semiconductor memory designs. For each class of codes, the number of check bits required for commonly used data lengths is provided. The implementation aspects of error correction and error detection are also discussed, and certain algorithms useful in extending the error-correcting capability for the correction of soft errors such as α-particle-induced errors are examined in some detail.
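The difference in check bit overhead can be estimated with a short calculation. The sketch below is illustrative and is not taken from the cited review: the SEC-DED count uses the extended Hamming bound, and the DEC-TED count assumes a shortened primitive binary BCH code of designed distance 5 extended by one overall parity bit, so other DEC-TED constructions may give different figures. Both function names are hypothetical.

```python
def sec_ded_check_bits(k):
    """Smallest r with 2**(r - 1) >= k + r (extended Hamming bound)."""
    r = 1
    while (1 << (r - 1)) < k + r:
        r += 1
    return r


def dec_ted_check_bits_bch(k):
    """Estimate assuming an extended, shortened primitive BCH code.

    A double-error-correcting BCH code over GF(2**m) has length at most
    2**m - 1 and uses 2*m check bits; one overall parity bit is added
    for triple-error detection.
    """
    m = 3
    while (1 << m) - 1 < k + 2 * m:
        m += 1
    return 2 * m + 1


for k in (16, 32, 64, 128):
    print(k, sec_ded_check_bits(k), dec_ted_check_bits_bch(k))
```

Under these assumptions, a 64-bit data word needs 8 check bits for SEC-DED and 15 for DEC-TED, which illustrates the additional memory overhead referred to above.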
Another approach taken in the prior art is disclosed in U.S. Pat. No. 4,335,459 to Miller. Miller proposes "on-chip" ECC logic to improve manufacturing yield and chip reliability. In this scheme, the user is not aware of the chip's increased internal storage and error correction circuitry, the only indication of these characteristics being the fact that the memory chips can be used without external ECC logic. Nevertheless, Miller suggests that his memory chip could be used with external ECC logic to provide multiple bit error correction in much the same manner that Bossen uses a SEC-DED code to correct more than one error. In this case, the "on-chip" and "off-chip" ECC logic function independently.
Conventional memory systems for large processors that use SEC-DED codes for accesses to memory are generally designed using a "by one" (×1) memory chip organization. For example, in the IBM 3090 family of computers, a 1 MByte ×1 memory chip organization is employed. Because each chip then contributes only one bit to a given word, the SEC-DED code corrects all single chip failures and detects almost all multiple bit failures.
As memory chip densities (bits per chip) increase and/or as more memory interleaves are demanded (as in a multi-processor "super computer" design), a multiple bit memory chip output will be required. For example, a chip organization of 512 KBytes ×9 might be used. In such a system, it will still be desirable to correct any single memory chip failure (up to 9 bits) and detect multiple failures. Standard SEC-DED codes do not solve the problem. Other Error Checking and Correcting (ECC) codes have been designed to attack this problem; however, the cost of implementing these codes, in both logic and associated memory chip overhead, is prohibitive.