1. Field of the Invention
The present invention generally relates to computer systems and, more particularly, to a method of implementing error correction codes in an error detection or correction device such as a memory controller.
2. Description of the Related Art
The basic structure of a conventional computer system 10 is shown in FIG. 1. The heart of computer system 10 is a central processing unit (CPU) or processor 12 which is connected to several peripheral devices, including input/output (I/O) devices 14 (such as a display monitor and keyboard) for the user interface, a permanent memory device 16 (such as a hard disk or floppy diskette) for storing the computer's operating system and user programs, and a temporary memory device 18 (such as dynamic random-access memory or DRAM) that is used by processor 12 to carry out program instructions. Processor 12 communicates with the peripheral devices by various means, including a bus 20 or a direct channel 22. Computer system 10 may have many additional components which are not shown, such as serial and parallel ports for connection to, e.g., modems or printers. Those skilled in the art will further appreciate that there are other components that might be used in conjunction with those shown in the block diagram of FIG. 1; for example, a display adapter connected to processor 12 might be used to control a video display monitor. Computer system 10 also includes firmware 24 whose primary purpose is to seek out and load an operating system from one of the peripherals (usually permanent memory device 16) whenever the computer is first turned on.
Parity checks and error correction codes (ECCs) are commonly used to ensure that data is properly transferred between system components. For example, a magnetic disk (permanent memory device) typically records not only information that comprises data to be retrieved for processing, but also records an error correction code for each file, which allows the processor, or a controller, to determine whether the data retrieved is valid. ECCs are also used with temporary memory devices, e.g., DRAM, and the ECC for files stored in DRAM can be analyzed by a memory controller which provides an interface between the processor and the DRAM array. If a memory cell fails during reading of a particular memory word (due to, e.g., stray radiation, electrostatic discharge, or a defective cell), then the failure can at least be detected. ECCs can further be used to reconstruct the proper data stream. See, e.g., U.S. Pat. Nos. 4,402,045 and 4,538,270.
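By way of illustration only, the detection role of a parity check described above may be sketched as follows. The sketch assumes a single even-parity bit stored alongside an 8-bit word; the function names and the sample word are hypothetical and are not taken from any of the cited references.

```python
def even_parity_bit(word: int) -> int:
    """Return the bit that makes the total count of 1s (word plus parity) even."""
    return bin(word).count("1") % 2


def is_consistent(word: int, parity: int) -> bool:
    """True when the retrieved word agrees with its stored parity bit."""
    return even_parity_bit(word) == parity


stored = 0b10110010                 # hypothetical memory word (four 1 bits)
parity = even_parity_bit(stored)    # stored alongside the word

one_bit_failure = stored ^ 0b00000100   # a single cell reads back inverted
two_bit_failure = stored ^ 0b00010100   # two cells read back inverted

detected_single = not is_consistent(one_bit_failure, parity)
detected_double = not is_consistent(two_bit_failure, parity)
```

In this sketch the single-bit failure is flagged, whereas the two-bit failure leaves the parity unchanged and passes unnoticed, which is one reason stronger codes are used where detection alone is insufficient.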
Some error correction codes can only be used to detect single-bit errors, i.e., if two or more bits in a particular memory word are invalid, then the ECC might not be able to determine what the proper data stream should actually be. Other ECCs are more sophisticated and allow detection or correction of double errors, and some ECCs further allow the memory word to be broken up into clusters of bits, or "symbols," which can then be analyzed for errors in more detail, such as the ECCs described in U.S. Pat. Nos. 4,359,772, 4,958,350 and 5,450,423. ECCs commonly use parity check matrices, as taught in U.S. Pat. No. 5,425,038.
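The use of a parity check matrix for single-bit correction may be illustrated, under stated assumptions, with the textbook Hamming (7,4) code. This code is chosen here only because it is small; it is not asserted to be the code of any cited patent. The sketch assumes that column i of the parity check matrix is the binary representation of i, so that the syndrome equals the exclusive-OR of the positions holding a 1. All names are hypothetical.

```python
from functools import reduce

DATA_POSITIONS = (3, 5, 6, 7)    # 1-based positions carrying data bits
PARITY_POSITIONS = (1, 2, 4)     # 1-based positions carrying check bits


def syndrome(word: list) -> int:
    """XOR of the 1-based positions that hold a 1 (equivalent to H times the word)."""
    ones = (pos for pos, bit in enumerate(word, start=1) if bit)
    return reduce(lambda acc, pos: acc ^ pos, ones, 0)


def encode(data_bits: list) -> list:
    """Place 4 data bits, then set the check bits so the syndrome is zero."""
    word = [0] * 7
    for pos, bit in zip(DATA_POSITIONS, data_bits):
        word[pos - 1] = bit
    s = syndrome(word)
    for pos in PARITY_POSITIONS:
        word[pos - 1] = 1 if s & pos else 0
    return word


def correct(word: list) -> list:
    """Flip the bit named by a nonzero syndrome; assumes at most one bit is wrong."""
    s = syndrome(word)
    fixed = list(word)
    if s:
        fixed[s - 1] ^= 1
    return fixed


codeword = encode([1, 0, 1, 1])

single = list(codeword)
single[4] ^= 1                    # one failing bit (position 5)
recovered = correct(single)       # equals codeword

double = list(codeword)
double[0] ^= 1                    # two failing bits (positions 1 and 2)
double[1] ^= 1
miscorrected = correct(double)    # syndrome points at position 3: wrong repair
```

The double-error case shows why a code limited to single-bit correction cannot be relied upon to determine the proper data stream when two or more bits are invalid: the syndrome is nonzero but identifies the wrong bit.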
Different computer systems may require different error correction routines depending upon system architecture. For example, the RS/6000 and AS/400 computer systems of International Business Machines Corporation have substantially different requirements for their memory card designs, due to both data word definition (64 bits versus 65 bits) and correction capability (single-bit correct per word vs. 2-adjacent-bit correct per word). There are valid architectural, engineering, and business justifications for these differences, but the differences lead to the requirement of providing different memory controllers for different systems. Thus, if a memory controller were designed to function with more than one architecture, it would have to possess two completely separate and redundant ECC implementations, selected by a mode bit.
For example, for a memory array having a "b-bit-per-chip" configuration, the proper ECC is one that is capable of correcting all single symbol errors and detecting all double-symbol errors, where a symbol error is any one of the 2^b - 1 error patterns generated from a failure of an array chip. Using this SSC-DSD code (single-symbol-correction, double-symbol-detection), the memory may continue to function as long as there is no more than one chip failure in the group of array chips covered by the same ECC word. All errors generated from a single chip failure are automatically corrected by the ECC, regardless of the failure mode of the chip. Sometime later, when a second chip in the same chip group fails, double-symbol errors may be present if the locations of the two chip failures line up in the same ECC words. These double-symbol errors would be detected by the ECC. To prevent a data loss in this case, a proper maintenance strategy is executed so that the number of symbol errors does not accumulate beyond two.
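As a non-limiting illustration of the symbol error concept, the following sketch enumerates the nonzero error patterns of a b-bit symbol (there are 2^b - 1 of them) and reports which symbols of an ECC word are affected by a given error vector. It assumes that symbol i occupies bits i*b through i*b + b - 1 of the word; that layout and all names are hypothetical.

```python
def symbol_error_patterns(b: int) -> list:
    """All nonzero b-bit patterns, i.e. every way a single b-bit chip can be wrong."""
    return [format(p, f"0{b}b") for p in range(1, 2 ** b)]


def symbols_in_error(error_vector: int, b: int, n_symbols: int) -> list:
    """Indices of the symbols touched by an error vector (1 bits mark flipped bits)."""
    mask = (1 << b) - 1
    return [i for i in range(n_symbols) if (error_vector >> (i * b)) & mask]


patterns_b2 = symbol_error_patterns(2)   # ['01', '10', '11']
patterns_b4 = symbol_error_patterns(4)   # 15 patterns

# Three flipped bits confined to one 4-bit chip: a single symbol error.
one_chip = symbols_in_error(0b1011 << 8, b=4, n_symbols=18)

# Flipped bits spread over two chips: a double symbol error.
two_chips = symbols_in_error((0b1011 << 8) | (0b0001 << 20), b=4, n_symbols=18)
```

Under this view, a chip failure that corrupts several bits still counts as one symbol error, which is the case an SSC-DSD code corrects; errors spanning two chips are the double-symbol case that it detects.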
Another class of ECCs is the SEC-DED-SSD codes (single-error-correction, double-error-detection, single-symbol-detection), which are capable of correcting all single-bit errors, detecting all double-bit errors, and detecting all single-symbol errors. SEC-DED-SSD codes are not as powerful in correcting and detecting symbol errors as SSC-DSD codes. However, an SEC-DED-SSD code requires fewer check bits and, thus, fewer redundant array chips than an SSC-DSD code with the same number of data bits.
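The check-bit economy of bit-oriented codes can be illustrated with the well-known Hamming bound, offered here only as a rough guide. The sketch assumes a binary code and computes the smallest number of check bits r satisfying 2^r >= k + r + 1 for single-error correction, plus one further check bit for double-error detection. It does not account for the additional symbol-detection property, which depends on how the code is constructed, and the function names are hypothetical.

```python
def min_check_bits_sec(k: int) -> int:
    """Smallest r with 2**r >= k + r + 1 (Hamming bound for single-error correction)."""
    r = 1
    while 2 ** r < k + r + 1:
        r += 1
    return r


def min_check_bits_sec_ded(k: int) -> int:
    """One extra (overall parity) check bit upgrades SEC to SEC-DED."""
    return min_check_bits_sec(k) + 1


sec_64 = min_check_bits_sec(64)          # 7
sec_ded_64 = min_check_bits_sec_ded(64)  # 8
```

For a 64-bit data word the bound yields 8 check bits for SEC-DED, the figure against which the larger check-bit count of a symbol-correcting code may be compared.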
A common memory control chip (or chips) may be used to support different memory subsystems, for economic reasons. Consider the case where memory subsystem A is configured in 2 bits per chip, and memory subsystem B is configured in 4 bits per chip. In addition, subsystem A is to use a (76,66) SSC-DSD code (76 total bits of information per code word, 66 bits of data, i.e., 10 check bits) with b=2, and subsystem B is to use a (72,64) SEC-DED-SSD code with b=4 (72 total bits of information per code word, 64 bits of data, i.e., 8 check bits). Present techniques require two separate ECC designs to be implemented in the common memory logic, resulting in a more redundant (and expensive) system. It would, therefore, be desirable and advantageous to devise a method of incorporating two or more ECC implementations into a single design.
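The arithmetic behind the two example subsystems may be worked through as follows. The sketch assumes that each array chip contributes exactly b bits to every ECC word, so that the chip count is n divided by b; the class and attribute names are hypothetical and are introduced solely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EccConfig:
    name: str
    n: int   # total bits per code word
    k: int   # data bits per code word
    b: int   # bits supplied by each array chip (symbol size)

    @property
    def check_bits(self) -> int:
        return self.n - self.k

    @property
    def chips_per_word(self) -> int:
        # Assumes n is a multiple of b, as in both examples.
        return self.n // self.b

    @property
    def redundant_chips(self) -> int:
        # Chips' worth of storage consumed by the check bits.
        return self.check_bits // self.b


subsystem_a = EccConfig("A: (76,66) SSC-DSD", n=76, k=66, b=2)
subsystem_b = EccConfig("B: (72,64) SEC-DED-SSD", n=72, k=64, b=4)

for cfg in (subsystem_a, subsystem_b):
    print(cfg.name, cfg.check_bits, cfg.chips_per_word, cfg.redundant_chips)
```

Under these assumptions subsystem A spans 38 chips per ECC word, 5 of them redundant, while subsystem B spans 18 chips, 2 of them redundant, which makes concrete how differently the two codes load a shared memory controller.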