1. Field of the Invention
This invention relates to error detection and correction and, more particularly, to error codes that detect and correct bit errors in computer memory systems.
2. Description of the Relevant Art
Error codes are commonly used in computer systems to detect and/or correct data errors, such as transmission errors or storage errors. For example, error codes may be used to detect and correct errors of data transmitted via a telephone line, a radio transmitter or a compact disk laser. Another common use of error codes is to detect and correct errors within data that are stored to and read from a memory of a computer system. For example, error correction bits, or check bits, may be generated for data prior to storing the data to one or more memory devices. When the data are read from the memory device, the check bits may be used to detect or correct errors within the data. Errors may be introduced by either faulty components or noise within the computer system. Faulty components may include faulty memory devices or faulty data paths between devices within the computer system, such as faulty pins.
Hamming codes are one commonly used type of error code. The check bits in a Hamming code are parity bits for portions of the data bits. Each check bit provides the parity for a unique subset of the data bits. If an error occurs, i.e., one or more bits change state, one or more syndrome bits will be asserted (assuming the error is within the class of errors covered by the code). Generally speaking, syndrome bits are generated by regenerating the check bits and comparing the regenerated check bits to the original check bits. If the regenerated check bits differ from the original check bits, an error has occurred and one or more syndrome bits will be asserted. The particular syndrome bits that are asserted may also be used to determine which data bit changed state, enabling correction of the error. For example, if one data bit changes state, the change will modify one or more check bits. Because each data bit contributes to a unique group of check bits, the check bits that are modified will identify the data bit that changed state. The error may be corrected by inverting the bit identified as erroneous.
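The syndrome mechanism described above can be illustrated with a minimal Python sketch. It assumes the common textbook layout in which check bits occupy the power-of-two positions of the codeword; that layout, and the function names hamming_encode, syndrome and correct_single_error, are illustrative assumptions and are not taken from this description. In this layout, comparing regenerated check bits with stored check bits is equivalent to XOR-ing the positions of all bits that are set.

```python
def hamming_encode(data_bits):
    """Build a single error correcting Hamming codeword (assumed textbook layout)."""
    # Smallest k with 2**k - 1 >= data bits + check bits.
    k = 1
    while (1 << k) - 1 < len(data_bits) + k:
        k += 1
    n = len(data_bits) + k
    code = [0] * (n + 1)          # index 0 is unused; positions run 1..n
    remaining = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):       # not a power of two, so a data position
            code[pos] = next(remaining)
    for i in range(k):
        p = 1 << i                # position of check bit i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p and pos != p:
                parity ^= code[pos]
        code[p] = parity          # even parity over the covered subset
    return code[1:]


def syndrome(codeword):
    """XOR of the 1-based positions of all set bits; zero means no error seen."""
    s = 0
    for pos, bit in enumerate(codeword, start=1):
        if bit:
            s ^= pos
    return s


def correct_single_error(codeword):
    """Invert the bit named by the syndrome, if it names a valid position."""
    s = syndrome(codeword)
    fixed = list(codeword)
    if 0 < s <= len(fixed):
        fixed[s - 1] ^= 1
    return fixed, s
```

With four data bits this yields a seven-bit codeword. Flipping any single bit produces a nonzero syndrome equal to the position of the flipped bit, and inverting that bit restores the original codeword.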
One common use of Hamming codes is to correct single bit errors within a group of data. Generally speaking, the number of check bits must be large enough such that 2^k - 1 is greater than or equal to n, where k is the number of check bits and n is the number of data bits plus the number of check bits. Accordingly, seven check bits are required to implement a single error correcting Hamming code for 64 data bits. A single error correcting Hamming code is able to detect and correct a single error. The error detection capability of the code may be increased by adding an additional check bit. The use of an additional check bit allows the Hamming code to detect double bit errors and correct single bit errors. A Hamming code to which a bit has been added to increase its error detection capability is referred to as an extended Hamming code. Extended Hamming codes are discussed in more detail below.
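The check bit requirement can be verified with a short calculation. The following Python sketch searches for the smallest k satisfying 2^k - 1 >= n, where n is the number of data bits plus k. The function names are hypothetical and chosen only for illustration.

```python
def min_check_bits(data_bits: int) -> int:
    """Smallest k such that 2**k - 1 >= data_bits + k (single error correcting)."""
    k = 1
    while (1 << k) - 1 < data_bits + k:
        k += 1
    return k


def min_check_bits_extended(data_bits: int) -> int:
    """One additional check bit gives single error correction, double error detection."""
    return min_check_bits(data_bits) + 1
```

For 64 data bits, six check bits are insufficient because 2^6 - 1 = 63 is less than 64 + 6 = 70, whereas seven check bits satisfy 2^7 - 1 = 127 >= 71. The extended code for 64 data bits therefore uses eight check bits.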
Component failures are one problem that arises in computer memory systems. A component failure may introduce multiple errors that are uncorrectable by the error code. For example, if eight bits of a block of data are stored in the same memory device, the failure of the memory device may introduce eight bit errors into that block of data. Accordingly, one component failure may introduce a sufficient number of errors that the error correction code is not able to detect or correct the error. Likewise, a data path failure between a memory component and error correction circuitry, such as a pin failure, may introduce multiple errors into a block of data for which the error correction code is used.
One potential solution to prevent a component error from introducing multiple errors into a group of data is to store the data such that only one bit of data within the group is affected by any one component. For example, in a group of data with 64 data bits and 7 check bits, each bit of data may be stored in a different memory device. In this embodiment, 71 memory chips are required. Each memory device would store one bit of the 71-bit data group. Unfortunately, allocating bits to a group of data based on the number of data bits and check bits may not optimize the use of check bits within the system.
It is a common design goal of computer systems to reduce the number of check bits used to detect and correct errors. The check bits increase the amount of data handled by the system, which increases the number of memory components, data traces and other circuitry. Further, the increased number of bits increases the probability of an error. Although the check bits may make an error detectable and/or correctable, increasing the number of data bits within the system increases the probability of an error occurring. For at least these reasons, it is desirable to decrease the number of check bits for a given level of error detection and/or correction. It is further desired to increase the error correcting capability of a single error correcting code with a minimal number of additional bits.
The problems outlined above are in large part solved by a technique for partitioning data to correct memory part failures in accordance with the present invention. The data bits are assigned to a plurality of logical groups such that at most one bit corresponding to a component is assigned to a logical group. A bit may correspond to more than one component. For example, a bit may be stored in a memory device and may be transferred on a data pin. Accordingly, this bit corresponds to the memory device in which it is stored and the pin on which it is transferred. The assignment of at most one bit per component ensures that a component failure may introduce at most one bit error to a logical group.
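The assignment constraint can be sketched in Python as follows. This is a simple greedy illustration under stated assumptions, not the assignment procedure of the invention: each bit is described by the set of components it corresponds to, and a bit is placed in the first logical group that has room and shares no component with it. The function name assign_logical_groups, the component labels, and the example layout (eight devices of eight bits each, with an arbitrary pin mapping) are all hypothetical.

```python
def assign_logical_groups(bit_components, group_size):
    """Greedily place each bit in a group containing no bit from the same component.

    bit_components[i] is the set of components (e.g. device, pin) for bit i.
    Returns a list of groups, each a list of bit indices.
    """
    groups = []  # each entry: {"bits": [...], "used": set of components}
    for bit, comps in enumerate(bit_components):
        comps = set(comps)
        for g in groups:
            if len(g["bits"]) < group_size and not (g["used"] & comps):
                break             # this group can accept the bit
        else:
            g = {"bits": [], "used": set()}
            groups.append(g)      # no suitable group yet, so open a new one
        g["bits"].append(bit)
        g["used"] |= comps
    return [g["bits"] for g in groups]


# Hypothetical layout: bit j of device d travels on pin (d + j) % 8.
layout = [{("dev", d), ("pin", (d + j) % 8)} for d in range(8) for j in range(8)]
groups = assign_logical_groups(layout, group_size=8)
```

Because a group never accepts a bit that shares a memory device or a pin with a bit already in the group, the failure of any one device or pin in this sketch affects at most one bit of each group. The greedy strategy does not attempt to minimize the number of groups.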
Unlike traditional systems in which a number of bits in a logical group is determined solely by the number of different components, logical groups in the present invention are selected to reduce the number of check bits for a given number of data bits. For example, the use of 57 data bits and 6 check bits is an optimal implementation of a single error correcting Hamming code. Accordingly, the logical groups may be assigned 63 bits each. In this manner, the number of check bits to detect and correct errors may be reduced. Error correction may be performed within each logical group to correct single errors within the logical group. As discussed above, because each logical group is assigned at most one bit corresponding to each component, component failures may be detected and corrected.
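The effect of group size on check bit count can be illustrated with a short Python sketch. The block size of 3648 data bits is a hypothetical value chosen only because it divides evenly into groups of 57 and groups of 64; the function names are likewise illustrative.

```python
def min_check_bits(data_bits):
    """Smallest k such that 2**k - 1 >= data_bits + k."""
    k = 1
    while (1 << k) - 1 < data_bits + k:
        k += 1
    return k


def total_check_bits(block_data_bits, data_bits_per_group):
    """Check bits for a block split into equal groups (partial group counted as full)."""
    groups = -(-block_data_bits // data_bits_per_group)  # ceiling division
    return groups * min_check_bits(data_bits_per_group)


# Group sizes at which every syndrome value is used: 2**k - 1 total bits.
perfect_sizes = [(k, (1 << k) - 1 - k) for k in range(3, 8)]  # (check bits, data bits)

BLOCK = 3648  # hypothetical block size, equal to 57 * 64
check_bits_57 = total_check_bits(BLOCK, 57)  # 64 groups of 57 data + 6 check bits
check_bits_64 = total_check_bits(BLOCK, 64)  # 57 groups of 64 data + 7 check bits
```

Under these assumptions, groups of 57 data bits require 64 x 6 = 384 check bits for the block, while groups of 64 data bits require 57 x 7 = 399 check bits, so the 63-bit logical group protects the same data with fewer check bits.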
Broadly speaking, the present invention contemplates a method of detecting errors in a data block of a computer system that includes a plurality of components. The method comprises: assigning the bits of the data block to a plurality of logical groups such that at most one bit corresponding to a component is assigned to a logical group; and performing error detection on the logical groups, wherein each logical group includes data bits and check bits, and wherein a size of a logical group is selected to decrease a number of check bits in the data block.
The present invention further contemplates a method of detecting errors in a data block of a computer system that includes a plurality of first components and a plurality of second components, wherein bits of the data block are assigned to a first component and a second component. The method comprises: assigning the bits to a plurality of logical groups such that at most one bit from a first component is assigned to a first logical group and at most one bit from a second component is assigned to the first logical group; and performing error detection on the first logical group, wherein the first logical group includes data bits and check bits, and wherein a size of the first logical group is selected to decrease a number of check bits in the data block.
The present invention still further contemplates a system memory including a plurality of memory devices configured to store a data block, a plurality of pins configured to transfer bits of the data block and an error detection circuit coupled to the memory devices and the pins. Each bit of the data block is assigned to one of the plurality of memory devices and one of the plurality of pins. The error detection circuit is configured to generate check bits for data stored in the plurality of memory devices and transferred by the plurality of pins, wherein the check bits are stored in the plurality of memory devices and transferred by the plurality of pins. The error detection circuit generates the check bits for a logical group of data prior to transferring and storing the data, and verifies the check bits for the logical group after storage and transfer. A logical group includes at most one bit assigned to a memory device and at most one bit assigned to a pin, wherein a size of a logical group is selected to optimize a number of check bits to a number of data bits.
The present invention still further contemplates a computer system including a processor, a bus, and a memory system. The memory system includes a plurality of memory devices configured to store a data block, a plurality of pins configured to transfer bits of the data block and an error detection circuit coupled to the memory devices and the pins. Each bit of the data block is assigned to one of the plurality of memory devices and one of the plurality of pins. The error detection circuit is configured to generate check bits for data stored in the plurality of memory devices and transferred by the plurality of pins, wherein the check bits are stored in the plurality of memory devices and transferred by the plurality of pins. The error detection circuit generates the check bits for a logical group of data prior to transferring and storing the data, and verifies the check bits for the logical group after storage and transfer. A logical group includes at most one bit assigned to a memory device and at most one bit assigned to a pin, wherein a size of a logical group is selected to optimize a number of check bits to a number of data bits.