This invention relates, in general, to computer error correction codes and, in particular, to detecting address faults in an error correction code-protected memory.
The small size of computer transistors and capacitors, combined with transient electrical and electromagnetic phenomena, causes occasional errors in information stored in computer memory systems. Therefore, even well-designed and generally reliable memory systems are susceptible to memory device failures.
In an effort to minimize the effects of these memory device failures, various error checking schemes have been developed to detect, and in some cases correct, errors in messages read from memory. The simplest error detection scheme is the parity bit. A parity bit is an extra bit included with a binary data message or data word to make the total number of 1's in the message either odd or even. For "even parity" systems, the parity bit is set to make the total number of 1's in the message even. For "odd parity" systems, the parity bit is set to make the total number of 1's in the message odd. For example, in a system utilizing odd parity, a message having two 1's would have its parity bit set to 1, thereby making the total number of 1's odd. Then, the message including the parity bit is transmitted and subsequently checked at the receiving end for errors. An error results if the parity of the data bits in the message does not correspond to the parity bit transmitted. As a result, single bit errors can be detected. However, since there is no way to detect which particular bit is in error, correction is not possible. Furthermore, if two or any even number of bits are in error, the parity will be correct and no error will be detected. Parity therefore is capable of detecting only odd numbers of errors and is not capable of correcting any bits determined to be in error.
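The parity behavior described above can be illustrated with a minimal Python sketch. The function names `parity_bit` and `parity_ok` are hypothetical and chosen only for this illustration; the example reuses the odd-parity message with two 1's from the text.

```python
def parity_bit(data_bits, odd=True):
    """Return the bit that makes the total number of 1's odd (or even)."""
    ones = sum(data_bits)
    if odd:
        return 0 if ones % 2 == 1 else 1
    return ones % 2


def parity_ok(message, odd=True):
    """message = data bits followed by the parity bit."""
    return sum(message) % 2 == (1 if odd else 0)


data = [1, 0, 1, 0]                              # a message having two 1's
message = data + [parity_bit(data, odd=True)]    # parity bit is set to 1

one_error = list(message)
one_error[0] ^= 1                                # a single bit flipped

two_errors = list(one_error)
two_errors[1] ^= 1                               # a second bit flipped

print(parity_ok(message))     # intact message passes the check
print(parity_ok(one_error))   # single error is detected (check fails)
print(parity_ok(two_errors))  # double error passes the check undetected
```

The last case shows the limitation noted above: an even number of bit errors leaves the parity unchanged, so the check cannot see them.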
Error correction codes (ECCs) have thus been developed to not only detect but also correct bits determined to be in error. ECCs utilize multiple parity check bits stored with the data message in memory. Each check bit is a parity bit for a group of bits in the data message. When the message is read from memory, the parity of each group, including the check bit, is evaluated. If the parity is correct for all of the groups, it signifies that no detectable error has occurred. If one or more of the newly generated parity values are incorrect, a unique pattern called a syndrome results which may be used to identify the bit in error. Upon detection of the particular bit in error, the error may be corrected by complementing the erroneous bit.
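As an illustration of how group parities form a syndrome that locates the bit in error, the following Python sketch uses a Hamming(7,4) code. The choice of code, the bit layout (check bits at positions 1, 2 and 4), and the function names are assumptions made for this example only; the text above does not name a specific code at this point.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit word (positions 1..7).

    Check bits sit at positions 1, 2 and 4; each one gives even parity
    to the group of positions whose index has that bit set.
    """
    c = [0] * 8                          # index 0 is unused
    c[3], c[5], c[6], c[7] = data
    for p in (1, 2, 4):
        for i in range(1, 8):
            if i != p and (i & p):
                c[p] ^= c[i]
    return c[1:]


def syndrome(word):
    """Re-evaluate each parity group; failing groups spell the error position."""
    s = 0
    for p in (1, 2, 4):
        group_parity = 0
        for i in range(1, 8):
            if i & p:
                group_parity ^= word[i - 1]
        if group_parity:
            s |= p
    return s


def correct(word):
    """Complement the bit identified by a nonzero syndrome."""
    s = syndrome(word)
    fixed = list(word)
    if s:
        fixed[s - 1] ^= 1
    return fixed, s


stored = encode([1, 0, 1, 1])
read_back = list(stored)
read_back[4] ^= 1                        # corrupt position 5
fixed, s = correct(read_back)
print(s, fixed == stored)                # syndrome names position 5; word restored
```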
A widely used type of ECC for error control in digital systems is based on the codes devised by R. W. Hamming, and thus takes the name "Hamming codes". One particular subclass of Hamming codes includes the single error correcting and double error detecting (SEC-DED) codes. As their name suggests, these codes may be utilized not only to correct any single bit error but also to detect double bit errors.
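A minimal Python sketch of SEC-DED behavior follows. It assumes the common textbook construction of an extended Hamming(8,4) code, that is, a Hamming(7,4) word plus one overall parity bit; this construction and the function names are illustrative assumptions, not a code specified by this document.

```python
def encode_secded(data):
    """4 data bits -> 7-bit Hamming word plus one overall even-parity bit."""
    c = [0] * 8                              # positions 1..7, index 0 unused
    c[3], c[5], c[6], c[7] = data
    for p in (1, 2, 4):
        c[p] = sum(c[i] for i in range(1, 8) if (i & p) and i != p) % 2
    word = c[1:]
    return word + [sum(word) % 2]


def decode_secded(word):
    """Classify the received 8-bit word and correct it when possible."""
    syn = 0
    for p in (1, 2, 4):
        if sum(word[i - 1] for i in range(1, 8) if i & p) % 2:
            syn |= p
    overall_bad = sum(word) % 2 == 1
    if syn == 0 and not overall_bad:
        return "no error", list(word)
    if overall_bad:
        # Odd number of flips: treat as a single error and correct it.
        # syn == 0 means the overall parity bit itself was hit.
        fixed = list(word)
        fixed[(syn - 1) if syn else 7] ^= 1
        return "single error corrected", fixed
    # Nonzero syndrome with good overall parity: two bits are in error.
    return "double error detected", list(word)


stored = encode_secded([1, 0, 1, 1])

single = list(stored)
single[2] ^= 1

double = list(single)
double[5] ^= 1

print(decode_secded(stored)[0])
print(decode_secded(single)[0])
print(decode_secded(double)[0])
```

The overall parity bit is what separates the two cases: a single error disturbs it, while a double error leaves it intact but still produces a nonzero syndrome.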
Another well-known type of ECC is the single symbol correction and double symbol detection (SSC-DSD) code, which is used to correct single symbol errors and detect double symbol errors. In systems implementing this type of code, a symbol represents a multiple-bit package or chip. Hence, as the name implies, an SSC-DSD code in a system utilizing n-bit symbols is capable of correcting up to n erroneous bits within a single symbol and detecting errors occurring in two symbols.
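The difference between counting bit errors and counting symbol errors can be sketched in Python as follows. This is not an SSC-DSD encoder or decoder; it only groups bits into n-bit symbols and states the guarantee described above. The function names and the 4-bit symbol width are assumptions for illustration.

```python
def erroneous_symbols(stored, read, symbol_bits):
    """Return the indices of the symbols in which read differs from stored."""
    if len(stored) != len(read) or len(stored) % symbol_bits:
        raise ValueError("words must have equal length, a multiple of symbol_bits")
    bad = []
    for s in range(len(stored) // symbol_bits):
        lo, hi = s * symbol_bits, (s + 1) * symbol_bits
        if stored[lo:hi] != read[lo:hi]:
            bad.append(s)
    return bad


def ssc_dsd_guarantee(num_bad_symbols):
    """What an SSC-DSD code guarantees for a given number of bad symbols."""
    if num_bad_symbols == 0:
        return "no error"
    if num_bad_symbols == 1:
        return "correctable"
    if num_bad_symbols == 2:
        return "detectable but uncorrectable"
    return "outside the guaranteed range"


stored = [0] * 16                          # four 4-bit symbols

chip_failure = list(stored)
chip_failure[4:8] = [1, 1, 1, 1]           # all 4 bits of symbol 1 are wrong

two_chips = list(chip_failure)
two_chips[12] = 1                          # one more bit, in symbol 3

print(ssc_dsd_guarantee(len(erroneous_symbols(stored, chip_failure, 4))))
print(ssc_dsd_guarantee(len(erroneous_symbols(stored, two_chips, 4))))
```

Four bit errors confined to one symbol count as a single symbol error, which is why this class of code suits memories in which one chip supplies several bits of each word.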
One limitation of these two well-known ECCs is their inability to identify address failures or address faults. An address failure or fault occurs when data is either stored to or retrieved from an erroneous memory location. For example, an address fault occurs when data intended to be stored at one location is mistakenly stored at another location. Similarly, data intended to be fetched from one location may be mistakenly fetched or retrieved from another location. The above-discussed ECCs protect only against memory data failures and hence neither is capable of detecting or identifying address faults.
One technique used to identify these address faults is to transmit parity along with the address bits. In this manner, address failures may be detected by a discrepancy in the parity. However, this technique suffers from the inability to detect address faults occurring after the parity has been stripped off within the memory chips.
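The limitation of transmitted address parity can be sketched in Python as follows. The model is deliberately simplified and hypothetical: `memory_chip_decode` stands in for a memory device that checks and then strips the parity bit, and `internal_fault_mask` models a fault arising inside the device after that check.

```python
def parity(value):
    """Even-parity bit over the binary representation of value."""
    return bin(value).count("1") % 2


def send(address):
    """Transmit the address together with its parity bit."""
    return address, parity(address)


def memory_chip_decode(address, addr_parity, internal_fault_mask=0):
    """Check the parity, strip it, then decode the address internally."""
    if parity(address) != addr_parity:
        raise ValueError("address parity error detected on the bus")
    # The parity bit is discarded here; the address is unprotected below.
    return address ^ internal_fault_mask


addr, p = send(0b1010)

# A fault on the bus, before the parity check, is caught.
try:
    memory_chip_decode(addr ^ 0b0001, p)
    bus_fault_detected = False
except ValueError:
    bus_fault_detected = True

# A fault after the parity has been stripped goes unnoticed.
accessed = memory_chip_decode(addr, p, internal_fault_mask=0b0100)

print(bus_fault_detected)
print(bin(accessed))          # a different location than the one requested
```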
Another technique is to store extra address or parity bits within the memory along with the data. Although this technique may be used to detect address failures occurring after the parity bits have been stripped off within the memory chips, additional memory circuitry and hardware are required to implement it. Furthermore, techniques of this type are incapable of distinguishing between address faults and other types of uncorrectable errors, for instance, multiple symbol memory data failures.
Thus, a need exists for an address fault protection scheme capable of detecting address faults which does not require address parity bits to be stored into memory. In addition, a need also exists for an address fault protection scheme capable of distinguishing between address faults and other types of uncorrectable errors.
The shortcomings of the prior art are overcome and additional advantages are provided through the provision of an address fault detection capability for detecting address faults in an ECC-protected memory. In one example, a method of identifying address faults includes: detecting an uncorrectable error during transmission of a data word; and determining whether the uncorrectable error is an address fault.
In an enhanced embodiment of the invention, the uncorrectable errors are detected according to an error correction code which is generated according to an H-matrix. In one example, the H-matrix comprises a plurality of subsets arranged in a plurality of rows and columns, wherein each of at least one row of the plurality of rows comprises, in part, multiple iterations of one subset of the plurality of subsets, and a remainder of the plurality of rows comprises, in part, a cyclic permutation of all remaining subsets of the plurality of subsets.
In another enhanced embodiment, the invention includes isolating the address fault to a subgroup of address bits of a group of address bits corresponding to an address to which the data word was intended to be transmitted, wherein the subgroup of address bits contains at least one faulty bit.
In yet another enhanced embodiment, the invention distinguishes between address faults and other types of uncorrectable errors. In particular, these other types of uncorrectable errors may be, for instance, memory data failures.
In another example, a system for identifying address faults includes: means for detecting an uncorrectable error during transmission of a data word; and means for determining whether the uncorrectable error is an address fault.
In yet another example, a system for identifying address faults includes a controller adapted to: detect an uncorrectable error during transmission of a data word; and determine whether the uncorrectable error is an address fault.
In still yet another example, an article of manufacture comprises a computer usable medium having computer readable program code means embodied therein for causing the identifying of address faults. The computer readable program code means includes: computer readable program code means for detecting an uncorrectable error during transmission of a data word; and computer readable program code means for determining whether the uncorrectable error is an address fault.
Thus described herein is a technique for identifying address faults in an error correction code-protected memory. This technique first detects any uncorrectable errors occurring during the transmission of a data word. Then, any address faults are identified from among the detected uncorrectable errors. In this manner, address faults as well as uncorrectable memory data failures are detected. In addition, the present invention identifies address faults without requiring any address parity bits to be stored to memory.
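The following Python sketch shows one way an address fault could be recognized among uncorrectable errors without storing any address parity bit. It is a toy model under stated assumptions, not the H-matrix of the invention: a 3-bit data word, 5 check bits, and a single address parity bit that is folded into the check bits when writing and regenerated from the requested address when reading. The column values and all function names are hypothetical. The column `ADDR_COL` is chosen so that it differs from every single-bit and double-bit error syndrome of the stored bits.

```python
from itertools import combinations

# One H-matrix column (as an integer) per stored bit: 3 data bits, 5 check bits.
DATA_COLS = [0b10011, 0b10101, 0b11001]
CHECK_COLS = [0b00001, 0b00010, 0b00100, 0b01000, 0b10000]
STORED_COLS = DATA_COLS + CHECK_COLS
ADDR_COL = 0b01111        # column of the address parity bit, which is never stored


def parity(value):
    return bin(value).count("1") % 2


def fold(bits, cols, address):
    """XOR the columns of all set bits, plus ADDR_COL if address parity is 1."""
    s = ADDR_COL if parity(address) else 0
    for bit, col in zip(bits, cols):
        if bit:
            s ^= col
    return s


def write(memory, address, data):
    s = fold(data, DATA_COLS, address)
    memory[address] = list(data) + [(s >> i) & 1 for i in range(5)]


def read(memory, address, fault_mask=0):
    """fault_mask models address bits corrupted on the way to the array."""
    return list(memory[address ^ fault_mask])


def classify(word, requested_address):
    s = fold(word, STORED_COLS, requested_address)
    if s == 0:
        return "no error"
    if s in STORED_COLS:
        return "correctable single-bit error"
    if s == ADDR_COL:
        return "uncorrectable: address fault"
    return "uncorrectable: data error"


# ADDR_COL must not collide with any single or double stored-bit error.
double_syndromes = {a ^ b for a, b in combinations(STORED_COLS, 2)}
assert ADDR_COL not in set(STORED_COLS) | double_syndromes

mem = {}
write(mem, 0b0101, [1, 0, 1])
write(mem, 0b0100, [0, 1, 1])

good = read(mem, 0b0101)
wrong_location = read(mem, 0b0101, fault_mask=0b0001)   # fetches 0b0100 instead
two_bit_error = list(good)
two_bit_error[0] ^= 1
two_bit_error[3] ^= 1

print(classify(good, 0b0101))
print(classify(wrong_location, 0b0101))
print(classify(two_bit_error, 0b0101))
```

With only one address parity bit, this sketch catches only faults that change the parity of the address, that is, an odd number of flipped address bits. A practical code would need more address-derived bits and a correspondingly designed H-matrix.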
Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention.