Substantially all modern electronic computers rely on semiconductor memory to store data for processing by a central processing unit (CPU). Computers employing semiconductor memory vary from simple computers, such as those contained in telephone answering machines, to highly complex supercomputers employed for complicated scientific projects. In simple computers like those used for telephone answering machines, errors in one or more of the memory locations of the memory may not be fatal. For example, a mistake in the memory of the telephone answering machine likely would only cause the synthesized voice stored on the memory to be imperceptibly altered. However, one or more defective memory locations in a memory of a computer used to perform scientific calculations may cause substantial problems.
Although current manufacturing techniques have substantially reduced the number of defective memory locations, excessive numbers of defective memory locations are still sometimes produced during fabrication of computer memory. Those defective memory locations can be caused by any of numerous steps taken during manufacture of the memory chips, semiconductor crystallinity defects, electrical connector discontinuities, etc. Although memory chips with such defective memory locations typically represent a small portion (less than 1%) of the total number of memory chips produced, the actual number of such defective memory chips is substantial. In some cases, such defective memory chips can be sold at a greatly reduced price for applications that do not require perfect memory, such as for telephone answering machines. However, it would be beneficial if some of those memory chips could be employed in more critical applications, such as in personal computers.
Several prior art error handling schemes have been employed to compensate for defective memory locations. For example, one error handling scheme employs extra rows of memory cells, known as "redundant rows," that could be used to replace rows having defective memory cells. While the use of redundant rows is often successful in salvaging otherwise defective memory chips, the number of defective rows that can be replaced is limited to the number of redundant rows that are provided on the memory chip. The number of defective rows sometimes exceeds the number of redundant rows, thus preventing repair of some defective rows.
Other hardware techniques have also been proposed to compensate for defective locations in memory devices. Some of these techniques involve maintaining a record of defective memory locations and then redirecting accesses to these locations to memory locations that are known to be functioning properly. However, these solutions can require excessive hardware overhead, thus precluding these solutions from being cost effective.
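The redirection idea described above can be sketched in software as follows. This is only an illustrative model, not a description of any particular prior art circuit: the class name `RemappingMemory`, its parameters, and the choice to place spare locations directly above the usable address range are all assumptions made for the example.

```python
class RemappingMemory:
    """Toy model: accesses to known-defective addresses go to spare cells."""

    def __init__(self, usable_size, spare_size, defective):
        # Hypothetical limit: each defective address needs its own spare cell.
        if len(defective) > spare_size:
            raise ValueError("more defective locations than spare locations")
        self.usable_size = usable_size
        self.cells = [0] * (usable_size + spare_size)
        # Record of defective locations: defective address -> spare address.
        self.remap = {
            bad: usable_size + i for i, bad in enumerate(sorted(defective))
        }

    def _resolve(self, addr):
        if not 0 <= addr < self.usable_size:
            raise IndexError("address out of range")
        # Redirect only if the address is recorded as defective.
        return self.remap.get(addr, addr)

    def write(self, addr, value):
        self.cells[self._resolve(addr)] = value

    def read(self, addr):
        return self.cells[self._resolve(addr)]


mem = RemappingMemory(usable_size=8, spare_size=2, defective={3, 5})
mem.write(3, 99)       # lands in the first spare cell, not in cell 3
print(mem.read(3))     # 99
```

Even in this small model, every access requires a lookup in the record of defective locations, which hints at the hardware overhead such schemes can incur.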
Another prior art error handling scheme, known as error detection, detects when a single bit of a data word is in error. Error detection typically adds a single parity bit to each data word written to memory so that the total number of "1" bits in the data word and the parity bit together is even. If that total is odd when the data word is read, then the error detection scheme determines that one of the bits of the data word is in error. Such parity-based error detection often is inadequate because only single bit errors (more generally, errors in an odd number of bits) are detected, the particular bit in error is not identified, and the particular bit in error is not corrected.
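As a minimal sketch of the parity scheme just described, the following example computes an even-parity bit for a data word and checks it on read. The function names `even_parity_bit` and `parity_error` and the 8-bit sample word are assumptions chosen for illustration.

```python
def even_parity_bit(word):
    """Parity bit that makes the total count of 1 bits (word + parity) even."""
    return bin(word).count("1") % 2


def parity_error(word, parity):
    """True if the count of 1 bits in the word plus the parity bit is odd."""
    return (bin(word).count("1") + parity) % 2 == 1


word = 0b10110010                  # sample data word with four 1 bits
parity = even_parity_bit(word)     # 0, since the count is already even

single = word ^ 0b00000100         # one bit flipped
double = word ^ 0b00000110         # two bits flipped

print(parity_error(word, parity))    # False: no error
print(parity_error(single, parity))  # True: single bit error detected
print(parity_error(double, parity))  # False: the two flips cancel out
```

The last case shows the limitation noted above: a double bit error leaves the parity unchanged and goes undetected, and even a detected error cannot be located or corrected.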
Yet another error handling scheme, known as error correction, overcomes some of the deficiencies in prior art error detection schemes. Prior art correction schemes add to each data word an error correction code having plural error correction bits that enable the data word to be reconstituted in the event of an erroneous data bit within the data word. Commonly used error correcting codes are Hamming codes which append error detecting and correcting "syndrome bits" to a data word. The number of syndrome bits that are required depends upon the number of bits in the data word. For example, a 64-bit data word requires 8 syndrome bits to detect two error bits and correct one error bit. Additional error bits can be detected and corrected by using additional syndrome bits. However, the number of syndrome bits grows rapidly with increases in the number of erroneous bits. For example, 22 syndrome bits are required to correct 4 erroneous bits in a 64-bit word. In operation, the syndrome bits are stored in a memory along with the data word. The syndrome bits are read from the memory along with the data word, and the data word and syndrome bits are processed using a conventional algorithm to detect and then correct any error bits in the data word.
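To illustrate how stored check bits allow a single erroneous bit to be located and corrected, the following sketch uses the small textbook Hamming(7,4) code rather than the 64-bit case discussed above. The function names and the bit layout (check bits at positions 1, 2 and 4, data bits at positions 3, 5, 6 and 7) are assumptions for the example, not taken from any particular memory device.

```python
DATA_POSITIONS = (3, 5, 6, 7)    # 1-based positions holding data bits
CHECK_POSITIONS = (1, 2, 4)      # 1-based positions holding check bits


def hamming74_encode(data):
    """Encode a list of 4 data bits into a list of 7 code bits."""
    code = [0] * 8               # index 0 unused so indices match positions
    for pos, bit in zip(DATA_POSITIONS, data):
        code[pos] = bit
    for p in CHECK_POSITIONS:
        # Check bit p covers every position whose index has bit p set.
        for pos in range(1, 8):
            if pos & p and pos != p:
                code[p] ^= code[pos]
    return code[1:]


def hamming74_correct(code):
    """Return (corrected data bits, syndrome) for a 7-bit code word."""
    syndrome = 0
    for pos, bit in enumerate(code, start=1):
        if bit:
            syndrome ^= pos      # XOR of the positions of all 1 bits
    fixed = list(code)
    if syndrome:                 # nonzero syndrome = position of the bad bit
        fixed[syndrome - 1] ^= 1
    return [fixed[pos - 1] for pos in DATA_POSITIONS], syndrome


stored = hamming74_encode([1, 0, 1, 1])
corrupted = list(stored)
corrupted[4] ^= 1                # flip the bit at position 5

print(hamming74_correct(stored))     # ([1, 0, 1, 1], 0)
print(hamming74_correct(corrupted))  # ([1, 0, 1, 1], 5)
```

With only three check bits this code corrects one bit per word; if two bits are flipped, the syndrome points at the wrong position and the word is miscorrected, which is why an additional check bit is needed to detect double errors.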
Although conventional error correcting codes are adequate in some applications, they nevertheless exhibit serious limitations and disadvantages. For example, such prior art error correction schemes typically reconstitute a data word only if a single bit of the data word is erroneous. Such single bit correction may be adequate when each data word includes only eight or sixteen data bits, but may be inadequate for the larger data words used in more advanced computer systems, such as computer systems based on Intel's Pentium Pro™ processor, which employ 64-bit data words. Such long data words have a much higher chance of having multiple data bits altered in error than eight or sixteen bit data words, and thus single bit error correction may not provide the level of data protection desired by users of such advanced computer systems. Multiple data bit errors occur more often than might be expected because defects frequently affect adjacent data bits in memory devices.
Another limitation of using conventional error correcting codes, such as Hamming codes, is that they can be used only for entire data words and thus cannot function for partial data words written to memory. Instead, error correcting codes can be used when writing partial words only by performing relatively complex and time consuming multiple memory accesses.
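The extra memory accesses needed for a partial word write can be sketched as a read-modify-write sequence. In this illustrative model, the class `CheckedMemory` and its methods are hypothetical, and a single parity bit stands in for a real error correcting code purely to keep the example short.

```python
class CheckedMemory:
    """Toy memory storing a check value computed over each whole 64-bit word."""

    def __init__(self, size):
        self.cells = [(0, 0)] * size   # (word, check value) per location
        self.accesses = 0              # counts memory reads and writes

    @staticmethod
    def _check(word):
        # Stand-in for an error correcting code: parity over the whole word.
        return bin(word).count("1") % 2

    def read_word(self, addr):
        self.accesses += 1
        word, check = self.cells[addr]
        if self._check(word) != check:
            raise ValueError("check mismatch at address %d" % addr)
        return word

    def write_word(self, addr, word):
        self.accesses += 1
        self.cells[addr] = (word, self._check(word))

    def write_byte(self, addr, byte_index, value):
        # The check value covers the entire word, so a one-byte write must
        # read the old word, merge in the new byte, and write the word back.
        word = self.read_word(addr)
        shift = 8 * byte_index
        word = (word & ~(0xFF << shift)) | ((value & 0xFF) << shift)
        self.write_word(addr, word)


mem = CheckedMemory(4)
mem.write_word(0, 0x1122334455667788)   # full word write: 1 access
mem.write_byte(0, 0, 0xAA)              # partial write: 2 more accesses
print(hex(mem.read_word(0)))            # 0x11223344556677aa
print(mem.accesses)                     # 4
```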
Still another disadvantage of memory correction techniques using conventional error correcting codes is the amount of memory that must be allocated to store the error correcting codes. This required memory, known as "memory overhead," reduces the capacity of memory devices, thereby defeating a major goal of error correcting techniques, i.e., maximizing the storage capacity of memory devices containing defective memory locations. Also, a significant amount of logic circuitry is often required to generate the error codes during a memory write operation, and to decode the error codes during a memory read operation. This additional logic circuitry further increases the "hardware overhead" cost of this approach.
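The memory overhead can be made concrete with a short calculation. The sketch below assumes a standard Hamming code extended with one extra bit for double error detection, where the number of check bits r is the smallest value satisfying 2^r >= m + r + 1 for an m-bit data word, plus one; the function name `secded_check_bits` is an assumption for this example.

```python
def secded_check_bits(data_bits):
    """Check bits to correct one error and detect two in a data word."""
    r = 1
    # Smallest r such that r check bits can address every code bit position.
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1    # one extra bit for double error detection


for m in (8, 16, 32, 64):
    c = secded_check_bits(m)
    print(f"{m:2d} data bits: {c} check bits, overhead {c / m:.1%}")
#  8 data bits: 5 check bits, overhead 62.5%
# 16 data bits: 6 check bits, overhead 37.5%
# 32 data bits: 7 check bits, overhead 21.9%
# 64 data bits: 8 check bits, overhead 12.5%
```

Under these assumptions, a 64-bit data word needs 8 check bits, so one eighth as much memory again must be set aside for the codes even though only a single bit per word can be corrected.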
As a result of these limitations and disadvantages of using conventional error correcting codes, there is a need for a memory fault correction system that can correct partial data words read from defective memory locations, and that can correct a relatively large number of bits read from defective memory locations using relatively few error correcting bits, thereby minimizing memory overhead.