1. Technical Field
The present invention relates in general to memory configurations for computing systems, and in particular to fault detection. More specifically, the present invention relates to a fault tolerant memory system utilizing memory arrays with hard error detection and a method of operation thereof.
2. Description of the Related Art
Memory systems employed in conventional data processing systems, such as computer systems, typically include large arrays of physical memory cells that are utilized to store information in a binary manner. Generally in a conventional memory system, all of the memory cells on a memory chip are disposed in one or more memory arrays having a set number of rows and columns. Operatively, the rows are selected by row decoders that are typically located adjacent to the ends of the row lines. Each of the row lines is electrically connected to the row decoders so that the appropriate signals can be received and transmitted.
The columns of the memory array are connected to input/output (I/O) through column decode devices. In the case of dynamic random access memories (DRAMs), the memory array columns are also connected to line precharging circuits and sense amplifiers at the end of each column line to periodically sense amplify and restore data in the memory cells.
Two kinds of errors can typically occur in a memory system: soft errors and hard errors. A soft error is a seemingly random inversion of stored data. This inversion may be caused by occasional electrical noise, environmental conditions and, in some cases, by bombardment by radioactive particles, the so-called alpha particle event. The soft error problem has increased as the individual cell sizes of the memory arrays have been reduced, increasing their susceptibility to relatively low amounts of noise. Although soft error failure rates are generally two to three orders of magnitude higher than hard error failure rates in DRAM arrays, soft error failures typically cause only single-bit errors in memory system words. A hard error, in contrast, represents a permanent electrical failure of the memory array. It is often restricted to particular memory locations, but is sometimes associated with peripheral circuitry of the memory array, so that the entire array can be affected. Naturally, designers of memory arrays have striven to reduce the occurrence of both hard and soft errors in their memory arrays.
One solution for detecting and correcting both hard and soft errors has been the implementation of error correction codes (ECC) in large computer memories. The fundamentals of error detecting and correcting are described by R. W. Hamming in a technical article entitled "Error Detecting and Error Correcting Codes" appearing in the Bell System Technical Journal, Volume 29, No. 2, 1950 at pages 147-160. Utilizing one of the most popular Hamming codes, an 8-bit data word is encoded to a 13-bit word. A decoder can process the 13-bit word, correct any 1-bit error in the 13 bits, and detect 2-bit errors. The described code, thus, is classified as SEC/DED (single error correct/double error detect). The use of such codes has been particularly efficient for memory arrays having single-bit outputs. For instance, if a relatively simple computer were to have 16K (16,384) bytes of data where each byte contains 8 data bits, an efficient error-protected design would utilize thirteen 16K×1 memory arrays, with the extra five 16K×1 memory arrays providing Hamming SEC/DED protection. The Hamming code not only can correct a single-bit hard or soft random error occurring in any byte, but can also correct any one failed 16K×1 memory array, since any one memory array contributes only 1 bit to each error-protected word.
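The 8-bit to 13-bit SEC/DED encoding described above can be sketched as follows. This is a minimal illustration, not the circuit of any particular memory system: the bit layout (check bits at positions 1, 2, 4 and 8, an overall parity bit at position 0) and the names `encode` and `decode` are assumptions chosen for this example.

```python
# Sketch of an extended Hamming (13,8) SEC/DED code.
# Assumed layout: index 0 = overall parity, indices 1, 2, 4, 8 = Hamming
# check bits, remaining indices = the 8 data bits.
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]
CHECK_POSITIONS = [1, 2, 4, 8]


def encode(byte):
    """Encode an 8-bit value as a list of 13 bits."""
    code = [0] * 13
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (byte >> i) & 1
    # Each check bit makes the parity of the positions it covers even.
    for p in CHECK_POSITIONS:
        code[p] = sum(code[k] for k in range(1, 13) if k & p) % 2
    # The overall parity bit makes the parity of all 13 bits even.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (byte, status); byte is None if the word is uncorrectable."""
    syndrome = 0
    for k in range(1, 13):
        if code[k]:
            syndrome ^= k  # XOR of the positions of all set bits
    overall = sum(code) % 2
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= 12:
        # Odd overall parity: one bit flipped, at the position named by the
        # syndrome (syndrome 0 means the overall parity bit itself).
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Even overall parity with a nonzero syndrome: two bits flipped.
        return None, "uncorrectable"
    byte = sum(fixed[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return byte, status


if __name__ == "__main__":
    word = encode(0xA5)
    word[6] ^= 1          # one failed bit, e.g. one failed 16K x 1 array
    print(decode(word))
    word[11] ^= 1         # a second error in the same word
    print(decode(word))
```

In this sketch each of the 13 list entries stands for the bit contributed by one single-bit-wide array, so a completely failed array appears as at most one bad bit per word, which is the case the code can correct.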
The above-described 13-bit Hamming code can correct only one error, whether it is a hard error or a soft error. Consequently, if one memory array has suffered a hard failure in all its locations, the remaining memory arrays are not protected against an occasional soft error: such an error could be detected but not corrected. To be able to detect and correct more than one error, more elaborate error correcting codes have been developed and implemented. However, as a general rule, the more errors that can be corrected in a word, the more extra check bits are required by the check code.
Presently, memory arrays typically contain 256 Mbit devices and the trend is towards production of memory arrays that will contain 1 Gbit within two to four years. With the anticipated increase in memory array sizes, the present approach of utilizing 1 or 4-bit wide memory chip organization must be reconsidered. For example, employing the present 1 or 4 bit memory chip organization with a 32 bit wide data word will require 32 memory arrays (1 bit organization) or 8 memory arrays (4 bit organization). This will, in turn, result in a minimum granularity, e.g., in a personal computer (PC), of 8 GB or 2 GB, respectively. This large amount of memory in a desktop or laptop computer is excessive and also has the added disadvantage of increasing the overall cost of the computer system. In response to the minimum granularity problem, memory array manufacturers are moving to 8, 16 and even 32 bit wide memory organization schemes with the corresponding increase in the number of check bits required for error detection and correction.
Unfortunately, Hamming codes require several check bits to accomplish the error detection and correction. As discussed above, an eight-bit data word requires five check bits to detect two-bit errors and correct one-bit errors. As the bus grows wider and the number of bits of transmitted data increases, the number of check bits required also increases. Because modern memory buses are often 64 or 128 bits wide, the associated Hamming code would require substantially more check bits and increasing levels of logic circuits to implement the error correction. Consequently, using powerful Hamming codes in large memory systems is expensive and consumes substantial memory resources.
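The growth in check bits with word width can be worked out with a short calculation. The sketch below assumes the standard Hamming bound for single error correction (the smallest r with 2^r >= m + r + 1 for m data bits) plus one overall parity bit for double error detection; the function name `secded_check_bits` is hypothetical.

```python
def secded_check_bits(data_bits):
    """Check bits needed by a Hamming SEC/DED code for a given data width."""
    r = 0
    # Smallest r such that r check bits can name every bit position
    # (data and check) plus the "no error" case.
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1  # one extra overall parity bit for double error detection


if __name__ == "__main__":
    for width in (8, 16, 32, 64, 128):
        print(width, secded_check_bits(width))
```

For the 8-bit word discussed above this gives the five check bits already cited; the wider widths in the loop are included only to show how the count changes as the bus widens.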
Accordingly, what is needed in the art is an improved error detection and correction scheme that mitigates the above-described limitations in the prior art.
It is therefore an object of the invention to provide an improved memory system.
It is another object of the invention to provide a fault tolerant memory system utilizing memory arrays with hard error detection and a method of operation thereof.
To achieve the foregoing objects, and in accordance with the invention as embodied and broadly described herein, a fault tolerant memory system is disclosed. The fault tolerant memory system includes a number of memory arrays including at least one spare memory array, wherein each of the memory arrays has an internal error detection circuit. In an advantageous embodiment, the internal error detection circuit includes an inverter, a register coupled to the inverter and a comparator for comparing the contents of the inverter and register. The comparator will generate an error signal to indicate a failed memory array in response to the contents of the inverter and register not being equal. The fault tolerant memory system also includes data correction logic for correcting data stored in a failed memory array. In an advantageous embodiment, the data correction logic restores "corrupted" data in a failed array by reading the content of a row of cells in the failed memory array and generating a first complement of the content. Next, the first complement is written back to the row of cells, following which the first complement is again read from the failed memory array and a second complement of the first complement is generated to restore the corrupted data to its original "uncorrupted" form. The fault tolerant memory system further includes means for replacing the failed memory array with a spare array.
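The read, complement, write-back, read, complement sequence described above can be illustrated with a toy software model. Everything here is an assumption made for illustration: the `StuckRow` class, the stuck-at fault model and the `restore` function are hypothetical and do not represent the disclosed circuitry. The model covers only the case in which each faulty cell is stuck at the opposite of the bit that was meant to be stored, that is, the data read back is actually corrupted.

```python
class StuckRow:
    """Toy row of memory cells in which some cells are stuck at 0 or 1."""

    def __init__(self, width, stuck):
        self.width = width
        self.stuck = dict(stuck)  # bit index -> value the cell is stuck at
        self.cells = 0
        self.write(0)

    def write(self, value):
        # A stuck cell ignores the written bit and keeps its stuck value.
        for bit, level in self.stuck.items():
            value = (value & ~(1 << bit)) | (level << bit)
        self.cells = value & ((1 << self.width) - 1)

    def read(self):
        return self.cells


def restore(row):
    """Apply the double-complement sequence and return the recovered word."""
    mask = (1 << row.width) - 1
    first_complement = ~row.read() & mask   # read the row, complement it
    row.write(first_complement)             # write the complement back
    return ~row.read() & mask               # read again, complement again


if __name__ == "__main__":
    intended = 0b10110010
    row = StuckRow(8, {1: 0})    # assumption: bit 1 is stuck at 0
    row.write(intended)
    print(bin(row.read()))       # corrupted content as stored
    print(bin(restore(row)))     # content after the double complement
```

In this model a healthy cell is inverted twice and returns to its original value, while a stuck cell ignores the write-back and is inverted only once, so its output is the opposite of the stuck value. A cell stuck at the same value as the intended bit is outside what this sketch handles.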
The present invention discloses a novel fault tolerant (highly reliable) memory system utilizing memory arrays with greater than four-bit wide organization and internal error detection capabilities. The utilization of widely organized memory arrays with internal error detection capabilities allows a memory system to utilize a minimum number of memory arrays to satisfy small memory granularity requirements. The memory system of the present invention provides the minimum granularity and high performance that are required for devices such as personal computers, laptops and other small hand-held information devices.
The foregoing description has outlined, rather broadly, preferred and alternative features of the present invention so that those skilled in the art may better understand the detailed description of the invention that follows. Additional features of the invention will be described hereinafter that form the subject matter of the claims of the invention. Those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiment as a basis for designing or modifying other structures for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form.