The present invention relates to memory systems, and more particularly, to a memory system that detects errors and reconfigures itself to avoid bad memory cells.
As the cost of computational hardware has decreased, computers with ever-larger memory systems have proliferated. Systems with hundreds of Mbytes are common, and systems with a few Gbytes of memory are commercially available. As the size of the memory increases, problems arising from bad memory cells become more common.
Memory failures may be divided into two categories, those resulting from bad memory cells that are detected at the time of manufacture and those that arise from cells that fail during the operation of the memory. At present, problems arising from defective memory cells that are detected during the manufacturing process are cured by replacing the bad cells. The typical memory array is divided into blocks. Each memory chip has a predetermined number of spare blocks fabricated thereon. If a block in the memory is found to have a defective memory cell, the block in question is disconnected from the appropriate bus and one of the spares is connected to the bus in its place. However, once the part is packaged, there is no means for replacing a block with a spare, since the replacement process requires hard wiring of the spares to the bus.
The cost of testing the memory chips is a significant factor in the cost of the chips. The rate at which memory cells can be tested is limited by the internal organization of the memory blocks and the speed of the buses that connect the memory blocks to the test equipment. The various buses are limited to speeds of a few hundred MHz. Data is typically written and read in blocks having 64 bits or fewer. Since a write operation followed by a read operation requires several clock cycles, the rate at which memory can be tested is limited to roughly 100 million tests per second. Extensive testing requires each memory cell to be tested a large number of times under different conditions such as temperature and clock speed. Hence, a 1 Gbyte memory chip would require minutes, if not hours, to thoroughly test. The cost of such testing would be prohibitive; hence, prior art memory chip designs will not permit extensive testing at the 1 Gbyte level and beyond.
Even when the obviously bad memory blocks have been removed, sooner or later, the memory will fail because of the failure of one or more cells in a block. The probability that such a failure will cause a system failure depends on the lifetime of the system, the size of the memory, and the type of memory. The probability of such a failure increases with the lifetime of the system and the size of the memory. While system lifetimes are not increasing, the size of memory is increasing. Accordingly, more system failures are expected.
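The dependence on lifetime and memory size can be illustrated with a simple model. The sketch below assumes that cells fail independently at a constant rate, which is an assumption made here for illustration and is not stated in the text; the rate constant `RATE` and the function name are likewise hypothetical.

```python
import math


def failure_probability(cells, hours, rate_per_cell_hour):
    """Probability that at least one cell fails, assuming independent
    cells with a constant failure rate (an illustrative model only)."""
    # 1 - exp(-rate * cells * hours), computed with expm1 for accuracy
    return -math.expm1(-rate_per_cell_hour * cells * hours)


RATE = 1e-15                 # hypothetical failures per cell per hour
FIVE_YEARS = 5 * 365 * 24    # system lifetime in hours

small = failure_probability(8 * 2**27, FIVE_YEARS, RATE)  # 128 Mbyte memory
large = failure_probability(8 * 2**30, FIVE_YEARS, RATE)  # 1 Gbyte memory

print(f"128 Mbyte: {small:.3f}   1 Gbyte: {large:.3f}")
```

With the lifetime held fixed, the larger memory has the higher probability of at least one cell failure, which is the trend described above.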
In addition, some types of memory cells have higher failure rates than others. For example, EEPROM and flash memories can only be written a relatively small number of times compared to conventional DRAM and static RAM memories. In the case of EEPROMs and flash memories, the limited number of write cycles imposes severe restrictions on the possible applications of these memories. Similarly, memories based on ferroelectrics have shorter lifetimes than these conventional memories; however, the ferroelectric memories can be written many more times than EEPROMs and flash memories.
In principle, all of these types of memories would benefit by having some form of reconfiguration system built directly into the memory. Such a system would replace blocks of memory that fail during the operational life of the system, thereby extending the lifetime of the system. However, prior to a memory cell actually failing, there is often a period of time in which the memory cell operates, but with a high error rate. Such a memory cell can cause intermittent system failures and may be very difficult to diagnose. Hence, any form of block replacement system that depends on detecting the failure of a block may not be able to operate successfully.
Broadly, it is the object of the present invention to provide an improved memory system.
It is a further object of the present invention to provide a memory system that can be reconfigured after the parts have been packaged.
It is a still further object of the present invention to provide a memory system that can detect memory cells with high error rates and replace these cells prior to the error rates causing system failures.
These and other objects of the present invention will become apparent to those skilled in the art from the following detailed description of the invention and the accompanying drawings.
The present invention is a reconfigurable memory having M bit lines and a plurality of row lines, where M is greater than 1. The memory includes an array of memory storage cells, each memory storage cell storing a data value. The data value is read from or written into the storage cells by coupling that data value to one of the bit lines in response to a row control signal on one of the row lines. A row select circuit generates the row control signal on one of the row lines in response to a row address being coupled to the row select circuit. The row select circuit includes a memory for storing a mapping of the row addresses to the row lines that determines which of the row lines is selected for each possible value of the row address. The memory includes a plurality of sense amplifiers, one such sense amplifier being connected to each of the bit lines for measuring a signal value on that bit line. The memory includes a controller that tests the memory storage cells and eliminates references to rows having defective storage cells from the row mapping. When the memory is powered up, the controller tests the memory cells and assigns row addresses in a manner that eliminates references in the row mapping to rows having defective storage cells. In one embodiment of the invention, the memory also includes a single cell memory for storing a plurality of single data values, each data value corresponding to one of the row addresses and one of the bit lines. An insertion circuit causes that data value stored in the single cell memory for one of the row addresses and bit lines to replace that value stored in the memory storage cell coupled to that bit line when that row address is coupled to the row select circuit. The memory also includes a word assembly circuit for selecting N bit lines from said M bit lines, where N is less than or equal to M. The word assembly circuit includes a memory for storing a mapping specifying the N bit lines for each possible row address.
In such embodiments, the controller alters the mapping to eliminate a reference in the mapping to a bit line that causes a defective storage cell to couple data to a bit line in response to one of the row addresses. The memory also includes an error correcting circuit for detecting errors in data words. The error correcting circuit generates a corrected data word and an error data word from the N data values coupled thereto, the error data word indicating which of the N data values, if any, was erroneous. The N data values are determined by a word assembly circuit that connects N of the M bit lines to the error correcting circuit, where N is less than or equal to M. A control circuit connected to the error correcting circuit uses the error data words and the row addresses to alter the mapping in the row select circuit so as to avoid defective memory storage cells or bit lines.
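The interaction of the row mapping, the word assembly mapping, and the error data words may be easier to follow in a small software model. The following is a minimal sketch and not the circuit itself. All names are hypothetical; the error threshold, the reservation of unassigned good rows as spares, and the keying of the word assembly mapping by physical row are assumptions made for illustration; and the error correcting circuit is not modeled (error data words are supplied directly as bit masks).

```python
from collections import Counter


class ReconfigurableMemoryModel:
    """Toy model of the row mapping and the word assembly mapping."""

    def __init__(self, physical_rows, m_bit_lines, n_word_bits, threshold=3):
        self.threshold = threshold  # assumed error count that triggers a remap
        self.row_map = []           # row address -> physical row line
        self.spare_rows = []        # good rows not yet assigned an address (assumption)
        # Per physical row: the N bit lines forming the word, and the M - N unused lines.
        self.word_map = {r: list(range(n_word_bits)) for r in range(physical_rows)}
        self.spare_lines = {r: list(range(n_word_bits, m_bit_lines))
                            for r in range(physical_rows)}
        self.errors = Counter()     # (physical row, bit line) -> errors observed

    def power_up(self, row_is_good, rows_needed):
        """Assign row addresses only to rows that pass the test callback."""
        good = [r for r in self.word_map if row_is_good(r)]
        self.row_map = good[:rows_needed]
        self.spare_rows = good[rows_needed:]

    def report(self, address, error_word):
        """Record an error data word: bit i set means data value i was erroneous."""
        row = self.row_map[address]
        for position, line in enumerate(list(self.word_map[row])):
            if (error_word >> position) & 1:
                self.errors[(row, line)] += 1
                if self.errors[(row, line)] >= self.threshold:
                    self._avoid(address, row, position)
                    if self.row_map[address] != row:
                        return  # the whole row was replaced; stop processing it

    def _avoid(self, address, row, position):
        """Swap in an unused bit line, or remap the address to a spare row."""
        if self.spare_lines[row]:
            self.word_map[row][position] = self.spare_lines[row].pop(0)
        elif self.spare_rows:
            self.row_map[address] = self.spare_rows.pop(0)


mem = ReconfigurableMemoryModel(physical_rows=8, m_bit_lines=10, n_word_bits=8)
mem.power_up(lambda r: r not in {2, 5}, rows_needed=4)  # rows 2 and 5 fail the test
for _ in range(3):
    mem.report(address=2, error_word=0b00000100)        # data value 2 keeps failing
print(mem.row_map, mem.word_map[mem.row_map[2]])
```

In this model the defective rows never receive a row address at power-up, and a bit line that repeatedly produces errors for a given row is removed from that row's word assembly mapping before the row itself is abandoned. Copying the data out of a remapped row is outside the scope of the sketch.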