Microprocessors with one or more large on-die cache memories are known. Such cache memory is used to expedite operation of the system by reducing the number of fetches from main memory. (Such fetches have large latencies because main memory is located off the chip.) Cache memory is arranged in arrays, with each array having a number of array lines. A cache line is an addressable line of memory written across a bank of arrays. For example, a bank of thirty-two arrays, each having an array line size of 4 bytes, could be used to form a set of 128 byte cache lines. For an array size of 32 KB, each array holds 8192 array lines of 4 bytes, so the set of arrays would provide 8192 cache lines of 128 bytes.
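The arithmetic in the example above can be checked with a short sketch. The function and constant names are illustrative assumptions, not terms from the source, and the 32 KB figure is taken to be the capacity of each individual array in the bank.

```python
# Illustrative geometry from the example above (names are hypothetical).
ARRAYS_PER_BANK = 32          # arrays that together hold one cache line
ARRAY_LINE_BYTES = 4          # bytes each array contributes per line
ARRAY_SIZE_BYTES = 32 * 1024  # assumed capacity of one array (32 KB)


def cache_line_bytes(arrays_per_bank: int, array_line_bytes: int) -> int:
    """A cache line is written across every array in the bank."""
    return arrays_per_bank * array_line_bytes


def cache_lines_per_bank(array_size_bytes: int, array_line_bytes: int) -> int:
    """Each array line in an array belongs to exactly one cache line."""
    return array_size_bytes // array_line_bytes


if __name__ == "__main__":
    print(cache_line_bytes(ARRAYS_PER_BANK, ARRAY_LINE_BYTES))
    print(cache_lines_per_bank(ARRAY_SIZE_BYTES, ARRAY_LINE_BYTES))
```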
Typically, such cache memories include a data array, a cache directory, and cache management logic. The cache directory usually includes a tag array, tag status bits, and least recently used (LRU) bits. (Each directory entry is called a “tag.”) The tag directory contains the main memory addresses of code and data stored in the data cache plus additional status bits used by the cache management logic.
While the presence of these large on-die caches has improved system performance, integrating such large caches in an acceptable die area has required a drastic reduction in memory cell size. This reduction in cell size, the lower voltages required by these small cells, and process variations during manufacturing have significantly impacted memory cell stability, which translates directly into a loss of production yield (i.e., it increases the number of chips rejected in the manufacturing process).
As used herein, a hard error is an error that is always present, usually due to a defect in the physical structure of a memory cell. A soft error is an error that occurs only once, during a single access to memory. Subsequent accesses to that memory location do not usually repeat the error. Instead, such subsequent accesses result in normal operation.
Error correction coding (ECC) techniques are known which can identify and fix some hard errors to thereby improve production yields. However, using ECC techniques and the available ECC bits to correct hard errors reduces the number of soft errors (e.g., particle-induced changes in bits) that can be corrected with the practically available ECC techniques and, thus, results in an increased soft error rate (SER).
To address this issue, hardware redundancy is currently being used. In this technique, one extra redundant array is provided for each set of memory arrays defining a set of cache lines. (For example, in the 32-array illustration given above, one extra array is provided for each bank of 32 arrays.) Due to the presence of this redundant array, if an array in the associated bank of arrays is defective due to a failure of one or more bits in the array, the defective array is replaced with the redundant array on a one-for-one basis.
If the number of defective arrays exceeds the number of associated redundant arrays (e.g., if more than one array in a bank is defective), the chip is non-functional and must be discarded. If the bit failure rate is high, a large degree of redundancy is required to compensate for the lost arrays. However, increasing the number of redundant arrays increases the die size and is, thus, not desirable.
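The one-for-one replacement rule and the discard condition described above can be sketched as follows. This is a minimal model under stated assumptions: the function name, its parameters, and the representation of spares as numbered slots are hypothetical and are not taken from any actual repair flow.

```python
from typing import Dict, Iterable, Optional


def repair_bank(
    defective: Iterable[int],
    num_arrays: int = 32,
    num_redundant: int = 1,
) -> Optional[Dict[int, int]]:
    """Map each defective array index to a spare slot, one for one.

    Returns None when the bank has more defective arrays than spares,
    which in the text above means the chip must be discarded.
    """
    bad = sorted(set(defective))
    if any(i < 0 or i >= num_arrays for i in bad):
        raise ValueError("defective array index out of range")
    if len(bad) > num_redundant:
        return None  # not enough redundant arrays to repair the bank
    # Assign spares in order: first defective array gets spare 0, etc.
    return {array: spare for spare, array in enumerate(bad)}


if __name__ == "__main__":
    print(repair_bank([5]))     # one defect, one spare: repairable
    print(repair_bank([5, 9]))  # two defects, one spare: unrepairable
```

Raising `num_redundant` makes more banks repairable in this model, which mirrors the trade-off noted above: each additional spare array costs die area.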
Very large on-die caches also present further difficulties in the implementation of redundant storage elements. In traditional cache designs with redundancy, the redundant array is read at the same time that all the other arrays are read. The selection of which bits are output from the cache is typically controlled through multiplexing. When an array fails, fuses on the chip are usually blown in order to switch the defective array out and replace it with the redundant array. The drawback of this approach is that, if the cache has very wide outputs, the multiplexing problem is huge. For example, if the cache outputs 256 bits, then the redundant array has to have multiplexing connections to be able to feed its data to any one of those 256 bits. Naturally, such connections create a large overhead in wiring and die area.
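The multiplexing burden described above can be illustrated with a simplified behavioral model. The sketch assumes, purely for illustration, that each output position has its own two-input multiplexer choosing between its normal array bit and a single redundant bit, and that a blown fuse is represented by an index; the names are hypothetical.

```python
from typing import List, Optional


def mux_outputs(
    array_bits: List[int],
    redundant_bit: int,
    fused_index: Optional[int] = None,
) -> List[int]:
    """Return the cache output word.

    Every position passes its own array bit through, except the position
    selected by the fuse setting, which takes the redundant bit instead.
    """
    if fused_index is not None and not 0 <= fused_index < len(array_bits):
        raise ValueError("fused_index out of range")
    return [
        redundant_bit if i == fused_index else bit
        for i, bit in enumerate(array_bits)
    ]


def steering_connections(output_width: int) -> int:
    """In this model the redundant source needs a path to every output."""
    return output_width


if __name__ == "__main__":
    print(mux_outputs([0, 1, 0, 1], redundant_bit=1, fused_index=2))
    print(steering_connections(256))
```

In this model the number of connections from the redundant source grows linearly with the output width, which is the overhead the passage above refers to.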