The increasing demand for memory bandwidth in recent high-performance VLSI processors is not satisfied by existing memory technology. In “Fault-Tolerant Interleaved Memory Systems with Two-Level Redundancy”, by Lu et al., IEEE Transactions on Computers, Vol. 46, No. 9, September 1997, several memory banks or modules of a main memory are accessed by processors in an interleaved fashion in order to achieve a memory with high bandwidth. However, if a plurality of memory modules is used, some of these modules might be faulty. To cope with these faulty modules, Lu et al. propose to use a memory containing a plurality of modules and to provide some spare modules in said memory. These spare modules may belong to the same bank or may constitute global spare modules. If a module becomes faulty, the memory management may initiate a replacement of this faulty module by one of the spare modules. A module map table is provided for selecting a spare module to replace a faulty one, and a bank map table is provided for selecting a spare bank to replace a faulty bank. Nonetheless, the teachings of Lu et al. relate to the communication between processors and a main memory having redundant memory modules without any caches between the processors and the main memory, which does not appear to be beneficial in terms of latency and bandwidth.
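The role of a module map table can be illustrated with a minimal sketch. It does not reproduce the tables of Lu et al.; the class name `ModuleMap`, the numbering of spare modules after the regular ones, and the use of low-order interleaving are assumptions made purely for illustration.

```python
class ModuleMap:
    """Hypothetical module map table: logical module number -> physical module."""

    def __init__(self, num_modules, num_spares):
        # Initially every logical module maps to the physical module
        # with the same number.
        self.table = list(range(num_modules))
        # Assumption: spare physical modules are numbered after the regular ones.
        self.free_spares = list(range(num_modules, num_modules + num_spares))

    def mark_faulty(self, logical):
        """Redirect a logical module to a spare; return False if none is left."""
        if not self.free_spares:
            return False
        self.table[logical] = self.free_spares.pop(0)
        return True

    def locate(self, address):
        """Return (physical module, offset) for a word address.

        Assumption: low-order interleaving, so consecutive addresses
        fall into consecutive logical modules.
        """
        logical = address % len(self.table)
        offset = address // len(self.table)
        return self.table[logical], offset


memory = ModuleMap(num_modules=4, num_spares=1)
print(memory.locate(6))   # (2, 1): logical module 2, offset 1
memory.mark_faulty(2)     # module 2 is replaced by the spare, physical module 4
print(memory.locate(6))   # (4, 1)
```

In this sketch the replacement is transparent to the accessing processor: the address is unchanged and only the table entry differs.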
In contrast to the above, the extensive use of on-chip cache memories has become essential to sustain the memory bandwidth demand of the CPU. The advances in semiconductor technology and the continuous downscaling of feature size create extra space for additional functionality on a single chip. The most popular way to make use of this extra space is to integrate a larger cache so that a microprocessor is able to achieve higher performance. However, an increase in circuit density is closely coupled with an increase in the probability of defects. Caches constitute a redundant structure which is employed to enhance the performance of the CPU. One method to tolerate faults in the cache is to provide spare cache blocks. A defective block is either switched to a spare block by a reconfiguration mechanism, or a small fully associative cache is provided to dynamically replace the faulty block.
However, since the provision of caches with spare or redundant memory modules is expensive, new techniques are needed that are able to mitigate the degradation of cache performance without spare cache blocks. Therefore, instead of using explicit spare blocks, physically or logically neighboring blocks play the role of spare blocks. Dong-Hyun et al., “Re-evaluation and Comparison of Fault Tolerant Cache Schemes”, University of Wisconsin Madison ECE Dept. 753 Course Project, 2002, as well as Shirvani et al., “PADded Cache: A New Fault-Tolerance Technique for Cache Memories”, Proc. 17th IEEE VLSI Test Symposium, 1999, describe a Programmable Address Decoder (PAD) for a cache. A PAD is a decoder which has a programmable mapping function. As mentioned before, caches have an intrinsic redundancy since the purpose of caches is to improve performance. Many processing architectures can work without any cache, but at the cost of degraded performance. Therefore, introducing additional redundancy, like spare memory blocks, is inefficient.
During operation, usually not all of the sets in a cache are used at the same time because of the spatial and temporal locality of memory references. Accordingly, there must be some currently unused cache sets which can take the place of spare blocks. When a memory reference occurs, a decoder maps it to the appropriate block. Once a faulty block is identified, a PAD automatically redirects accesses to that block to a healthy block in the same primary cache. If a cache with a PAD has n cache blocks and one block is faulty, the cache will work as if it had n-1 cache blocks. The PAD re-configures the mapping function so that a ‘healthy’ block acts as a spare block. The method to find a suitable defect-free block is predefined and implemented in hardware.
The mapping can be performed in three different ways. In a Direct Mapped Cache, which is the simplest way to allocate the cache to the system memory, it is determined how many cache lines are present and the system memory is divided into the same number of portions. Each portion is then served by one cache line. The Fully Associative Cache makes it possible to design a cache such that any line can store the contents of any memory location, instead of hard-allocating cache lines to particular memory locations. The third cache mapping scheme is the N-Way Set Associative Cache. This scheme constitutes a compromise between the direct mapped and fully associative designs. The cache is divided into sets, where each set contains N cache lines, i.e. N ways. Each memory address is then assigned to a set and can be cached in any one of the N locations within the set to which it is assigned. In other words, within each set the cache is associative. Accordingly, there are “N” possible places in the cache where a given memory location may reside. The mapping is usually integrated in the address decoder of a tag RAM, the tag RAM being the area in an L2 cache that identifies which data from main memory is currently stored in each cache line. The values stored in the tag RAM determine whether a cache lookup results in a hit or a miss.
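The three mapping schemes differ only in how many lines share a set, which a short sketch can make concrete. The function name `split_address` and its parameters are illustrative assumptions and are not taken from the cited references; the sketch also assumes byte addresses.

```python
def split_address(address, num_lines, ways, line_size):
    """Split a byte address into (tag, set_index, offset).

    ways == 1          -> direct mapped (one line per set)
    ways == num_lines  -> fully associative (a single set)
    otherwise          -> N-way set associative with N == ways
    """
    num_sets = num_lines // ways
    offset = address % line_size      # byte position within the cache line
    block = address // line_size      # memory block number
    set_index = block % num_sets      # the set this block is assigned to
    tag = block // num_sets           # stored in the tag RAM to identify the block
    return tag, set_index, offset


# The same address in a cache with 8 lines of 16 bytes each:
print(split_address(0x1234, num_lines=8, ways=1, line_size=16))  # (36, 3, 4)
print(split_address(0x1234, num_lines=8, ways=2, line_size=16))  # (72, 3, 4)
print(split_address(0x1234, num_lines=8, ways=8, line_size=16))  # (291, 0, 4)
```

A lookup is a hit when the tag stored for one of the lines of the selected set equals the computed tag; with higher associativity the set index shrinks and the tag grows accordingly.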
For instance, each way of a 4-way associative cache can have a separate PAD. Cache addresses for faulty blocks are therefore remapped to fault-free blocks within said way. All addresses are still cacheable, but the number of conflict misses increases. For direct-mapped caches, since at least one bit of address information is lost as a consequence of the remapping, at least one bit is added to the tag bits in order to be able to distinguish those addresses that may be mapped to the same block. The cache remapping is performed on a per-block basis, wherein a faulty block is mapped to a “healthy” one whose address differs from the address of the faulty block by merely one bit. Usually, for set-associative caches a separate memory array is provided for each way, so that a decoder can be associated with each array. Accordingly, the remapping is performed merely in one array or way and will not affect the mapping of the other arrays.
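The per-block remapping and the additional tag bit can be sketched for a direct-mapped cache as follows. This is a simplified software model and not the hardware decoder of the cited references; the class name, the choice of the lowest index bit as the bit that is flipped, and the error raised when both blocks of a pair are faulty are assumptions made for illustration.

```python
class PaddedDirectMappedCache:
    """Simplified model of a direct-mapped cache with a programmable decoder."""

    def __init__(self, index_bits, faulty=()):
        self.index_bits = index_bits
        self.faulty = set(faulty)
        # One extended tag per block; None marks an empty block.
        self.tags = [None] * (1 << index_bits)

    def remap(self, index):
        """Return the block actually used for a given index."""
        if index not in self.faulty:
            return index
        # Assumption: the partner block differs in the lowest index bit.
        partner = index ^ 1
        if partner in self.faulty:
            raise ValueError("both blocks of the pair are faulty")
        return partner

    def access(self, block_address):
        """Look up a block address; return True on a hit, then fill the block."""
        index = block_address & ((1 << self.index_bits) - 1)
        tag = block_address >> self.index_bits
        # The index bit lost by remapping is appended to the tag so that
        # addresses sharing a block can still be told apart.
        extended_tag = (tag << 1) | (index & 1)
        physical = self.remap(index)
        hit = self.tags[physical] == extended_tag
        self.tags[physical] = extended_tag
        return hit


cache = PaddedDirectMappedCache(index_bits=2, faulty={2})
print(cache.access(2))  # False: first access, served by block 3 instead of block 2
print(cache.access(2))  # True: still cacheable despite the faulty block
print(cache.access(3))  # False: block 3 is now shared, the extended tag differs
print(cache.access(2))  # False: conflict miss caused by the sharing
```

The last two accesses show the trade-off stated above: every address remains cacheable, but the two indices that share one block evict each other.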