A cache is a fast memory for storing copies of frequently accessed data. As processors become faster, cache access time often becomes a dominant factor in system performance. Designers of cache memory systems face conflicting goals: smaller caches provide faster access times, but larger caches provide higher hit ratios, thereby reducing the penalties associated with accessing slower memory.
Since a current trend in processor design is to devote a substantial proportion of chip area to cache memory, much effort has been invested in improving access times for large caches.
One prior art method, shown in, for example, U.S. Pat. No. 5,532,947, combines an adder for generating an effective address with a word-line decoder. This combined decoder/adder is shown in FIG. 1. Another prior art method, shown in, for example, U.S. Pat. No. 5,860,092, combines an adder with a pre-decoder circuit to provide an input to a word-line driver. FIG. 2 shows this prior art method. One disadvantage of both methods is that carry propagation for larger addresses can adversely affect cache access time.
In another prior art method, Cortadella et al. (in “Evaluation of A+B=K Conditions Without Carry Propagation,” IEEE Transactions on Computers, vol. 41, pp. 1484–1488, November 1992) show that an equality test does not require carry propagation. One representation of a sum A+B that is suitable for use in a carry-nonpropagative equality test is known as half-adder or carry-sum form. The carry-sum representation uses a carry bit, Ci, and a sum bit, Si, to represent a binary digit of a number in the ith digit position. In carry-sum form, each number may have multiple valid representations. In a number system where each number is assigned multiple binary representations, the numbers are said to be in redundant form.
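The two ideas above can be illustrated with a minimal software sketch. It is not the circuit of the cited paper or of any cited patent: the function names half_adder_form and sum_equals are hypothetical, the convention of shifting the carry word left by one position is an assumption made for this illustration, and the equality test is a commonly used bitwise restatement of the A+B=K condition, evaluated modulo 2**width.

```python
def half_adder_form(a, b):
    """Return (carry, sum) such that a + b == carry + sum.

    Each bit position is computed independently, so no carry ripples
    from one position to the next. Assumed convention: the carry word
    is shifted left by one so it lines up with the digit it feeds.
    """
    return (a & b) << 1, a ^ b


def sum_equals(a, b, k, width):
    """Test (a + b) mod 2**width == k without performing the addition.

    For the sum bit at position i to match k, the carry into i must be
    a_i ^ b_i ^ k_i. Given that requirement, the carry out of position i
    is fixed by a_i, b_i and k_i alone. The sum equals k exactly when
    each required carry-in matches the implied carry-out of the position
    below it, and the carry into position 0 is zero.
    """
    mask = (1 << width) - 1
    a, b, k = a & mask, b & mask, k & mask
    required_carry_in = a ^ b ^ k
    implied_carry_out = (a & b) | ((a | b) & ~k)
    return required_carry_in == (implied_carry_out << 1) & mask


# Redundancy: two different (carry, sum) pairs represent the same value, 2.
rep_one = half_adder_form(1, 1)   # (2, 0)
rep_two = half_adder_form(2, 0)   # (0, 2)
```

In this sketch every bit of the comparison depends only on bits i and i-1 of the operands, which is why a hardware version of such a test needs no carry chain across the full address width.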
Current processors use pipelining to reduce cycle times and exploit parallelism within instruction streams. To make pipelining efficient, results from digital arithmetic circuitry are bypassed back to circuit inputs as operands for the next operation in a pipeline. This technique is preferred over waiting until results are written back to a register file, and it provides for higher utilization of a pipeline's parallelism. Since quickly loading operands from memory is critical to the performance of a processor, it may be desirable to bypass the results of address computations to a load/store unit in order to reduce any delays associated with the load.
When bypassed addresses are used to access a cache, it is desirable to have a cache that can decode addresses in redundant form. When address calculations have time to complete, the addresses have already been converted to unsigned binary numbers, and so a traditional decoder is desirable. Often, a design decision must be made to store all addresses in one form or the other because two decoders require too much area.