1. Technical Field of the Invention
The present invention relates to computer caches and more particularly to a computer system in which tag information is distributed between a processor and cache memory.
2. Background Art
Caches are used to bridge the gap between fast processor cycle time and slow memory access time. As used herein, an L2 cache is a cache that is off the processor. Given a fixed size, the performance of an L2 cache is mainly determined by three factors: latency, set-associativity, and bandwidth (or burst rate). Of the three, bandwidth plays a particularly important role. Unfortunately, the best known low-cost L2 cache methods fail to adequately address the bandwidth problem. An L2 cache having only two 32-bit wide burst static random access memory (BSRAM or burst SRAM) components has a 64-bit data bus and therefore requires four data bus bursts to provide a 32-byte cache line to the processor. When the processor operates at twice the speed of the L2 cache data bus, this translates into eight processor cycles during which no other request can use the L2 data bus, resulting in suboptimal performance. This bandwidth situation will get worse as processor speed increases at a faster rate than the inter-chip I/O speed. The bandwidth problem can be solved in part by doubling the data bus width. However, this requires more pins and therefore is not a low-cost solution.
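The arithmetic behind the four-burst, eight-cycle figure can be checked with a minimal sketch. The function name and its parameters are illustrative assumptions and are not part of the described system; the sketch only assumes that the data bus width equals the combined width of the BSRAM components and that each bus transfer occupies a whole number of processor cycles.

```python
def line_fill_cost(line_bytes, component_width_bits, num_components, clock_ratio):
    """Return (bus transfers, processor cycles) needed to move one cache line.

    clock_ratio is the number of processor cycles per L2 data bus cycle
    (a hypothetical parameter introduced for this illustration).
    """
    # Combined data bus width in bytes, e.g. two 32-bit parts give 8 bytes.
    bus_bytes = component_width_bits * num_components // 8
    # Ceiling division: a partial final transfer still occupies the bus.
    transfers = -(-line_bytes // bus_bytes)
    return transfers, transfers * clock_ratio


# Two 32-bit BSRAM components, 32-byte line, processor at twice the bus speed.
print(line_fill_cost(32, 32, 2, 2))  # (4, 8)

# Doubling the data bus width halves the bus occupancy, at the cost of more pins.
print(line_fill_cost(32, 32, 4, 2))  # (2, 4)
```

Under these assumptions, a higher clock ratio with the same bus width raises the processor-cycle cost proportionally, which is the trend noted above for processors that outpace inter-chip I/O speed.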
Another potential roadblock toward a successful low-cost L2 cache solution is rampability (i.e., the ability to produce a product in sufficient quantity at a reasonable price). For example, an L2 cache of the Pentium® Pro processor manufactured by Intel Corporation involves two types of components: tag random access memory (RAM) and BSRAM components. Tag RAM is a cache directory memory that contains a listing of all the memory addresses that have copies stored in cache memory. Each cache location has a corresponding entry in the cache directory. The contents of the directory are compared to the memory address from the processor to determine whether a copy of the requested data is contained in cache memory, thereby avoiding an access to slower main memory when a copy is present. For example, referring to FIG. 1, a cache memory array 10 includes two 32-bit commodity BSRAM components 14 and 16 and a tag RAM component 18.
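The directory comparison performed by a tag RAM can be illustrated with a minimal sketch. It assumes a direct-mapped organization with 32-byte lines and 8192 cache locations; these sizes, the class name TagDirectory, and the helper split_address are hypothetical choices made for illustration and do not describe the Pentium® Pro tag RAM or the arrangement of FIG. 1.

```python
LINE_BYTES = 32   # cache line size taken from the text
NUM_SETS = 8192   # assumed number of cache locations (illustrative only)


def split_address(addr):
    """Split a memory address into (tag, index, offset) fields."""
    offset = addr % LINE_BYTES
    index = (addr // LINE_BYTES) % NUM_SETS
    tag = addr // (LINE_BYTES * NUM_SETS)
    return tag, index, offset


class TagDirectory:
    """Direct-mapped cache directory: one tag entry per cache location."""

    def __init__(self):
        self.tags = [None] * NUM_SETS  # None marks an empty location

    def lookup(self, addr):
        # Hit if the stored tag matches the tag field of the address.
        tag, index, _ = split_address(addr)
        return self.tags[index] == tag

    def fill(self, addr):
        # Record that the line containing addr is now held in the cache.
        tag, index, _ = split_address(addr)
        self.tags[index] = tag


directory = TagDirectory()
directory.fill(0x12345)
print(directory.lookup(0x12340))  # True: same 32-byte line
print(directory.lookup(0x12345 + LINE_BYTES * NUM_SETS))  # False: same location, different tag
```

A set-associative directory would store several tags per location and compare them all, which is where the trade-off between set-associativity and latency arises.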
Commodity BSRAM based L2 solutions fall into two categories: serial tag RAM solutions and parallel tag RAM solutions. Serial tag RAM solutions provide a large set-associativity, while parallel tag RAM solutions provide shorter latency. Both approaches suffer from a common performance bottleneck: an insufficient amount of bandwidth, particularly when the L2 cache bus cannot be operated at the full processor speed.
In order to ramp a microprocessor such as the Intel Pentium® Pro processor, both tag RAM and BSRAM components need to be available in large volume at the same time and at commodity prices. The memory industry needs to design two parts, test two parts, and manage production for two different parts. Original equipment manufacturers (OEMs) need to qualify two parts and manage the purchase and inventory of the two parts. Any surprise in volume, timing, or quality of either of the two components would pose a significant risk to the ability to ramp processors.
Accordingly, there is a need for cache memory that addresses both bandwidth and rampability problems.