A memory cache, or simply "cache," is a mechanism placed in the memory hierarchy between the main memory and the CPU that improves the effective memory transfer rate and raises processing speed. The term "cache" reflects the fact that the mechanism is hidden from the user, who observes only an apparently faster main memory.
A cache has a smaller storage capacity than the main memory but a much higher access speed. Caches are generally implemented with semiconductor devices whose speed is comparable to that of the processor. By contrast, the main memory generally uses a less costly, lower-speed technology, but has a much higher overall storage capacity.
The cache mechanism anticipates the likely re-use by the CPU of information, whether data or code, in the main memory by keeping a copy of that data or code in cache memory. When information is accessed from the main memory, it is common for associated information also to be accessed and stored in the cache. For example, if a required instruction is part of a sequence of instructions, the subsequent instructions should be fetched along with the first instruction, so that accesses to the main memory can be minimized.
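The benefit of fetching associated information together can be illustrated with a minimal sketch. It assumes, purely for illustration, that information moves from main memory to the cache in fixed-size lines of several words; the function name `count_misses` and the line size of 4 words are hypothetical and do not come from any particular processor.

```python
LINE_WORDS = 4  # assumed line size in words (hypothetical value)


def count_misses(addresses, line_words=LINE_WORDS):
    """Count main-memory accesses for a stream of word addresses.

    On a miss, the whole line containing the address is copied into the
    cache, so later accesses to neighbouring words are served from the cache.
    Capacity limits and eviction are deliberately left out of this sketch.
    """
    cached_lines = set()
    misses = 0
    for addr in addresses:
        line = addr // line_words  # index of the line holding this word
        if line not in cached_lines:
            misses += 1
            cached_lines.add(line)
    return misses


# Sixteen sequential instruction words, fetched a line at a time
# versus one word at a time.
sequential = range(16)
misses_with_lines = count_misses(sequential)
misses_word_by_word = count_misses(sequential, line_words=1)
```

Under these assumptions, a sequential instruction stream touches main memory once per line rather than once per word.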
In modern microprocessors, it is common for one or more levels of memory caches to be included on a particular chip, and for an additional level of memory cache to be off-chip. Currently, some chips contain one or two on-chip caches, and it is anticipated that future products may contain more than two on-chip and off-chip caches.
In current microprocessors that access more than one level of cache, the higher-level caches generally have a larger storage capacity than the lower-level caches. For example, in the case of a two-level cache hierarchy, the second or L2 cache generally has a larger storage capacity than the first or L1 cache. The L2 cache is significantly faster than the main memory, but it also has a significantly smaller storage capacity. The information in the L1 cache is usually a subset of the information in the L2 cache. The L2 cache only needs to be accessed if the desired code or data is not resident in the L1 cache.
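The lookup order described above can be sketched as a small software model. This is only an illustration under simplifying assumptions: the class `TwoLevelCache`, its attribute names, and the use of dictionaries are hypothetical, and capacity limits and eviction are omitted, so the subset relationship between L1 and L2 holds trivially.

```python
class TwoLevelCache:
    """Toy model of a two-level hierarchy in which L2 is probed only on an L1 miss."""

    def __init__(self, memory):
        self.memory = memory  # backing store: address -> value
        self.l1 = {}
        self.l2 = {}
        self.l2_probes = 0    # how many times L2 had to be accessed

    def read(self, addr):
        """Return (value, level that supplied it)."""
        if addr in self.l1:
            return self.l1[addr], "L1"      # L1 hit: L2 is not touched
        self.l2_probes += 1                  # L1 miss: now access L2
        if addr in self.l2:
            value, source = self.l2[addr], "L2"
        else:
            value, source = self.memory[addr], "memory"
            self.l2[addr] = value            # fill L2 first ...
        self.l1[addr] = value                # ... then L1, keeping L1 a subset of L2
        return value, source


cache = TwoLevelCache({0x10: "a", 0x20: "b"})
first = cache.read(0x10)   # served by main memory, fills both levels
second = cache.read(0x10)  # served by L1 without probing L2
```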
For chips with multiple levels of caches, there is a tradeoff between the maximum processing speed and the minimum power usage of the caches. Processing speed can be maximized by simultaneously addressing more than one cache, for example, by simultaneously addressing the L1 cache and the L2 cache. However, simultaneously addressing the L1 and L2 caches uses power unnecessarily if the desired data or code is resident in the L1 cache.
Alternatively, power consumption can be reduced by accessing the higher-level cache or caches only when necessary. In the example of a chip with a two-level cache hierarchy, power consumption can be reduced by accessing the L2 cache only when there is a "miss," that is, when the L1 cache is addressed but the desired data or code is not currently in the L1 cache (as opposed to a "hit," where the desired data or code is currently resident). However, because the L2 access cannot begin until the L1 miss has been detected, this method results in a longer effective access time for the L2 cache and a consequent reduction in processing speed.
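The tradeoff between the two access policies can be made concrete with a rough cost model. The sketch below is an assumption-laden illustration, not a measurement: the function `access_cost`, its parameters, and the example latencies and energies are all hypothetical, every L1 miss is assumed to hit in the L2 cache, and simultaneous addressing is assumed to overlap the L1 and L2 lookups completely.

```python
def access_cost(l1_hit_rate, l1_latency, l2_latency, l1_energy, l2_energy, parallel):
    """Return (average latency, average energy) per access for a two-level hierarchy."""
    miss_rate = 1.0 - l1_hit_rate
    if parallel:
        # L1 and L2 are addressed simultaneously: a miss costs only the L2
        # latency, but the L2 energy is spent on every access, hit or miss.
        latency = l1_hit_rate * l1_latency + miss_rate * l2_latency
        energy = l1_energy + l2_energy
    else:
        # L2 is addressed only after an L1 miss is detected: a miss costs the
        # L1 latency plus the L2 latency, but L2 energy is spent only on misses.
        latency = l1_latency + miss_rate * l2_latency
        energy = l1_energy + miss_rate * l2_energy
    return latency, energy


# Hypothetical figures: 90% L1 hit rate, latencies in cycles, energies in arbitrary units.
parallel_cost = access_cost(0.9, 2, 10, 1.0, 5.0, parallel=True)
serial_cost = access_cost(0.9, 2, 10, 1.0, 5.0, parallel=False)
```

With these illustrative numbers, simultaneous addressing gives the lower average latency while serial addressing gives the lower average energy, which is the tension described in the two preceding paragraphs.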
A prediction of L1 cache data read misses has been implemented in the Compaq Alpha 21264. See R. E. Kessler, E. J. McLellan, and D. A. Webb, "The Alpha 21264 Microprocessor Architecture," International Conference on Computer Design (ICCD'98), pp. 90-95 (October 1998) ("Kessler"), which is incorporated herein by reference.
However, the predictor described in Kessler does not disable the L2 cache to save power. Instead, the Kessler predictor is used to reduce the penalty incurred when a read access misses the L1 cache. To keep the L1 cache latency low, the consumer of a read instruction is dispatched for execution before it is known whether the data access resulted in a hit. Therefore, if there is a miss, the consumer instruction and all the subsequent instructions have to be re-fetched, at a high penalty in cycles.
To mitigate this problem, the Alpha 21264 has a data read miss predictor that consists of a saturating 4-bit counter that tracks the hit/miss behavior of recent data reads. The counter decrements by two on each read miss and increments by one on each hit. The most-significant bit of the counter provides the prediction.
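The counter behavior described above can be sketched in software as follows. This is a behavioral model only, not the Alpha 21264 hardware: the class name `HitMissPredictor`, the method names, the initial counter value, and the convention that a set most-significant bit means "predict hit" are assumptions made for the illustration.

```python
class HitMissPredictor:
    """Behavioral model of a saturating 4-bit hit/miss counter."""

    BITS = 4
    MAX = (1 << BITS) - 1  # 15, the largest value a 4-bit counter can hold

    def __init__(self, initial=MAX):
        # Starting at the maximum (strongly predicting hits) is an assumption.
        self.counter = initial

    def predict_hit(self):
        """Use the most-significant bit as the prediction (assumed: 1 means hit)."""
        return bool(self.counter >> (self.BITS - 1))

    def update(self, was_hit):
        """Increment by one on a hit, decrement by two on a miss, saturating at 0 and 15."""
        if was_hit:
            self.counter = min(self.MAX, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 2)


predictor = HitMissPredictor()
for _ in range(4):
    predictor.update(was_hit=False)   # counter goes 15 -> 13 -> 11 -> 9 -> 7
prediction_after_misses = predictor.predict_hit()
```

Because a miss moves the counter twice as far as a hit, a burst of misses flips the prediction quickly, while a longer run of hits is needed to restore it.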
It is an object of the present invention to provide a method for reducing power consumption in a multiple-cache microprocessor without creating an unacceptable reduction in processing speed.