The present invention relates to cache memories and more specifically to a method and apparatus for allocating data and instructions within a shared cache.
Processor cache architecture schemes generally follow one of two models: a split cache model or a shared (unified) cache model. In a split cache model, two distinct first level caches are provided, a first cache for data and a second cache for instructions. The disadvantage of this architecture is that some applications are heavily weighted toward either data or instructions. In these situations, a split cache effectively excludes a large portion of the total cache capacity from use (e.g., either the data cache or the instruction cache, depending on the weighting of the application), and therefore makes highly inefficient use of cache resources.
In a shared cache, both data and instructions inhabit a single cache, and the continued residency of data and instructions within the cache is managed by a single replacement algorithm. For example, a commonly employed replacement algorithm is a "least-recently-used" (LRU) algorithm that assigns an "age" to each line within the single data and instruction cache. As new data is loaded into a line of the cache, or as a new cache line is accessed, the cache line is assigned the youngest cache line age while all other lines within the cache are aged. When a cache line needs to be discarded, the cache line having the oldest cache line age associated therewith is replaced.
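By way of illustration only, the age-based LRU behavior described above may be sketched as follows. The class name `AgedLRUCache`, the fully associative organization, and the use of an integer age with 0 as the youngest value are assumptions made for this sketch and are not taken from any particular cache implementation.

```python
class AgedLRUCache:
    """Toy fully associative cache that keeps an explicit age per line.

    Hypothetical model for illustration: age 0 is the youngest line and
    larger ages are older.
    """

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.ages = {}  # tag -> age

    def access(self, tag):
        """Access a tag; return the evicted tag if a line was replaced, else None."""
        evicted = None
        if tag not in self.ages and len(self.ages) >= self.num_lines:
            # Miss with a full cache: discard the line having the oldest age.
            evicted = max(self.ages, key=self.ages.get)
            del self.ages[evicted]
        # Age every resident line, then mark the accessed line as youngest.
        for other in self.ages:
            self.ages[other] += 1
        self.ages[tag] = 0
        return evicted


cache = AgedLRUCache(num_lines=2)
cache.access("A")
cache.access("B")
cache.access("A")            # hit: "A" becomes the youngest line again
victim = cache.access("C")   # miss: the oldest line ("B") is replaced
```

In this sketch a complete age is kept for every line, which is practical only for a very small cache.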
In practice, actual implementations of the LRU algorithm rely upon incomplete retained data on actual cache usage (e.g., there are simply too many lines in a typical cache to maintain a complete set of statistics on the use of each cache line and there is too little time available during cache operations to evaluate a complete set of cache line use statistics). Therefore, actual cache line replacements are made on a partially random basis.
For "distributed statistics" (wherein the shared cache contains a similar number of data and instruction cache lines with similar ages), the LRU algorithm functions well. However, for non-distributed statistics (wherein the shared cache contains a non-similar number of data and instruction cache lines having non-similar ages), the LRU algorithm often maintains a non-optimal balance between the number of data and instruction lines within a shared cache. Accordingly, a need exists for an improved method and apparatus for allocating data and instructions within a shared cache.
To address the needs of the prior art, an inventive method and apparatus are provided for managing cache allocation for a plurality of data types in a unified cache having dynamically allocable lines for first type data (e.g., data/instructions) and for second type data (e.g., instructions/data). Cache allocation is managed by counting misses to first type data and misses to second type data in the unified cache, and by determining when a difference between a number of first type data misses and a number of second type data misses crosses a preselected threshold. A replacement algorithm of the unified cache then is adjusted in response to the detected crossing of the preselected threshold, the adjusting step including increasing a replacement priority of the first type data lines in the cache. The replacement algorithm preferably is an LRU algorithm wherein the adjusting step includes incrementing an age indication of the first type data lines. To re-balance the count of misses to first type data and the count of misses to second type data, preferably both counts are reset after a predetermined time period or in response to a new task.
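The method steps above (counting misses per type, detecting a threshold crossing, incrementing an age indication, and resetting the counts) may be sketched in software as follows. This is a simplified model under stated assumptions: the names `Line`, `MissDifferenceMonitor`, `adjust_ages`, and the `step` parameter are hypothetical, the crossing is modeled as the difference exceeding the threshold, and how an incremented age indication affects the eventual replacement order depends on the age encoding of the particular LRU implementation.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical cache line record: a tag, a type flag, and an age indication."""
    tag: str
    is_first_type: bool
    age: int = 0


class MissDifferenceMonitor:
    """Counts misses per data type and reports when their difference exceeds a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.first_misses = 0
        self.second_misses = 0

    def record_miss(self, is_first_type):
        if is_first_type:
            self.first_misses += 1
        else:
            self.second_misses += 1

    def threshold_crossed(self):
        # Assumption: "crosses" is modeled as strictly exceeding the threshold.
        return self.first_misses - self.second_misses > self.threshold

    def reset(self):
        # Called after a predetermined time period or on a new task.
        self.first_misses = 0
        self.second_misses = 0


def adjust_ages(lines, monitor, step=1):
    """Increment the age indication of first type lines once the threshold is crossed."""
    if monitor.threshold_crossed():
        for line in lines:
            if line.is_first_type:
                line.age += step


lines = [Line("a", True), Line("b", False)]
monitor = MissDifferenceMonitor(threshold=2)
for _ in range(3):
    monitor.record_miss(is_first_type=True)
adjust_ages(lines, monitor)  # difference of 3 exceeds 2: first type lines are aged
```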
Hardware for implementing the inventive cache allocation management method comprises a miss counter having a first counter input adapted to couple to the control logic of the unified cache and to receive a miss to first type data signal therefrom, a second counter input adapted to couple to the control logic of the unified cache and to receive a miss to second type data signal therefrom, and a first counter output. The miss counter is configured to increment its count in response to a miss to first type data signal on the first counter input and to output a first logic state on the first counter output when its count exceeds a first predetermined count. A priority adjustment circuit is coupled to the first counter output of the miss counter and is adapted to couple to the replacement algorithm logic of the unified cache. The priority adjustment circuit is configured to increase the replacement priority of the first type data relative to the replacement priority of the second type data in response to the first logic state output by the miss counter on the first counter output.
Preferably the miss counter is further adapted to decrement its count in response to a miss to second type data signal on the second counter input and to output a second logic state on the first counter output when its count is equal to or less than the first predetermined count. The priority adjustment circuit thereby may be configured to increase the replacement priority of the second type data relative to the replacement priority of the first type data in response to the second logic state output by the miss counter on the first counter output. The priority adjustment circuit preferably comprises an LRU priority adjustment circuit configured to inhibit aging of at least a portion of first/second type data within the unified cache by an LRU algorithm of the cache when the second/first logic state is output by the miss counter. Preferably the miss counter's count is resettable and/or presettable, the response rate of the miss counter to misses to first type data and/or misses to second type data is adjustable, and an upper and a lower count threshold may be set to limit the count range of the miss counter.
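The up/down miss counter and the LRU priority adjustment described above may be modeled behaviorally as shown below. This is a software sketch rather than a circuit description. The names `UpDownMissCounter`, `trip_count`, `up_step`, `down_step`, and `aging_inhibited_type` are hypothetical, the step sizes stand in for the adjustable response rate, and inhibiting aging for an entire data type (rather than a portion of it) is a simplifying assumption.

```python
FIRST_STATE = 1
SECOND_STATE = 0


class UpDownMissCounter:
    """Behavioral model of a bounded up/down miss counter (hypothetical names)."""

    def __init__(self, trip_count, lower, upper, preset=None, up_step=1, down_step=1):
        self.trip_count = trip_count  # the "first predetermined count"
        self.lower = lower            # lower count threshold
        self.upper = upper            # upper count threshold
        self.up_step = up_step        # response rate to first type misses
        self.down_step = down_step    # response rate to second type misses
        self.preset_value = trip_count if preset is None else preset
        self.count = self.preset_value

    def miss_first_type(self):
        # Increment, limited by the upper count threshold.
        self.count = min(self.upper, self.count + self.up_step)

    def miss_second_type(self):
        # Decrement, limited by the lower count threshold.
        self.count = max(self.lower, self.count - self.down_step)

    def reset(self):
        self.count = self.preset_value

    def output(self):
        # First logic state when the count exceeds the trip count, else second.
        return FIRST_STATE if self.count > self.trip_count else SECOND_STATE


def aging_inhibited_type(state):
    """Return the data type whose LRU aging is inhibited for a given counter output.

    Follows the text: first type aging is inhibited on the second logic state,
    and second type aging is inhibited on the first logic state.
    """
    return "first" if state == SECOND_STATE else "second"


counter = UpDownMissCounter(trip_count=4, lower=0, upper=6)
counter.miss_first_type()                          # count 5, above the trip count
inhibited = aging_inhibited_type(counter.output())
```

Bounding the count between the lower and upper thresholds keeps a long run of misses to one type from delaying the response when the miss pattern later shifts toward the other type.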
By monitoring the ratio of misses to first type data to misses to second type data, and by adjusting the percentage of the unified cache dedicated to each type data based thereon, a unified cache's hit rate is significantly improved. Further, cache hit rate improvement is achieved with a minimal increase in cache circuitry complexity.
Other objects, features and advantages of the present invention will become more fully apparent from the following detailed description of the preferred embodiments, the appended claims and the accompanying drawings.