The performance gap between the processor and the memory system in computing systems has been steadily increasing. With every generation, processors are clocked at higher rates, while memory systems have been unable to keep pace with this exponential growth. The resulting performance gap has become a major bottleneck for single-thread performance in modern-day processors, because the memory system is generally unable to supply data to the processor core at the rate at which the core executes.
Traditional means of reducing this performance gap include the use of cache memory of varying sizes and levels, which provides temporary storage for, and quick access to, frequently used data. Cache memory is conventionally managed by hardware, which attempts to cache the most frequently accessed data while purging older, unused data and fetching data from nearby memory locations, thus retaining the working set of the program in the cache.
Although there are many advantages to using hardware-managed cache systems, such as transparency and flexibility, the layer of abstraction afforded by these systems comes at a price. For one, because a hardware-managed cache is designed to support a wide spectrum of workloads, its design tends to be very generic in terms of the algorithms used for cache placement (i.e., determining where in the cache array to keep a data item fetched from a specific memory address). For example, set placement decisions are made by using selected bits of the physical address to index into a given set of a set-associative cache. Although such greedy or random placement and replacement decisions provide reasonably high hit rates at low cost, they have proven to be ineffective for highly memory-bound workloads. Thus, a hardware-managed cache is generally less capable of exploiting application-specific behaviors and replacement policies than other memory schemes. Software-based approaches that rely on the programmer or compiler for data movement, in turn, tend to behave too conservatively to close the processor/memory system gap effectively.
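By way of illustration only, the conventional address-bit set placement described above may be sketched as follows. The cache geometry (64-byte lines, 1024 sets) and the names LINE_SIZE, NUM_SETS, set_index, and tag_of are hypothetical and are assumed solely for this example; they are not taken from any particular processor.

```c
#include <stdint.h>

/* Hypothetical cache geometry, assumed only for this illustration. */
#define LINE_SIZE 64u    /* bytes per cache line: low 6 address bits are the offset */
#define NUM_SETS  1024u  /* number of sets: the next 10 address bits select the set */

/* Conventional placement: the set is selected by a fixed field of the
 * physical address, independent of how the application uses the data. */
static uint32_t set_index(uint64_t paddr)
{
    return (uint32_t)((paddr / LINE_SIZE) % NUM_SETS);
}

/* The remaining high-order bits form the tag that is stored with the line
 * and compared on lookup to distinguish addresses sharing a set. */
static uint64_t tag_of(uint64_t paddr)
{
    return paddr / ((uint64_t)LINE_SIZE * NUM_SETS);
}
```

Under these assumed parameters, any two physical addresses that differ by a multiple of LINE_SIZE * NUM_SETS (65,536 bytes) select the same set and therefore compete for the same small number of ways, regardless of how frequently the application accesses either one. This is one way in which a fixed, generic placement function may fail to reflect application-specific behavior.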
Thus, there is a need in the art for a method and apparatus for application-specific dynamic cache placement.