In the art of computer processing, and of memory systems in particular, the speed gap between the processor and main memory has grown. This gap directly affects the performance of the overall computing system. To alleviate performance concerns, computing systems include a cache mechanism on which they depend to bridge this speed gap. The success of such cache mechanisms in bridging the gap varies with parameters such as cache size, block size, and associativity. However, performance cannot be improved indefinitely by changing these parameters, because a point of diminishing returns is reached due to increasing system complexity, power consumption, and the behavior of the cache itself.
Existing caching mechanisms generally depend upon spatial and temporal locality of references in order for the caching to be effective. However, some workloads, such as multimedia applications, exhibit limited locality and are more dependent on the performance of main memory. Also, applications written in languages such as C and C++, or in an object-oriented style, use dynamically allocated data structures to map data to available memory space. As a result, such data may be scattered in memory and therefore lack spatial locality. Data elements in these applications might also not be reused soon enough and thus lack temporal locality. The lack of spatial and temporal locality in these types of computing applications makes conventional cache mechanisms less effective.
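The effect of scattered data on spatial locality can be illustrated with a minimal Python sketch. The direct-mapped cache model, its 64 lines of 64 bytes each, the 8-byte element size, the 16 MiB address range, and the function name count_misses are all assumptions chosen for illustration; none is taken from a particular system.

```python
import random


def count_misses(addresses, num_lines=64, block_size=64):
    """Count misses in a hypothetical direct-mapped cache."""
    lines = [None] * num_lines  # block number currently held by each cache line
    misses = 0
    for addr in addresses:
        block = addr // block_size   # memory block containing this byte address
        index = block % num_lines    # cache line that this block maps to
        if lines[index] != block:
            misses += 1
            lines[index] = block     # fill the line on a miss
    return misses


# Array-like layout: 1024 eight-byte elements stored contiguously.
contiguous = [i * 8 for i in range(1024)]

# Heap-like layout (assumed): the same number of elements at pseudo-random,
# 8-byte-aligned addresses spread over a 16 MiB range.
rng = random.Random(0)
scattered = [rng.randrange(0, 1 << 24, 8) for _ in range(1024)]

contiguous_misses = count_misses(contiguous)
scattered_misses = count_misses(scattered)
print(contiguous_misses, scattered_misses)
```

Under these assumptions the contiguous layout misses once per 64-byte block, that is, 128 times in 1,024 accesses, whereas the scattered layout misses on most accesses because neighboring elements rarely share a block.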
Within memory systems, a reference that lacks spatial and temporal locality but is used many times is more important to keep in cache than a reference that is captured in cache but never used again. Cached references that are not used again are undesirable because they compete for cache space with other, more frequently used references. Likewise, references that have poor spatial and temporal locality but a high frequency of use must not be removed or replaced.
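The competition between single-use references and a frequently used reference can be sketched as follows. This is an illustration only, not a description of any particular cache design: the fully associative model, the capacity of four blocks, the synthetic trace, and the choice to keep use counts after eviction are all assumptions, as are the names simulate, "lru", and "lfu".

```python
from collections import Counter


def simulate(trace, capacity, policy):
    """Return the number of hits for a hypothetical fully associative cache.

    policy is "lru" (evict the least recently used block) or "lfu" (evict
    the block with the lowest use count, oldest access breaking ties).
    """
    last_used = {}    # resident block -> time of its most recent access
    uses = Counter()  # block -> total accesses so far (assumed kept after eviction)
    hits = 0
    for now, block in enumerate(trace):
        uses[block] += 1
        if block in last_used:
            hits += 1
        elif len(last_used) >= capacity:
            if policy == "lru":
                victim = min(last_used, key=lambda b: last_used[b])
            else:
                victim = min(last_used, key=lambda b: (uses[b], last_used[b]))
            del last_used[victim]
        last_used[block] = now  # insert the block or refresh its access time
    return hits


# Synthetic trace: one "hot" block reused every fifth access, separated by
# four blocks that are each referenced exactly once.
trace = []
for round_number in range(10):
    trace.append("hot")
    trace.extend(f"once-{round_number}-{i}" for i in range(4))

lru_hits = simulate(trace, capacity=4, policy="lru")
lfu_hits = simulate(trace, capacity=4, policy="lfu")
print(lru_hits, lfu_hits)
```

On this trace the recency-based policy never hits, because the four single-use blocks always displace the hot block before it is referenced again. The frequency-based policy retains the hot block once its count exceeds that of the single-use blocks and hits on it in eight of the ten rounds.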
Improvements to such cache mechanisms have generally been achieved by exploiting locality of reference, which might not exist in some applications. Most often, this is accomplished at the added cost of extra hardware that keeps more references in cache by increasing the size or associativity of the cache, providing extra levels of memory hierarchy, or providing pre-fetching buffers.
Accordingly, cache mechanisms have become less efficient due to the growing gap between processor speed and main memory speed and the lack of locality in some applications. To deal with cache limitations and the continuous increase in cache miss penalty, hardware and software pre-fetching schemes have been used to fetch data from main memory before the data is needed. While such pre-fetching methods are useful for the regular access patterns found in some applications, they cannot hide very long cache miss latencies, which may extend to hundreds of cycles in modern and future processors.
Other methods have been proposed to better manage primary memory cache (i.e., level 1 or L1 cache) through selective allocation. These selective allocation schemes statically partition the cache so that cache blocks are allocated in different sub-caches based upon their spatial and temporal localities. However, these methods are problematic in that they may perform poorly if the access patterns do not suit the partitioning.
Other methods have been proposed which employ the frequency of use of references in a multi-level cache so as to move the most frequently used references to a higher level cache in order to improve L1 cache performance. However, this approach is suited to systems with a small L1 cache and becomes less effective as L1 cache size and associativity increase.
Still other methods have been proposed that use the first level cache as a filter so as to bring useful speculative memory references into the first level cache. However, this approach yields limited performance improvements and disadvantageously increases hardware cost, since additional hardware is needed to predict the usefulness of references.
It is, therefore, desirable to provide a cache mechanism that bridges the speed gap between modern memory systems and computer processors in a cost-effective manner that does not rely upon costly additional hardware and can handle large memory caches.