Microprocessors typically have several small critical structures. The critical structures can, for example, include instruction caches, data caches, and translation look-aside buffers (TLBs). Typically, these structures are organized as set-associative structures with several levels of hierarchy for each structure.
A miss of an entry in a structure that is lower in the hierarchy, e.g., a Level 1 (“L1”) cache, causes the entry to be accessed from a higher level structure, e.g., a Level 2 (“L2”) cache, and installed into the lower level. This installation into the lower level structure is required because typically such entries will be accessed repeatedly and the access times of lower level structures are much faster than those of the higher level structures. Stated differently, there is a significantly higher penalty for accessing larger structures, e.g., an L2 cache, than for accessing small structures, e.g., an L1 cache. Certain accesses, however, do not have much temporal locality. An instruction on a mispredicted path of a branch is a good example. An instruction on a mispredicted path may be accessed only a few times, but is unlikely to be accessed again thereafter.
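The miss-then-install behavior described above can be sketched roughly as follows. The class name, capacities, and the use of least-recently-used (LRU) replacement are illustrative assumptions for the sketch, not a description of any particular processor:

```python
from collections import OrderedDict

class TwoLevelCache:
    """Sketch: an L1 miss falls through to the L2 level, and the
    fetched entry is installed into L1, evicting the least recently
    used L1 entry when L1 is full."""

    def __init__(self, l1_capacity, l2_store):
        self.l1 = OrderedDict()          # small, fast lower level
        self.l1_capacity = l1_capacity
        self.l2 = l2_store               # larger, slower higher level

    def access(self, key):
        if key in self.l1:               # L1 hit: fast path
            self.l1.move_to_end(key)
            return self.l1[key]
        value = self.l2[key]             # L1 miss: access from L2
        if len(self.l1) >= self.l1_capacity:
            self.l1.popitem(last=False)  # evict the LRU entry from L1
        self.l1[key] = value             # install into the lower level
        return value
```

Because every miss installs into L1, a single-use entry (such as one from a mispredicted path) displaces an L1 entry even though it will likely never be accessed again.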
Installing entries from a mispredicted path into a lower level structure, e.g., an L1 cache, is problematic because the lower level structures are typically very small due to area constraints, and installing entries from a mispredicted path, as well as aggressive prefetching techniques, may cause useful entries to be evicted from the lower level arrays in favor of less useful entries. Further, if the latency of accessing the next-level structure, e.g., an L2 cache, is relatively high, evicting useful entries can carry a significant penalty.
Conventional processors have dealt with the problem of minimizing the penalty of accessing higher level structures in various ways. Victim caches are one example of how conventional processors have tried to reduce the penalty of accessing higher level structures. FIG. 1 illustrates a victim cache scheme implemented by a conventional processor. In the victim cache scheme, incoming blocks from memory 158 (or L2 cache 156, if present) are always loaded into the L1 cache 102, with one of the cache blocks in the L1 cache 102 being displaced and moved to the victim cache 104. The victim cache 104 in turn discards one of its blocks, moving it back to memory 158 (or L2 cache 156, if present). The net effect is that when a new block is brought into the L1 cache 102, it is ultimately a victim cache block that is replaced, with the discarded block being returned to main memory 158 (or the L2 cache 156).
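The victim cache scheme of FIG. 1 can be sketched as follows. The class and parameter names are illustrative, and the sketch uses simple first-in-first-out replacement in both structures purely for brevity:

```python
from collections import OrderedDict

class VictimCacheL1:
    """Sketch of the FIG. 1 scheme: every incoming block is loaded
    into L1; the L1 block it displaces moves into a small victim
    cache, which in turn discards its oldest block back to the
    backing store (memory, or L2 if present)."""

    def __init__(self, l1_capacity, victim_capacity, backing):
        self.l1 = OrderedDict()
        self.victim = OrderedDict()
        self.l1_cap = l1_capacity
        self.victim_cap = victim_capacity
        self.backing = backing                     # memory 158 (or L2 156)

    def install(self, key, value):
        if len(self.l1) >= self.l1_cap:
            vk, vv = self.l1.popitem(last=False)   # displace an L1 block
            if len(self.victim) >= self.victim_cap:
                dk, dv = self.victim.popitem(last=False)
                self.backing[dk] = dv              # discarded block returns to memory
            self.victim[vk] = vv                   # displaced block enters victim cache
        self.l1[key] = value                       # incoming block always enters L1
```

Note that the scheme installs unconditionally: nothing in the install path distinguishes a frequently reused block from a single-use one.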
As is well known, the L1 cache 102, often called the primary cache, is a static memory integrated with the processor core 120 that is used to store information recently accessed by the processor 120. The purpose of the L1 cache 102 is to improve data access speed in cases where the CPU accesses the same data multiple times. The access time of the L1 cache 102 is always faster than the access time of system memory 158 or the L2 cache 156. For this reason, it is important to ensure that critical data is present in the L1 cache 102 most of the time.
Conventional schemes such as implementing a victim cache, however, do not address the problem of temporal locality. In other words, conventional processors do not address the problem of prioritizing more frequently accessed entries over less frequently accessed or unnecessary entries. For example, in FIG. 1, the victim cache scheme illustrated does not have any circuitry or logic configured to filter out the less useful or less frequently accessed entries such that only the entries with the highest temporal locality are retained within the L1 cache 102 and victim cache 104.
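By way of contrast, the kind of filtering described above, which conventional schemes such as that of FIG. 1 lack, could be sketched as follows. This is an illustrative sketch only, not a description of the disclosed embodiments; the names and the simple threshold policy are assumptions:

```python
from collections import Counter, OrderedDict

class FrequencyFilteredL1:
    """Sketch of temporal-locality filtering: an entry is installed
    into L1 only after it has been requested a threshold number of
    times, so single-use entries (e.g., from a mispredicted path)
    never displace frequently accessed ones."""

    def __init__(self, capacity, threshold=2):
        self.l1 = OrderedDict()
        self.capacity = capacity
        self.threshold = threshold
        self.counts = Counter()          # per-entry access counts

    def access(self, key, fetch):
        if key in self.l1:
            self.l1.move_to_end(key)
            return self.l1[key]
        value = fetch(key)               # access from the next level
        self.counts[key] += 1
        if self.counts[key] >= self.threshold:   # enough temporal locality
            if len(self.l1) >= self.capacity:
                self.l1.popitem(last=False)
            self.l1[key] = value         # install only filtered entries
        return value
```

Under this policy an entry touched once, as on a mispredicted path, is served from the higher level but never occupies an L1 slot.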