Embodiments of the present invention relate to processors and more particularly to processors of a system having a multiple-level cache hierarchy.
Many systems include one or more cache memories to temporarily store data in closer relation to a processor in which the data will be used. In this way, decreased data retrieval times can be realized by the processor, improving performance. Multiple levels of cache memory may be present in certain systems. These cache levels may include a so-called level zero (L0) cache memory that can be present within a processor, as well as a so-called level one (L1) cache memory that also can be present within the processor. Additional levels of cache memories, either within the processor or closely coupled thereto, may further be present in various systems.
In some systems, multiple levels of cache memory may be implemented as an inclusive cache hierarchy. In an inclusive cache hierarchy, one of the cache memories (i.e., a lower-level cache memory) includes a subset of data contained in another cache memory (i.e., a higher-level cache memory). Cache hierarchies may improve processor performance, as they allow a smaller cache having a relatively fast access speed to contain frequently used data. In turn, a larger cache having a slower access speed than the smaller cache stores less-frequently used data (as well as copies of the data in the lower-level cache). Typically, the lower-level cache memories of such an inclusive cache hierarchy are smaller than the higher-level cache memories.
Because inclusive cache hierarchies store some common data, eviction of a cache line in one cache level may cause a corresponding cache line eviction in another level of the cache hierarchy to maintain cache coherency. More specifically, an eviction in a higher-level cache causes an eviction in a lower-level cache. Various eviction schemes can be used in different cache memories. One common eviction scheme is known as a least recently used (LRU) scheme, in which the least recently used cache line is selected for eviction. Accordingly, each cache line may have recency information associated with it to indicate its age with respect to other cache lines in the cache. Additional caching techniques include associating state data with cache lines to indicate accessibility and/or validity of cache lines. For example, state data may include the following states: modified (M), exclusive (E), shared (S), and/or invalid (I), collectively known as MESI states.
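By way of illustration only, the LRU scheme and per-line state data described above may be sketched as follows. The sketch assumes a small, fully associative cache; the names `State`, `LRUCache`, `access`, and `invalidate` are hypothetical and chosen for this example, and removal of a line stands in for the invalid (I) state.

```python
from collections import OrderedDict
from enum import Enum


class State(Enum):
    """MESI states kept as per-line state data."""
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"  # modeled here by removing the line from the cache


class LRUCache:
    """Fully associative cache with LRU replacement (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Keys are line addresses, ordered from least to most recently used,
        # so the ordering itself serves as the recency information.
        self.lines = OrderedDict()

    def access(self, addr, write=False):
        """Touch a line; return the address evicted to make room, or None."""
        victim = None
        if addr in self.lines:
            self.lines.move_to_end(addr)  # a hit refreshes recency
        else:
            if len(self.lines) >= self.capacity:
                # Select the least recently used line for eviction.
                victim, _ = self.lines.popitem(last=False)
            self.lines[addr] = State.EXCLUSIVE
        if write:
            self.lines[addr] = State.MODIFIED
        return victim

    def invalidate(self, addr):
        """Drop a line, e.g. when a higher-level cache evicts its copy."""
        self.lines.pop(addr, None)
```

In this sketch, a read hit changes only the ordering of the lines, while a write additionally marks the line as modified.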
Using conventional eviction techniques, a cache line in a higher-level cache may be evicted as being stale (i.e., as the least recently used cache line) even though a corresponding copy of that cache line in a lower-level cache is heavily accessed by a processor. This can occur because accesses that hit in the lower-level cache typically are not seen by the higher-level cache and thus do not update its recency information. In hierarchies having inclusivity, when a higher-level cache line is evicted, a corresponding cache line in a lower-level cache must also be explicitly invalidated. Such invalidated lower-level cache lines may include data that is frequently accessed by the processor, causing unnecessary cache misses. These cache misses incur significant latency, as valid data must be obtained from other memory locations, such as a main memory.
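The effect described above may be reproduced with a minimal model, offered as a sketch under stated assumptions rather than as any particular implementation: a two-level inclusive hierarchy in which both levels are fully associative and use LRU replacement, and in which lower-level hits are not reported to the higher level. The class name `InclusiveHierarchy` and its members are hypothetical.

```python
from collections import OrderedDict


class InclusiveHierarchy:
    """Two-level inclusive hierarchy with LRU at both levels (illustrative)."""

    def __init__(self, lower_capacity, higher_capacity):
        self.lower = OrderedDict()   # small, fast cache
        self.higher = OrderedDict()  # large, slower, inclusive cache
        self.lower_capacity = lower_capacity
        self.higher_capacity = higher_capacity
        self.lower_misses = 0

    def access(self, addr):
        if addr in self.lower:
            # Lower-level hit: the higher level never sees this access,
            # so its recency information for addr is not refreshed.
            self.lower.move_to_end(addr)
            return
        self.lower_misses += 1
        if addr in self.higher:
            self.higher.move_to_end(addr)
        else:
            if len(self.higher) >= self.higher_capacity:
                victim, _ = self.higher.popitem(last=False)
                # Inclusivity: the lower-level copy must be invalidated too.
                self.lower.pop(victim, None)
            self.higher[addr] = None
        if len(self.lower) >= self.lower_capacity:
            self.lower.popitem(last=False)
        self.lower[addr] = None


# Line 0 is accessed between every pair of streaming accesses, yet it is
# the least recently used line from the higher level's point of view.
h = InclusiveHierarchy(lower_capacity=2, higher_capacity=4)
h.access(0)
for streaming_addr in (1, 2, 3, 4):
    h.access(streaming_addr)
    hot_line_present = 0 in h.lower  # becomes False once line 4 is allocated
    h.access(0)
```

In this model, allocating line 4 evicts line 0 from the higher-level cache and invalidates it in the lower-level cache, so the next access to line 0 misses at both levels despite the line being the most frequently used.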
Furthermore, problems occur when an inclusive cache hierarchy has a higher-level cache that is shared among multiple processors, for example, multiple cores of a multi-core processor. In this scenario, each core occupies at least some cache lines in the higher-level cache, but all cores contend for the shared resource. When one of the cores uses a small working set that fits inside its lower-level cache, this core rarely (if ever) has to send requests to the higher-level cache, since the requests hit in its lower-level cache. As a result, this core's lines in the higher-level cache become stale regardless of how often the core uses them. When the higher-level cache is shared with other cores that continually allocate cache lines into it, this core's data is evicted, causing performance degradation.
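The shared-cache scenario may likewise be sketched with a simplified model. The sketch assumes per-core private lower-level caches, a single shared inclusive higher-level cache, LRU replacement throughout, and disjoint address ranges per core; the class `SharedInclusiveCache` and its members are hypothetical names used only for this example.

```python
from collections import OrderedDict


class SharedInclusiveCache:
    """Private per-core caches under one shared inclusive cache (illustrative)."""

    def __init__(self, num_cores, private_capacity, shared_capacity):
        self.private = [OrderedDict() for _ in range(num_cores)]
        self.shared = OrderedDict()
        self.private_capacity = private_capacity
        self.shared_capacity = shared_capacity
        self.misses = [0] * num_cores  # private-cache misses per core

    def access(self, core, addr):
        own = self.private[core]
        if addr in own:
            own.move_to_end(addr)  # the shared cache does not see this hit
            return
        self.misses[core] += 1
        if addr in self.shared:
            self.shared.move_to_end(addr)
        else:
            if len(self.shared) >= self.shared_capacity:
                victim, _ = self.shared.popitem(last=False)
                # Inclusivity: invalidate the victim in every private cache.
                for cache in self.private:
                    cache.pop(victim, None)
            self.shared[addr] = None
        if len(own) >= self.private_capacity:
            own.popitem(last=False)
        own[addr] = None


# Core 0 reuses a two-line working set that fits in its private cache;
# core 1 streams through new lines and keeps allocating into the shared cache.
cache = SharedInclusiveCache(num_cores=2, private_capacity=2, shared_capacity=4)
cache.access(0, 0)
cache.access(0, 1)
for i in range(4):
    cache.access(1, 100 + i)
    cache.access(0, 0)
    cache.access(0, 1)
extra_misses = cache.misses[0] - 2  # misses beyond the two compulsory ones
```

Under these assumptions, core 0 incurs misses beyond its two compulsory misses even though its working set never exceeds its private cache, because the streaming allocations of core 1 push core 0's stale-looking lines out of the shared cache.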