A processing system may comprise one or more processors which can make requests for accessing data stored in a memory (e.g., a main memory implemented using double data rate (DDR) dynamic random access memory (DRAM) technology, or a hard disk). Memory requests generated by a processor may display temporal locality, which means that the requests are directed to data which was recently requested, and correspondingly also means that the same data may be requested again in the near future. To exploit temporal locality, one or more caches may be provided to store data which is determined to be likely to be used again. The caches may be designed to be small in size to enable high speeds (e.g., on the order of a few tens of clock cycles, as compared to memory access speeds which can be on the order of hundreds or thousands of clock cycles).
If the requested data is present in the cache, a cache hit results and the data can be read directly from the cache which produced the cache hit. On the other hand, if the requested data is not present in the cache, a cache miss results, and backing storage locations such as other caches or ultimately the memory may be accessed to retrieve the requested data. Since the caches are designed to be small, the limited storage space in the caches may be filled up, which means that some cache lines may need to be evicted (called victim cache lines) to accommodate incoming cache lines (called contender cache lines). Cache replacement policies are known in the art for evicting the victim cache lines and replacing them with the contender cache lines. The process of selecting which cache lines to evict is referred to as victim selection.
Some cache replacement policies such as least recently used (LRU) replacement policies rely on the temporal locality of the data requested, and may evict cache lines which were not accessed for the longest period of time. An objective of such cache replacement policies is to maximize cache hits, or put another way, to minimize cache misses. While LRU may be an effective replacement policy for applications whose requests have high temporal locality, the performance of LRU based replacement policies may deteriorate if future accesses (also referred to as re-reference or reuse) of stored data in a cache do not occur soon enough.
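As an illustration of the LRU behavior described above, the following is a minimal sketch of a fully associative cache with LRU replacement. The class name LRUCache, the capacity of two lines, and the single-letter addresses are assumptions chosen for illustration only; a hardware cache would track recency per set rather than with an ordered dictionary.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with least recently used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # oldest access first, newest access last
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a cache hit, False on a miss (the line is then filled)."""
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[address] = None
        return False


cache = LRUCache(capacity=2)
results = [cache.access(a) for a in ["A", "B", "A", "C", "B"]]
# The second access to "A" hits; "C" then evicts "B", so the final "B" misses.
```

In this sketch the victim cache line is always the one at the front of the ordered dictionary, i.e., the line which has gone longest without being accessed.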
To explain, some applications or workloads may generate a set of requests for a number of cache lines which is greater than the capacity of the cache. In such cases, the cache may be constantly filled to capacity with a subset of the set of cache lines required by the application, while the remaining cache lines for the application may be evicted. This leads to a situation known as “cache thrashing,” wherein a future request for a cache line of the application may be received by the cache after that cache line has already been evicted based on an LRU replacement policy. Thus, the request for the evicted cache line would result in a miss, whereas if the cache line had not been evicted, a cache hit would have occurred for the request. Cache thrashing can lead to poor performance since cache requests by the application result in frequent misses.
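Cache thrashing under LRU can be reproduced with a small experiment. The sketch below assumes a hypothetical workload which loops over five cache lines, and compares an LRU cache of four lines with one of five lines; the function name lru_hit_count and all sizes are illustrative assumptions.

```python
from collections import OrderedDict


def lru_hit_count(trace, capacity):
    """Count hits for an access trace on a fully associative LRU cache."""
    lines = OrderedDict()  # oldest access first, newest access last
    hits = 0
    for address in trace:
        if address in lines:
            lines.move_to_end(address)  # refresh recency on a hit
            hits += 1
        else:
            if len(lines) >= capacity:
                lines.popitem(last=False)  # evict the least recently used line
            lines[address] = None
    return hits


working_set = [0, 1, 2, 3, 4]  # five distinct cache lines
trace = working_set * 10       # the workload loops over them ten times

thrash_hits = lru_hit_count(trace, capacity=4)  # working set exceeds capacity
fit_hits = lru_hit_count(trace, capacity=5)     # working set fits in the cache
```

With a capacity of four lines, each line is evicted just before it is requested again, so every one of the 50 accesses misses. With a capacity of five lines, only the first five accesses miss.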
In another example, some workloads include intermittent data requests with no temporal locality, also referred to as scans. In such cases, reuse of data stored in the cache may be far apart in time, which means that LRU based replacement policies may evict some data which is referenced repeatedly, because the repeated references do not arrive soon enough to avoid eviction. Thus, even in the case of scans, cache misses may increase and performance may suffer.
Accordingly, there is a recognized need for cache replacement policies which are resistant to the negative effects of thrashing and of workloads involving scans. Some approaches in this regard involve a dynamic re-reference interval prediction (DRRIP) where future re-reference intervals for cache accesses are dynamically predicted (see, e.g., Jaleel et al., “High Performance Cache Replacement Using Re-Reference Interval Prediction (RRIP),” ISCA '10, Jun. 19-23, 2010, Saint-Malo, France, hereinafter, “Jaleel”). In DRRIP, e.g., as discussed in Jaleel, a victim selection scheme is used where a cache line predicted to be re-referenced furthest in the future is selected to be evicted or replaced. The future re-reference interval is continually updated in at least the following two instances, involving cache hits and cache misses. When there is a cache hit for a cache line present in the cache, a hit update policy is used to update the future re-reference interval of the cache line. When a cache miss is observed, an insertion policy is used to assign the future re-reference interval of the cache line inserted in the cache pursuant to the cache miss. In DRRIP, effects of scan and thrash are sought to be avoided by dynamically dueling between two policies: a static re-reference interval prediction (SRRIP) and a bi-modal re-reference interval prediction (BRRIP).
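The victim selection, hit update, and insertion steps described above can be sketched for a single cache set as follows. This is a minimal illustration rather than a definitive implementation: the class name RRIPSet, the 2-bit prediction value per cache line, the hit update which resets the value to zero, and the aging loop in select_victim are assumptions made for the example.

```python
RRPV_BITS = 2                    # assumed width of the prediction value
RRPV_MAX = (1 << RRPV_BITS) - 1  # 3: predicted re-reference furthest in the future


class RRIPSet:
    """Toy cache set which tracks one re-reference prediction value per line."""

    def __init__(self, ways):
        self.ways = ways  # assumed to be at least 1
        self.rrpv = {}    # tag -> predicted re-reference interval

    def select_victim(self):
        """Return a line predicted to be re-referenced furthest in the future."""
        while True:
            for tag, value in self.rrpv.items():
                if value == RRPV_MAX:
                    return tag
            # No line is at the maximum yet: age every line and search again.
            for tag in self.rrpv:
                self.rrpv[tag] += 1

    def access(self, tag, insert_rrpv):
        """Return True on a hit; on a miss, insert the line with insert_rrpv."""
        if tag in self.rrpv:
            self.rrpv[tag] = 0  # hit update: predict a near re-reference
            return True
        if len(self.rrpv) >= self.ways:
            del self.rrpv[self.select_victim()]
        self.rrpv[tag] = insert_rrpv  # insertion policy chooses this value
        return False


cache_set = RRIPSet(ways=2)
for tag in ["A", "B", "A", "C"]:
    cache_set.access(tag, insert_rrpv=2)
# "A" was hit and reset to 0, so "B" ages to RRPV_MAX first and is evicted.
```

Because the hit on line "A" lowers its prediction value, the miss on line "C" evicts line "B" even though both "A" and "B" were inserted with the same value.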
An objective of the SRRIP policy is to make cache lines scan resistant, by seeking to ensure that newly allocated cache lines are not stored in a cache for either too much or too little time. As such, SRRIP uses an insertion policy wherein newly allocated cache lines are inserted with a future re-reference interval which falls in the middle of a re-reference interval prediction value (RRPV) chain. The RRPV chain ranges from the shortest future re-reference interval at the beginning or head of the chain to the furthest future re-reference interval at the end or tail of the chain (keeping in mind that cache lines with the furthest future re-reference interval, i.e., at the tail of the RRPV chain, are chosen for eviction).
An objective of the BRRIP policy is to make cache lines resistant to effects of cache thrashing (e.g., by preserving in the cache a portion of the cache lines which are associated with a workload and likely to have future re-reference). Accordingly, BRRIP uses an insertion policy wherein new cache lines are inserted with a future re-reference interval which falls towards the tail of the RRPV chain (i.e., most likely to be evicted). More specifically, some BRRIP insertion policies seek to insert new cache lines at the tail of the RRPV chain with a high probability and in the middle of the RRPV chain with a smaller probability.
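The difference between the SRRIP and BRRIP insertion policies can be sketched as two functions which return the prediction value assigned to a newly inserted cache line. The 2-bit value range, the choice of the second-largest value as the "middle" of the RRPV chain, the probability of 1/32, and the function names are all assumptions made for illustration.

```python
import random

RRPV_MAX = 3                       # tail of the chain (assumed 2-bit values)
RRPV_MIDDLE = RRPV_MAX - 1         # assumed "middle" insertion position
BRRIP_MIDDLE_PROBABILITY = 1 / 32  # assumed small probability of a middle insertion


def srrip_insertion_rrpv():
    """SRRIP sketch: always insert in the middle of the RRPV chain."""
    return RRPV_MIDDLE


def brrip_insertion_rrpv(rng):
    """BRRIP sketch: insert at the tail with high probability, else the middle."""
    if rng.random() < BRRIP_MIDDLE_PROBABILITY:
        return RRPV_MIDDLE
    return RRPV_MAX


rng = random.Random(0)  # seeded so that the experiment is repeatable
samples = [brrip_insertion_rrpv(rng) for _ in range(10_000)]
tail_fraction = samples.count(RRPV_MAX) / len(samples)
```

Under these assumptions, most lines inserted by the BRRIP sketch are immediately eligible for eviction, while the occasional middle insertion allows a portion of a thrashing working set to remain in the cache.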
The DRRIP policy dynamically duels between SRRIP and BRRIP by assigning each one of the SRRIP and BRRIP policies to a selected small number of sets (referred to as leader sets) of cache lines in the cache. The remaining sets of cache lines, called follower sets, follow the policy which performs better among the two leader sets, i.e., the better performing policy among SRRIP and BRRIP.
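One possible way to decide which leader policy is performing better is a saturating counter updated on misses in the leader sets. The text above does not specify the mechanism, so the sketch below is an assumption throughout: the class name SetDuelingSelector, the counter width, the rule that misses in SRRIP leader sets increment the counter while misses in BRRIP leader sets decrement it, and the leader set indices are all hypothetical.

```python
class SetDuelingSelector:
    """Toy policy selector: leader sets vote with their misses."""

    def __init__(self, srrip_leaders, brrip_leaders, counter_bits=10):
        self.srrip_leaders = frozenset(srrip_leaders)
        self.brrip_leaders = frozenset(brrip_leaders)
        self.counter_max = (1 << counter_bits) - 1
        self.midpoint = self.counter_max // 2
        self.counter = self.midpoint  # start with no preference

    def record_miss(self, set_index):
        """A miss in a leader set counts against that leader's policy."""
        if set_index in self.srrip_leaders:
            self.counter = min(self.counter + 1, self.counter_max)
        elif set_index in self.brrip_leaders:
            self.counter = max(self.counter - 1, 0)
        # Misses in follower sets do not change the counter.

    def policy_for(self, set_index):
        """Leader sets keep their fixed policy; followers use the current winner."""
        if set_index in self.srrip_leaders:
            return "SRRIP"
        if set_index in self.brrip_leaders:
            return "BRRIP"
        return "BRRIP" if self.counter > self.midpoint else "SRRIP"


selector = SetDuelingSelector(srrip_leaders={0}, brrip_leaders={1}, counter_bits=4)
before = selector.policy_for(5)  # follower set, counter at the midpoint
for _ in range(3):
    selector.record_miss(0)      # the SRRIP leader set keeps missing
after = selector.policy_for(5)   # follower set now prefers BRRIP
```

In this sketch the follower set switches from SRRIP to BRRIP once the SRRIP leader set has accumulated more misses, while the leader sets themselves never change policy.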
A drawback of the above-mentioned known replacement policies (e.g., LRU, SRRIP, BRRIP, DRRIP) is that they fail to distinguish between the different penalties that may be incurred for different cache misses. In other words, conventional implementations of these replacement policies have an underlying assumption that all cache misses will incur the same performance penalty. In practice, however, different cache misses can have different performance impacts. For example, a cache miss in an L1 cache that hits in a backing cache such as an L2 cache (e.g., implemented as a random access memory, “RAM”) can be serviced within a few hundred cycles, while servicing a cache miss in an L1 cache that also misses in the L2 cache, for which data will have to be retrieved from the main memory (e.g., DRAM) or hard drive, may involve thousands of cycles. In this disclosure, the performance penalty is also referred to as the cost of a miss (or simply, “cost”).
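A short worked example shows why miss count alone can be misleading. The latencies of 200 and 2000 cycles, the miss counts, and the function name total_miss_penalty below are illustrative assumptions chosen to match the "few hundred" and "thousands" of cycles mentioned above; they are not measured values.

```python
L2_HIT_COST = 200   # assumed cycles for an L1 miss which hits in the L2 cache
MEMORY_COST = 2000  # assumed cycles for an L1 miss which also misses in the L2 cache


def total_miss_penalty(l1_misses_hit_in_l2, l1_misses_to_memory):
    """Total cycles spent servicing L1 misses under the assumed costs."""
    return l1_misses_hit_in_l2 * L2_HIT_COST + l1_misses_to_memory * MEMORY_COST


# Hypothetical policy A: 100 misses, of which only 10 go to memory.
policy_a_penalty = total_miss_penalty(l1_misses_hit_in_l2=90, l1_misses_to_memory=10)

# Hypothetical policy B: only 90 misses, but 40 of them go to memory.
policy_b_penalty = total_miss_penalty(l1_misses_hit_in_l2=50, l1_misses_to_memory=40)
```

Under these assumptions, policy B incurs fewer misses (90 versus 100) yet a larger total penalty (90,000 versus 38,000 cycles), which is the kind of difference a policy that only counts misses cannot capture.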
Therefore, there is a need in the art for cost-aware replacement policies, i.e., cache replacement policies which also take into account the different performance penalties for different cache misses.