A modern computer system typically has one or more processors or central processing units (CPUs) at the heart of the system. These processors execute instructions on data to perform requested operations. Because processors operate at extremely high frequencies, the data they need must be readily accessible, and to that end the data can be stored in a cache memory. Different implementations of cache memories exist. Oftentimes, a small cache memory may be located on the same semiconductor die as the processor, providing a close and fast source of data. Some memory architectures can have multiple levels of a memory hierarchy, with each higher level located further away from the processor, until reaching a system memory and/or mass storage device.
While these higher levels of a memory hierarchy can store large amounts of data, their access times are far longer than the access times for a lower level cache memory. Accordingly, a large latency is incurred when needed data is available only at these higher levels. Thus, recently and/or frequently accessed data may be stored in a lower level of a memory hierarchy.
Cache memories are typically implemented using a given replacement scheme. Many replacement schemes are based on a least recently used (LRU) policy, in which the least recently used cache line is selected as a victim cache line to be replaced with new data to be inserted into the cache. As larger processors including more cores on a single die and different cache architectures including shared cache architectures become available, an LRU replacement scheme may not accurately reflect the true value of the data, and thus it is possible for needed data to be unavailable, causing a long latency to obtain the data.
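
As an illustration only, the LRU policy described above can be sketched in software. The following Python sketch models a small, fully associative cache; the names `LRUCache`, `access`, and `demo`, the capacity of four lines, and the tags used are hypothetical choices made for this example and do not come from any particular processor. Hardware caches are commonly set associative and often use approximations of LRU, so this is a simplified model under those stated assumptions rather than a description of an actual implementation.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with LRU replacement (illustrative only)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Maps tag -> data; ordered from least to most recently used.
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, tag, data=None):
        """Access a cache line; return the evicted victim tag, or None."""
        if tag in self.lines:
            self.hits += 1
            self.lines.move_to_end(tag)  # mark as most recently used
            return None
        self.misses += 1
        victim = None
        if len(self.lines) >= self.capacity:
            # The least recently used line is selected as the victim.
            victim, _ = self.lines.popitem(last=False)
        self.lines[tag] = data
        return victim


def demo():
    """A reused line is evicted by a burst of single-use lines."""
    cache = LRUCache(capacity=4)  # hypothetical capacity
    cache.access("hot")           # miss: line is inserted
    cache.access("hot")           # hit: line is reused
    # Four lines that are each touched only once.
    victims = [cache.access(tag) for tag in ("s0", "s1", "s2", "s3")]
    cache.access("hot")           # miss: "hot" was already evicted
    return cache, victims
```

In this sketch, calling `demo()` shows the limitation noted above: the reused line `"hot"` becomes the least recently used entry during the burst of single-use accesses and is chosen as the victim, so the next access to it misses even though it was arguably the more valuable line to retain.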