A computer stores data in memory. Data may be computer-executable instructions and control structures used to operate the computer or information of importance to a user of the computer. In order to do useful work, the computer operates on this data; for example, the computer might add two pieces of data together or compare two pieces of data to determine which is larger. Ideally, a computer would have a single, indefinitely large and very fast memory, in which any particular data would be immediately available to the computer. In practice this has not been possible because memory that is very fast is also very expensive.
Thus, computers typically have a hierarchy (or levels) of memory, in which each level has greater capacity than the preceding level but is also slower and has a lower per-unit cost. Each level of the hierarchy may form a subset of the level below it; that is, all data in one level may also be found in the level below, and all data in that lower level may be found in the one below it, and so on until the bottom of the hierarchy is reached. In order to minimize the performance penalty that the hierarchical memory structure introduces, the computer would like to store the most frequently-used data in the fastest memory and the least frequently-used data in the slowest memory.
For example, a computer might contain:
1) a cache that contains the most frequently-used data;
2) a RAM (Random Access Memory) that contains all the data in the cache plus the next-most frequently-used data; and
3) a disk drive that contains all the data in the computer.
In order to determine which data should be placed in the faster memory, for example in the cache or RAM, the computer may attempt to predict which data will be frequently used. In order to predict use frequency, computers have typically used the theory of “temporal locality of reference”: recently-used data is likely to be used again soon. Using this theory, when the computer needs a piece of data, it looks first in the cache. If the data is not in the cache, the computer then retrieves the data from a lower level of memory, such as RAM or a disk drive, and places the data in the cache. If the cache is already full of data, the computer must determine which data to remove from the cache in order to make room for the data currently needed. One removal method is for the computer to replace the data that has been unused for the longest time. This exploits a corollary of temporal locality: if recently-used data is likely to be used again, then the best candidate for removal is the least recently-used data. Thus, one method for replacing data in fast memory is the Least Recently Used (LRU) method.
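The lookup, fill, and replacement steps of the LRU method can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not a description of any particular hardware or operating system: the class name LRUCache, the read method, the dictionary standing in for the lower level of memory, and the capacity of two entries are all hypothetical choices made for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing_store = backing_store  # stands in for RAM or a disk drive
        self.entries = OrderedDict()        # least recently used first, most recent last
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.entries:
            # Found in the cache: record the hit and mark the entry as most recently used.
            self.hits += 1
            self.entries.move_to_end(key)
            return self.entries[key]
        # Not in the cache: retrieve the data from the slower level.
        self.misses += 1
        value = self.backing_store[key]
        if len(self.entries) >= self.capacity:
            # Cache is full: remove the entry that has been unused for the longest time.
            self.entries.popitem(last=False)
        self.entries[key] = value
        return value


# Illustrative data only: four items in the lower level, room for two in the cache.
backing = {name: name.upper() for name in "abcd"}
cache = LRUCache(capacity=2, backing_store=backing)
for key in ["a", "b", "a", "c", "b"]:
    cache.read(key)
```

In this trace the second read of "a" is found in the cache, while the read of "c" removes "b" (the least recently used entry at that point), so the final read of "b" must go back to the lower level.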
The LRU method yields good computer performance only when the “temporal locality of reference” theory holds true; that is, in situations where the recently-used data is actually likely to be used again soon. If the temporal locality of reference theory does not hold true, then the LRU method by itself performs poorly. An example of when the LRU method may perform poorly is when multiple instruction streams (threads or processes) are all accessing the same cache. Multiple instruction streams can result from, e.g., a computer that has multiple processors, multiple cores within a processor, or multiple instruction streams executing concurrently on the same processor. These instruction streams may access completely different data, yet their cache accesses may be interspersed.
Thus, when multiple streams are accessing data, the temporal locality of reference theory does not necessarily hold true across the streams. Poor performance can result because the streams may interfere with each other's cache use, causing the computer to discard from the cache the data that is actually likely to be used next.
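The interference between streams can be made concrete with a small simulation. The sketch below is illustrative only: the function name lru_hits, the two access traces, and the cache capacity of four entries are assumptions chosen to show a worst case, and real workloads would rarely be this regular. Each stream repeatedly touches three items of its own, so either stream alone fits in the cache, but the two together do not.

```python
from collections import OrderedDict
from itertools import chain


def lru_hits(trace, capacity):
    """Count cache hits for an access trace under LRU replacement (hypothetical helper)."""
    cache = OrderedDict()  # least recently used first, most recent last
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used entry
            cache[key] = True
    return hits


# Two hypothetical instruction streams, each looping over its own three items.
stream_a = ["a1", "a2", "a3"] * 4
stream_b = ["b1", "b2", "b3"] * 4

# Interleave the accesses one at a time: a1, b1, a2, b2, a3, b3, a1, ...
interleaved = list(chain.from_iterable(zip(stream_a, stream_b)))

alone_a = lru_hits(stream_a, capacity=4)    # stream A with the cache to itself
alone_b = lru_hits(stream_b, capacity=4)    # stream B with the cache to itself
shared = lru_hits(interleaved, capacity=4)  # both streams sharing one cache
```

With these assumed traces, each stream run alone misses only on its first pass and hits on every later access, whereas in the shared case every item is evicted by the other stream's accesses just before it is needed again, so no access hits.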