Modern memory subsystems, such as those found in desktop, laptop and tablet computers, and even smartphones, employ a stratified memory architecture that divides memory into levels of different speeds and sizes. Stratified memory architectures are based on the fact that faster memory is more expensive than slower memory. Thus, a typical memory subsystem may have a very fast and small Level 1 cache, a larger but still fast Level 2 cache, an even larger but slower Level 3 cache and a far larger but far slower main memory. A central processing unit (CPU) or graphics processing unit (GPU) requests data from the memory subsystem as a whole. The memory subsystem is responsible for copying lines of data from the main memory to the Level 3 cache, to the Level 2 cache and to the Level 1 cache as needed, with the goal of minimizing memory latency (most often expressed in clock cycles), or, stated another way, maximizing the hit rates of the Level 1, Level 2 and Level 3 caches.
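The way a request walks down the levels and a line is copied back up can be sketched as a small simulation. This is only an illustration under stated assumptions: each level is modeled as fully associative with least-recently-used replacement, and the names (CacheLevel, access, make_hierarchy), the capacities and the latencies in clock cycles are hypothetical values chosen for the example, not figures from any particular device.

```python
from collections import OrderedDict


class CacheLevel:
    """One cache level, modeled as fully associative with LRU replacement (an assumption)."""

    def __init__(self, name, capacity_lines, latency_cycles):
        self.name = name
        self.capacity = capacity_lines
        self.latency = latency_cycles
        self.lines = OrderedDict()  # line address -> present, ordered oldest to newest
        self.hits = 0
        self.misses = 0

    def lookup(self, line_addr):
        """Return True on a hit and mark the line as most recently used."""
        if line_addr in self.lines:
            self.lines.move_to_end(line_addr)
            self.hits += 1
            return True
        self.misses += 1
        return False

    def fill(self, line_addr):
        """Install a line, evicting the least recently used line if the level is full."""
        if line_addr in self.lines:
            self.lines.move_to_end(line_addr)
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)
        self.lines[line_addr] = True


def access(levels, main_memory_latency, line_addr):
    """Serve one request; return the cycles spent and copy the line into every level that missed."""
    cycles = 0
    missed = []
    for level in levels:
        cycles += level.latency
        if level.lookup(line_addr):
            break
        missed.append(level)
    else:
        # No cache level held the line, so it comes from main memory.
        cycles += main_memory_latency
    for level in missed:
        level.fill(line_addr)
    return cycles


def make_hierarchy():
    """Hypothetical three-level hierarchy: (capacity in lines, latency in cycles)."""
    return [CacheLevel("L1", 2, 4), CacheLevel("L2", 4, 12), CacheLevel("L3", 8, 40)]
```

With these illustrative numbers and a main memory latency of 200 cycles, the first access to a line costs 4 + 12 + 40 + 200 = 256 cycles, while an immediate repeat of the same access hits in the Level 1 cache and costs 4 cycles.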
Unfortunately, cache misses are essentially unavoidable. Cache misses also introduce substantial latency, because replacing a line of data in a cache involves not only making room for it in the cache to which it is being added, but also ensuring that it can be read from the next lower memory level. In the worst case, data has to be copied from the lowest memory level all the way up to the Level 1 cache, and this can take hundreds if not thousands of clock cycles.
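The relationship between hit rates and latency can be made concrete with the standard expected-access-time recurrence, in which each level contributes its own latency plus its miss rate times the cost of going one level lower. The sketch below assumes that a miss at one level is followed serially by a lookup at the next; the function name average_access_cycles and all numeric values are hypothetical.

```python
def average_access_cycles(latencies, hit_rates, main_memory_latency):
    """Expected cycles per access for a serial cache hierarchy.

    latencies and hit_rates are listed from the Level 1 cache downward.
    Each hit rate is local: the fraction of requests reaching that level that hit there.
    """
    expected = float(main_memory_latency)
    # Work upward from the level closest to main memory.
    for latency, hit_rate in zip(reversed(latencies), reversed(hit_rates)):
        expected = latency + (1.0 - hit_rate) * expected
    return expected


# Illustrative values only: latencies in clock cycles and local hit rates.
typical = average_access_cycles([4, 12, 40], [0.9, 0.8, 0.5], 200)
# Worst case: every level misses, so the full path to main memory is paid.
worst = average_access_cycles([4, 12, 40], [0.0, 0.0, 0.0], 200)
```

Under these assumed numbers the expected cost is about 8 cycles per access, whereas the worst case is 256 cycles, which shows why even a small drop in hit rate is costly.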
Fortunately, predictive replacement policy algorithms have been developed to increase cache hit rates, with the goal of replacing lines before they are requested. Furthermore, some cache memories are provided with input buffers, so that the cache is guaranteed to have room to receive a replacement line from the next lower level.
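As a rough illustration of both ideas, the sketch below pairs a cache with a small input buffer and a very simple predictor. The predictor is a next-line scheme (after an access to line n, request line n + 1), which is used here only as a stand-in because the text does not name a specific predictive policy. The class PrefetchingCache, its parameters and the helper run_stream are hypothetical, and timing is not modeled: a buffered line is assumed to have arrived by the next access.

```python
from collections import OrderedDict, deque


class PrefetchingCache:
    """LRU cache with an input buffer and an optional next-line predictor (illustrative)."""

    def __init__(self, capacity_lines, buffer_slots=2, prefetch=True):
        self.capacity = capacity_lines
        self.buffer_slots = buffer_slots
        self.prefetch = prefetch
        self.lines = OrderedDict()   # resident lines, ordered oldest to newest
        self.input_buffer = deque()  # lines received from the lower level, not yet installed
        self.hits = 0
        self.misses = 0

    def _install(self, line_addr):
        """Place a line in the cache, evicting the least recently used line if full."""
        if line_addr in self.lines:
            self.lines.move_to_end(line_addr)
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)
        self.lines[line_addr] = True

    def _drain_input_buffer(self):
        """Move buffered lines into the cache, freeing the buffer slots."""
        while self.input_buffer:
            self._install(self.input_buffer.popleft())

    def _request_from_lower_level(self, line_addr):
        """Issue a predicted fetch only when a buffer slot is free to receive it."""
        if len(self.input_buffer) < self.buffer_slots:
            self.input_buffer.append(line_addr)
            return True
        return False

    def access(self, line_addr):
        """Serve one request and return True on a hit."""
        self._drain_input_buffer()
        hit = line_addr in self.lines
        if hit:
            self.hits += 1
            self.lines.move_to_end(line_addr)
        else:
            self.misses += 1
            self._install(line_addr)  # demand fill on a miss
        if self.prefetch and (line_addr + 1) not in self.lines:
            self._request_from_lower_level(line_addr + 1)
        return hit


def run_stream(cache, addresses):
    """Feed a sequence of line addresses to the cache and return (hits, misses)."""
    for addr in addresses:
        cache.access(addr)
    return cache.hits, cache.misses
```

On a purely sequential stream of 16 line addresses with a 4-line cache, this sketch misses only on the first access when prediction is enabled and misses on every access when it is disabled. Real access patterns are far less regular, so the benefit of any predictor depends heavily on the workload.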