Generally, computers and the programs they execute have a voracious appetite for unlimited amounts of fast memory. Unfortunately, memory (especially fast memory) is generally expensive, both in terms of cost and die area. The traditional solution to the desire for unlimited, fast memory is a memory hierarchy, or system of tiers or levels of memories. In general, the tiered memory system includes a plurality of levels of memories, each level slower but larger than the previous level.
A typical computer memory hierarchy may include three levels. The fastest and smallest memory (often called a “Level 1 (L1) cache”) is closest to the processor and includes static random access memory (SRAM). The next tier or level is often called a Level 2 (L2) cache, and is larger but slower than the L1 cache. The third level is the main memory and generally includes dynamic RAM (DRAM), often packaged in memory modules. However, other systems may have more or fewer memory tiers. Also, in some systems the processor registers and the permanent or semi-permanent storage devices (e.g., hard drives, solid state drives, etc.) may be considered part of the memory system.
The memory system generally makes use of a principle of inclusiveness, wherein the slowest but largest tier (e.g., main memory, etc.) includes all of the data available. The second tier (e.g., the L2 cache, etc.) includes a sub-set of that data, and the next tier from that (e.g., the L1 cache, etc.) includes a sub-set of the second tier's sub-set of data, and so on. As such, all data included in a faster tier is also included in each slower tier.
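The inclusiveness principle can be illustrated with a minimal Python sketch. The tier contents, the addresses, and the helper name is_inclusive are hypothetical and chosen only for illustration; each tier is modeled as a simple mapping from address to data.

```python
# Hypothetical tier contents, keyed by address (values are illustrative only).
main_memory = {addr: addr * 10 for addr in range(16)}  # slowest, largest tier holds all data
l2_cache = {addr: main_memory[addr] for addr in (2, 3, 4, 5, 8, 9)}
l1_cache = {addr: l2_cache[addr] for addr in (3, 4)}


def is_inclusive(tiers):
    """Return True if each faster tier's addresses are a subset of the next slower tier's.

    `tiers` is ordered from fastest to slowest.
    """
    return all(
        faster.keys() <= slower.keys()
        for faster, slower in zip(tiers, tiers[1:])
    )


hierarchy_ok = is_inclusive([l1_cache, l2_cache, main_memory])
```

In this sketch the check succeeds because every address held in the L1 cache is also held in the L2 cache, and every address held in the L2 cache is also held in main memory.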
Generally, the caches decide what sub-set of data to include based upon the principle of locality (e.g., temporal locality, spatial locality, etc.). It is assumed that a program will wish to access either data that it has recently accessed (temporal locality) or data that is located next to data it has recently accessed (spatial locality). For example, if a movie player program is accessing data, it is likely that the movie player will want to access the next few seconds of the movie, and so on.
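The effect of locality can be illustrated with a small simulation. The following Python sketch is not any particular cache design: it assumes a block size of four addresses, a capacity of two blocks, and a least-recently-used replacement policy, all chosen for illustration. It compares a sequential access pattern (like the movie player) with a scattered one.

```python
from collections import OrderedDict

BLOCK_SIZE = 4  # assumed number of adjacent addresses fetched together
CAPACITY = 2    # assumed number of blocks the cache can hold


def hit_rate(addresses):
    """Simulate a small LRU cache of blocks and return the fraction of accesses that hit."""
    cache = OrderedDict()  # block number -> True, ordered least to most recently used
    hits = 0
    for addr in addresses:
        block = addr // BLOCK_SIZE
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # temporal locality: keep recently used blocks
        else:
            cache[block] = True            # spatial locality: load the whole neighboring block
            if len(cache) > CAPACITY:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(addresses)


sequential = list(range(16))                    # consecutive data, like a movie player
scattered = [0, 8, 16, 24, 32, 40, 48, 56] * 2  # each access falls in a different block

sequential_rate = hit_rate(sequential)  # 0.75: one miss per block, then three hits
scattered_rate = hit_rate(scattered)    # 0.0: blocks are evicted before they are reused
```

Under these assumptions the sequential pattern misses only on the first address of each block, while the scattered pattern never finds its data in the cache.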
However, occasionally a program will request a piece of data that is not available in the fastest cache (e.g., the L1 cache, etc.). This is generally known as a “cache miss” and causes the fastest cache to request the data from the next memory tier (e.g., the L2 cache). This is costly to processor performance, as a delay is incurred in determining that a cache miss has occurred, in having the L1 cache retrieve the data from the next tier, and in providing the data to the processor. Occasionally, the next tier of memory (e.g., the L2 cache, etc.) may not include the requested data either and must request it from the tier after that (e.g., main memory, etc.). This generally causes further delays.
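The cost of a cache miss can be sketched as a lookup that walks down the tiers and accumulates delay. In the following Python sketch the latency values (1, 10, and 100 cycles), the function name load, and the tier contents are assumptions made for illustration; real latencies depend on the system, and eviction is not modeled.

```python
# Assumed access latencies in cycles; real values depend on the system.
L1_LATENCY, L2_LATENCY, MEM_LATENCY = 1, 10, 100


def load(addr, l1, l2, memory):
    """Return (value, cycles) for a read, filling the faster tiers on a miss."""
    cycles = L1_LATENCY            # time to check the L1 cache
    if addr in l1:
        return l1[addr], cycles    # L1 hit
    cycles += L2_LATENCY           # L1 miss: request the data from the L2 cache
    if addr not in l2:
        cycles += MEM_LATENCY      # L2 miss: request the data from main memory
        l2[addr] = memory[addr]
    l1[addr] = l2[addr]            # fill L1 so that a repeated access hits
    return l1[addr], cycles


memory = {addr: addr * 10 for addr in range(8)}  # hypothetical main memory contents
l1, l2 = {}, {3: 30}

first = load(5, l1, l2, memory)   # misses both caches: (50, 111)
second = load(5, l1, l2, memory)  # hits the L1 cache: (50, 1)
third = load(3, l1, l2, memory)   # misses L1, hits L2: (30, 11)
```

With these assumed latencies, an access that misses both caches takes 111 cycles, whereas the same access repeated immediately afterward takes 1 cycle.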