A cache memory or cache may be used by a central processing unit (CPU) of a computing device to reduce the average time to access a main memory, which is typically slower and larger than the internal memory used for the cache. The cache may be mapped to (or use) a smaller, faster memory (e.g., on a microprocessor close to the CPU) which stores copies of program code (e.g., instructions) and/or data of a software application from the most frequently used main memory locations (e.g., external memories). If the program code and/or data are stored in the cache, future use of the program code and/or data can be made by accessing the cached copy rather than by re-fetching or re-computing the original program code and/or data from the slower main memory. A cache is made up of a pool of entries. Each entry may include a datum (i.e., a cache line or cache block), which in different designs may range in size from “8” to “512” bytes or more. Each entry of the cache may also include an index (i.e., a unique number used to refer to that entry), and a tag that contains an index of the datum in the main memory which has been cached.
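As a rough illustration of how an index and a tag can be derived for a datum, the following sketch splits a main memory address into a tag, a cache index, and a byte offset within the cache line. The function name `split_address`, the 64-byte line size, and the 256-entry cache are assumptions chosen for illustration only and do not describe any particular design.

```python
def split_address(address, line_size=64, num_lines=256):
    """Split a byte address into (tag, index, offset).

    line_size and num_lines are hypothetical values chosen for this sketch.
    """
    offset = address % line_size   # byte position inside the cache line
    block = address // line_size   # which line-sized block of main memory
    index = block % num_lines      # which cache entry the block maps to
    tag = block // num_lines       # identifies the block among those sharing the index
    return tag, index, offset


tag, index, offset = split_address(0x12345)
# Recombining the three fields gives back the original address.
rebuilt = (tag * 256 + index) * 64 + offset
```

In this sketch the tag and index together identify the block of main memory, so two addresses with the same index but different tags refer to different blocks that would occupy the same entry.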
If a cache client (e.g., a CPU, a web browser, an operating system, etc.) wishes to access a datum in a main memory, the cache client may first check the cache. If a cache entry can be found with a tag matching a tag of the desired datum, the datum in the cache entry may be used instead of the datum in the main memory. This situation may be referred to as a “cache hit.” If the cache is searched and does not contain the tag matching the tag of the desired datum, the situation may be referred to as a “cache miss.” Cache misses may slow the datum retrieval process because they involve transferring the desired datum from the main memory, which is much slower than the cache, and typically involve writing the desired datum to the cache (e.g., for future access) before it is delivered to the CPU.
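The hit and miss behavior described above can be sketched as follows. This is a minimal toy model, not an implementation of any real cache: the class name `SimpleCache`, the use of a dictionary as the main memory, and the absence of any capacity limit are all assumptions made for illustration.

```python
class SimpleCache:
    """Toy cache keyed by tag (hypothetical; unlimited capacity for simplicity)."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # stands in for the slower main memory
        self.entries = {}               # tag -> cached datum
        self.hits = 0
        self.misses = 0

    def read(self, tag):
        if tag in self.entries:
            # Cache hit: use the cached copy instead of the main memory.
            self.hits += 1
            return self.entries[tag]
        # Cache miss: fetch from main memory and keep a copy for future access.
        self.misses += 1
        datum = self.main_memory[tag]
        self.entries[tag] = datum
        return datum


main_memory = {address: address * 10 for address in range(8)}
cache = SimpleCache(main_memory)
first = cache.read(3)   # miss: the datum is fetched and cached
second = cache.read(3)  # hit: the datum is served from the cache
```

Both reads return the same datum; only the first one has to go to the main memory.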
Multiple caches may be implemented in a computing device (e.g., a microprocessor that includes a CPU). To distinguish between multiple caches, a level notation may be used in which the higher the level, the farther away the cache is from the CPU. For example, a level 1 (L1) cache may be part of the microprocessor (i.e., provided “on-chip”), and a level 2 (L2) cache may also be part of the microprocessor but may be farther away from the CPU than the L1 cache. In some implementations, the size of the cache may increase as the level increases, but the speed of the higher level cache may be less than the speed of the lower level cache. In other words, the capacity of the higher level cache may be greater than the capacity of the lower level cache, but it may take longer to move data in and out of the higher level cache (e.g., by the CPU) than the lower level cache. In other implementations, there may be three or more levels of caches, with multiple caches at each level (e.g., one cache for program code and/or one cache for storing data), and/or caches at a certain level may be shared by multiple CPUs. A cache-enabled microprocessor may permit the internal memory to be used as cache, as main memory, and/or partitioned between cache and main memory (e.g., a uniquely addressable memory).
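To make the capacity and speed tradeoff between levels concrete, the following sketch computes an average access time for a two-level arrangement using the standard average memory access time relation. All latencies and miss rates below are hypothetical numbers chosen only for illustration; they are not measurements of any device.

```python
def average_access_time(l1_time, l1_miss_rate, l2_time, l2_miss_rate, memory_time):
    """Average access time for an L1 cache backed by an L2 cache and main memory.

    Every access pays the L1 time; L1 misses additionally pay the L2 time;
    accesses that also miss in L2 additionally pay the main memory time.
    """
    l2_penalty = l2_time + l2_miss_rate * memory_time
    return l1_time + l1_miss_rate * l2_penalty


# Hypothetical values (in cycles) for illustration only.
with_l2 = average_access_time(
    l1_time=1, l1_miss_rate=0.1, l2_time=10, l2_miss_rate=0.2, memory_time=100
)
# Same L1, but every L1 miss goes straight to main memory (no L2).
without_l2 = 1 + 0.1 * 100
```

Under these assumed numbers, the slower but larger L2 cache still lowers the average access time because it absorbs most of the L1 misses before they reach the main memory.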
Caches may have a certain replacement policy that decides which cache line may be removed (i.e., evicted) from the cache when an incoming entry from the main memory is to be placed in the cache. If the replacement policy is free to choose any entry in the cache to hold the incoming entry, the cache is called a “fully associative cache.” While a fully associative cache is very flexible, it is also very expensive to implement. At the other extreme, if each entry in the main memory can go in only one place in the cache, the cache is called a “direct mapped cache” (or “one-way set associative cache”). However, a direct mapped cache may cause what is known as “cache thrashing.” Cache thrashing may occur if two or more frequently needed entries are mapped to the same cache line. Each time one of the entries is written to the cache, it overwrites another needed entry mapped to the same cache line. This causes cache misses and impairs reuse of program code and/or data.
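Cache thrashing in a direct mapped cache can be sketched as follows. The class name `DirectMappedCache`, the eight-line cache size, and the modulo mapping from block address to cache line are assumptions made for illustration; the sketch tracks only tags, not the cached data.

```python
class DirectMappedCache:
    """Toy direct mapped cache: each block address maps to exactly one line."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # tag currently held by each line
        self.hits = 0
        self.misses = 0

    def access(self, block_address):
        index = block_address % self.num_lines  # the only line this block may use
        tag = block_address // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: overwrite whatever entry currently occupies the line.
        self.tags[index] = tag
        self.misses += 1
        return False


cache = DirectMappedCache(num_lines=8)
# Blocks 0 and 8 both map to line 0, so each access evicts the other block.
for _ in range(4):
    cache.access(0)
    cache.access(8)
```

Although only two blocks are in use and seven lines remain empty, every one of the eight accesses is a miss, because the two blocks keep overwriting each other in the same line.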
Many caches implement a compromise referred to as an “n-way set associative cache.” In an n-way set associative cache, the cache may be broken into sets of cache lines. The CPU may select a particular set just as in direct mapping, and may use a fully associative mapping algorithm (e.g., a least recently used (LRU) algorithm) to select one of the “n” cache lines within the set for an incoming entry. Associativity is a tradeoff. Checking more places may take more power, area, and potentially time. On the other hand, caches with more associativity may suffer fewer cache misses.
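A minimal sketch of an n-way set associative cache with LRU replacement within each set is shown below. The class name `SetAssociativeCache`, the modulo set selection, and the sizes used (four sets of two lines) are assumptions for illustration; real hardware typically approximates LRU rather than tracking exact order.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy n-way set associative cache with LRU replacement inside each set."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # One ordered mapping per set; the oldest (least recently used) tag is first.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, block_address):
        index = block_address % self.num_sets  # select the set, as in direct mapping
        tag = block_address // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = True
        return False


cache = SetAssociativeCache(num_sets=4, ways=2)
# Blocks 0 and 8 map to the same set, but the set has room for both.
for _ in range(4):
    cache.access(0)
    cache.access(8)
```

With two lines per set, the two conflicting blocks can reside in the cache together, so only the first access to each block misses and the remaining six accesses hit.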