Most modern computer system architectures include a cache, that is, a level of the memory hierarchy between the processor and the main memory. Generally, accessing data stored in a cache is much faster than retrieving data from main memory. However, cache memory is typically more expensive per unit of storage and smaller than main memory. A common analogy is that main memory is like the books on the shelves of a library, and a cache is like a small subset of those books kept on a desk. The widespread use of cache memory indicates that it increases system performance by more than enough to justify the added complexity.
It is possible to have multiple levels of cache, or more correctly multiple levels of memory, in a memory hierarchy. Generally, in such a system, all the data is stored at the lowest level, which is typically the largest in size and the slowest to access. Data is then copied to the higher levels, which may be progressively smaller in size and faster to access.
Designing a memory hierarchy for a general purpose computer, including the algorithms that determine what data should be stored in the cache, is a complex process that has received a lot of attention over the years. Empirically, computer programs tend to reuse data that has been accessed recently (temporal locality) and to access data located near data that has been accessed recently (spatial locality). Many memory hierarchies exploit these localities when storing data in a cache. For example, keeping recently accessed data in a cache exploits temporal locality. Retrieving blocks of data into a cache, instead of individual words, exploits spatial locality. This disclosure uses the term “data” to include both the traditional concept of data, such as alphanumeric constants or variables set or read by a computer program, and program instructions. For the memory hierarchy, both types of data are merely values stored at particular locations, and the ultimate uses of the individual stored values are irrelevant. Some architectures do, however, maintain separate caches for traditional data and for instructions, and must distinguish between them.
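By way of illustration only, the effect of the two localities can be sketched with a small cache simulation. The sketch below assumes a fully associative cache with least-recently-used replacement; the function name `count_hits` and the parameters `num_blocks` and `block_size` are hypothetical and are not taken from any particular architecture.

```python
from collections import OrderedDict


def count_hits(addresses, num_blocks, block_size):
    """Simulate a fully associative LRU cache and return the number of hits.

    num_blocks and block_size are hypothetical parameters chosen for
    illustration; they do not describe any particular hardware.
    """
    cache = OrderedDict()  # keys are block numbers, ordered oldest to newest
    hits = 0
    for addr in addresses:
        block = addr // block_size  # neighbouring words share a block
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark the block as most recently used
        else:
            if len(cache) >= num_blocks:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return hits


# Spatial locality: a sequential scan of 16 words.
scan = list(range(16))
hits_word = count_hits(scan, num_blocks=4, block_size=1)   # one word per block
hits_block = count_hits(scan, num_blocks=4, block_size=4)  # four words per block

# Temporal locality: the same three words accessed repeatedly.
loop = [0, 1, 2] * 5
hits_loop = count_hits(loop, num_blocks=4, block_size=1)
```

Under these assumptions, the sequential scan never hits when each block holds a single word, whereas four-word blocks turn three of every four accesses into hits. The repeated loop hits on every access after its first pass, because the recently used words are retained.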
Ideally, whenever a processor calls for data from memory, that data will be found in the cache. However, cache misses do occur. Two common causes of cache misses are cold-start misses and capacity misses. A cold-start miss occurs when a processor first accesses a particular range of data addresses. A capacity miss occurs because the cache is of limited size, such as when previously cached data is “bumped” from the cache by newer data just before it is needed. A larger cache may decrease the capacity miss rate, but it has no effect on the cold-start miss rate.
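The distinction between the two kinds of miss can be sketched as follows. This is a minimal illustration rather than a description of any particular cache: it assumes a fully associative cache with least-recently-used replacement and one address per entry, so that every miss on a previously referenced address is counted as a capacity miss. The names `classify_misses`, `capacity`, and `trace` are hypothetical.

```python
from collections import OrderedDict


def classify_misses(addresses, capacity):
    """Return (cold_start_misses, capacity_misses) for an LRU cache.

    The cache is modelled as fully associative with one address per entry;
    capacity is a hypothetical number of entries.
    """
    cache = OrderedDict()  # cached addresses, ordered oldest to newest
    seen = set()           # every address referenced so far
    cold = 0
    cap = 0
    for addr in addresses:
        if addr in cache:
            cache.move_to_end(addr)  # hit: refresh recency
            continue
        if addr in seen:
            cap += 1   # cached earlier, but evicted before it was reused
        else:
            cold += 1  # first reference to this address
            seen.add(addr)
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the least recently used address
        cache[addr] = True
    return cold, cap


# Two passes over four addresses (an illustrative access trace).
trace = [0, 1, 2, 3, 0, 1, 2, 3]
small = classify_misses(trace, capacity=2)
large = classify_misses(trace, capacity=4)
```

With this trace, the two-entry cache incurs four cold-start misses and four capacity misses, while the four-entry cache incurs the same four cold-start misses and no capacity misses. This is consistent with the observation that enlarging the cache affects only the capacity miss rate.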