Most modern processors use a cache memory (or a hierarchy of cache memories) to reduce average memory access time and improve overall system performance. Cache memories take advantage of the principle of locality, which says that the most recently used data is very likely to be accessed again in the near future. Modern dynamic random access memory (DRAM) (e.g., double data rate (DDR) memory such as DDR2 or DDR3) has many timing constraints that can limit the performance of the memory device. In particular, the row cycle time (tRC) imposes a minimum time between consecutive activations of the same memory bank. This timing parameter is significant because it limits the maximum frequency with which a single piece of data can be accessed. Today's DDR3 devices have a tRC of approximately 45 nanoseconds (ns).
A cache memory is a component that improves performance by transparently storing data such that future requests for that data can be served faster. The data that is stored within a cache memory might be values that have been computed earlier or duplicates of original values that are stored elsewhere. If requested data is contained in the cache memory, this request can be served by simply reading the cache memory, which is comparatively fast. Otherwise, the data has to be recomputed or fetched from its original storage location (e.g., main memory), which is comparatively slow. Hence, the more requests that can be served from the cache, the better the overall system performance.
To be cost efficient and to enable an efficient lookup of data, cache memories are comparatively small. Nevertheless, cache memories have proven extremely effective in many areas of computing because access patterns in typical computer applications have locality of reference. References exhibit temporal locality if data that was recently requested is requested again. References exhibit spatial locality if the requested data is physically stored close to data that has already been requested.
Typically, each location (also referred to as a cache entry) in a cache memory contains data (also referred to as a cache line). The size of the cache line is usually larger than the size of the usual access requested by an instruction. Each location in the cache memory also has an index, which is a unique number used to refer to that location. The index for a location in main memory is called an address. Each location in the cache memory has a tag that contains the index of the datum in main memory that has been cached.
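As an illustration of how an address, an index, and a tag relate, the following minimal sketch splits a main memory address into the three fields commonly used by a cache. The line size, the number of entries, and the function name split_address are assumptions chosen for the example and do not describe any particular device.

```python
LINE_SIZE = 64      # bytes per cache line (assumed for illustration)
NUM_ENTRIES = 256   # number of cache entries (assumed for illustration)


def split_address(address):
    """Split a byte address into (tag, index, offset)."""
    offset = address % LINE_SIZE          # byte position inside the cache line
    line_number = address // LINE_SIZE    # which line of main memory this is
    index = line_number % NUM_ENTRIES     # cache entry the line maps to
    tag = line_number // NUM_ENTRIES      # identifies which line occupies the entry
    return tag, index, offset


tag, index, offset = split_address(0x12345)
print(tag, index, offset)  # prints "4 141 5"
```

Because the cache line is larger than a typical access, several neighboring addresses share the same tag and index and differ only in the offset.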
When a processor needs to read or write a location in main memory, it first checks whether that memory location is in the cache. This is accomplished by comparing the address of the memory location to all tags in the cache that might contain that address. If the processor finds that the memory location is in the cache, a cache hit has occurred; otherwise, there is a cache miss. In the case of a cache hit, the processor immediately reads or writes the data in the cache line.
In the case of a miss, the cache memory allocates a new entry, which comprises the tag just missed and a copy of the data. The reference can then be applied to the new entry just as in the case of a hit. Read misses delay execution because they require data to be transferred from main memory, which is much slower than the cache memory itself. Write misses may occur without such a penalty since the data can be copied in the background.
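The hit and miss behavior described above can be sketched with a small direct-mapped cache model. This is a simplified illustration only: the class name DirectMappedCache, its default sizes, and the decision to track tags without storing data are assumptions of the example.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks tags only (no data payload)."""

    def __init__(self, num_entries=4, line_size=16):
        self.num_entries = num_entries
        self.line_size = line_size
        self.tags = [None] * num_entries  # None marks an empty entry
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a cache hit, False on a cache miss."""
        line_number = address // self.line_size
        index = line_number % self.num_entries
        tag = line_number // self.num_entries
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: allocate the entry for the tag that just missed,
        # overwriting whatever the entry held before.
        self.misses += 1
        self.tags[index] = tag
        return False


cache = DirectMappedCache()
results = [cache.access(a) for a in (0, 4, 64, 0)]
print(results)  # prints "[False, True, False, False]"
```

In this trace, address 4 hits because it lies in the same cache line as address 0, while address 64 maps to the same entry with a different tag, so the final access to address 0 misses again.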
In order to make room for the new entry on a cache miss, the cache has to evict one of the existing entries. The heuristic that it uses to choose the entry to evict is called a replacement policy. One popular replacement policy replaces the least recently used (LRU) entry.
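A least recently used replacement policy can be sketched as follows. The example is a minimal model assuming a single pool of entries identified by tag; the class name LRUCache and its capacity parameter are hypothetical, and hardware implementations typically approximate LRU rather than track exact ordering.

```python
from collections import OrderedDict


class LRUCache:
    """Toy cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # ordered from oldest use to newest use

    def access(self, tag):
        """Return True on a hit, False on a miss (allocating the entry)."""
        if tag in self.entries:
            self.entries.move_to_end(tag)     # mark as most recently used
            return True
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        self.entries[tag] = True
        return False


cache = LRUCache(capacity=2)
for tag in ("A", "B", "A", "C"):
    cache.access(tag)
print(list(cache.entries))  # prints "['A', 'C']"
```

Entry B is evicted rather than A because A was referenced more recently than B at the time C was allocated.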
A cache memory can be a direct-mapped, 2-way, 4-way, or fully associative cache memory. Associativity is a trade-off. If there are ten places the replacement policy can put a new cache entry, then when the cache is checked for a hit, all ten places must be searched. Checking more places takes more power, chip area, and potentially time. On the other hand, cache memories with more associativity suffer fewer misses, so that the processor spends less time servicing those misses.
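The effect of associativity on the miss count can be illustrated with a small simulation. The sketch below assumes a cache with a fixed total number of lines, LRU replacement within each set, and a trace given as main memory line numbers; the function name count_misses and the trace are made up for the example.

```python
from collections import OrderedDict


def count_misses(trace, num_lines, ways):
    """Count misses for a trace of line numbers in a set-associative cache.

    ways=1 models a direct-mapped cache; ways=num_lines models a fully
    associative cache. Each set uses LRU replacement.
    """
    num_sets = num_lines // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for line_number in trace:
        index = line_number % num_sets
        tag = line_number // num_sets
        entries = sets[index]
        if tag in entries:
            entries.move_to_end(tag)          # hit: refresh recency
        else:
            misses += 1
            if len(entries) >= ways:
                entries.popitem(last=False)   # evict least recently used
            entries[tag] = True
    return misses


trace = [0, 8] * 4  # two lines that collide in an 8-line direct-mapped cache
print(count_misses(trace, num_lines=8, ways=1))  # prints "8"
print(count_misses(trace, num_lines=8, ways=2))  # prints "2"
```

With one way, the two lines keep evicting each other and every access misses. With two ways, both lines fit in the same set and only the first access to each one misses, at the cost of searching two places per lookup.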
In order for a cache memory to be effective, the memory access pattern must exhibit locality. In computer science, locality of reference, also known as the principle of locality, is the phenomenon of the same value or related storage locations being frequently accessed. There are two basic types of reference locality. Temporal locality refers to the reuse of specific data and/or resources within relatively small time durations. Spatial locality refers to the use of data elements within relatively close storage locations. Sequential locality, a special case of spatial locality, occurs when data elements are arranged and accessed linearly, e.g., traversing the elements in a one-dimensional array. Locality is merely one type of predictable behavior that occurs in computer systems. Systems that exhibit strong locality of reference are good candidates for performance optimization through techniques such as caching and memory prefetching, or advanced branch prediction in the processor pipeline.
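To make the different kinds of locality concrete, the following sketch counts how many cache line fetches an access pattern needs when only the most recently used line is retained. The single-line model, the 64-byte line size, the 4-byte element size, and the name line_fetches are all simplifying assumptions for illustration.

```python
def line_fetches(addresses, line_size=64):
    """Count line fetches when only the most recent line is kept."""
    fetches = 0
    current_line = None
    for address in addresses:
        line = address // line_size
        if line != current_line:   # access falls outside the held line
            fetches += 1
            current_line = line
    return fetches


ELEMENT_SIZE = 4  # bytes per array element (assumed)

repeated = [0] * 64                                    # temporal locality
sequential = [i * ELEMENT_SIZE for i in range(64)]     # sequential locality
strided = [i * 16 * ELEMENT_SIZE for i in range(64)]   # one element per line

print(line_fetches(repeated))    # prints "1"
print(line_fetches(sequential))  # prints "4"
print(line_fetches(strided))     # prints "64"
```

All three patterns perform 64 accesses, but only the strided pattern, which has neither temporal nor spatial locality at this line size, needs a new line for every access.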
Furthermore, a cache memory must be large enough to hold a significant portion of the working set of a workload. If not, cache thrashing can occur in which multiple main memory locations compete for the same cache lines, resulting in excessive cache misses. Designing a system without a cache memory is equally problematic. Because of the tRC of DDR3 memory, the same memory location may be accessed at most once every 45 ns. In certain systems, such as a network packet processing system or network processor, processing logic must process a new packet in less than 7 ns. A DDR3-based memory system may not have sufficient performance if more than 1 in 6 packets requires access to a single datum (i.e., more often than once every 42 ns). Thus, there is a balance between the size of a cache memory and the cost of the cache memory.
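The arithmetic behind the packet processing example can be checked with a short calculation. The sketch below takes the 45 ns tRC and the 7 ns per-packet time from the text as constants and considers only the same-bank tRC constraint, ignoring all other DRAM timing parameters; the function names are hypothetical.

```python
import math

T_RC_NS = 45.0        # approximate DDR3 row cycle time, from the text
PACKET_TIME_NS = 7.0  # per-packet processing time bound, from the text


def access_interval_ns(packets_per_access):
    """Time between accesses if one in every N packets touches the datum."""
    return packets_per_access * PACKET_TIME_NS


def dram_keeps_up(packets_per_access):
    """True if accesses to the same datum are spaced at least tRC apart."""
    return access_interval_ns(packets_per_access) >= T_RC_NS


print(access_interval_ns(6), dram_keeps_up(6))  # prints "42.0 False"
print(access_interval_ns(7), dram_keeps_up(7))  # prints "49.0 True"
print(math.ceil(T_RC_NS / PACKET_TIME_NS))      # prints "7"
```

Under these assumptions, one access per six packets (every 42 ns) is already more frequent than tRC permits, whereas one access per seven packets (every 49 ns) is not.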