A processor is a digital device that executes instructions specified by a computer program. A typical computer system includes a processor coupled to a system memory that stores program instructions and data to be processed by the program instructions. The performance of such a system is hindered by the fact that the time required to read data from the system memory into the processor or to write data from the processor to the system memory is typically much longer than the time required for the processor to execute the instructions that process the data. The time difference is often between one and two orders of magnitude. Thus, the processor may sit idle while waiting for the memory to be read or written.
However, processor designers recognized long ago that programs tend to access a relatively small proportion of the data, such as frequently accessed program variables, a relatively large proportion of the time. Programs with this characteristic are said to display good temporal locality, and the propensity for this characteristic is referred to as the locality of reference principle. To take advantage of this principle, modern processors typically include one or more cache memories. A cache memory, or cache, is a memory that is small relative to the system memory and electrically close to the processor core, and that temporarily stores a subset of data that normally resides in the larger, more distant memories of the computer system, such as the system memory. Caching data is storing data in a storage element of a cache memory so that the data can subsequently be provided more quickly from the cache memory than from a more distant memory of the system.
When the processor executes a memory read instruction, such as a load or pop instruction, the processor first checks to see if the requested data is present in the cache, i.e., if the memory read address hits in the cache. If not, i.e., if the memory read address misses in the cache, the processor fetches the data into the cache, typically in addition to loading it into the specified register of the processor. Because the data is now present in the cache, the next time a memory read instruction is encountered that requests the same data, the data can be fetched from the cache into the register for processing, rather than from system memory. The memory read instruction can be executed essentially immediately since the data is already present in the cache.
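The hit and miss behavior described above can be illustrated with a minimal sketch. The class name `SimpleCache`, its `load` method, and the use of a dictionary to stand in for system memory are hypothetical choices made only for illustration; the sketch ignores cache capacity and cache lines and is not intended to represent any particular processor.

```python
class SimpleCache:
    """Toy read cache keyed by exact address (illustrative assumption only)."""

    def __init__(self, memory):
        self.memory = memory  # stand-in for system memory: dict of address -> value
        self.lines = {}       # cached copies: address -> value
        self.hits = 0
        self.misses = 0

    def load(self, address):
        """Model a memory read instruction that returns the requested data."""
        if address in self.lines:
            # The read address hits: supply the data from the cache.
            self.hits += 1
            return self.lines[address]
        # The read address misses: fetch from system memory into the cache,
        # in addition to returning the data to the requester.
        self.misses += 1
        value = self.memory[address]
        self.lines[address] = value
        return value
```

In this sketch, a first call to `load` for a given address counts as a miss and a second call for the same address counts as a hit, mirroring the sequence described above.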
A cache stores data in cache lines, or cache blocks. A cache line is the smallest unit of data that can be transferred between the cache and the system memory. An example of a cache line size is 64 bytes of data. When a memory read instruction causes a cache miss, an entire cache line implicated by the missing address is fetched into the cache, instead of only fetching the data requested by the memory read instruction. Consequently, subsequent memory read instructions that request data in the same cache line may be quickly executed because the data can be supplied from the cache rather than having to access system memory.
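As a sketch of how an address implicates a cache line, the following example uses the 64-byte line size mentioned above. The function names `line_base`, `line_offset`, and `count_misses` are hypothetical, the line size is assumed to be a power of two, and the cache is assumed to have unlimited capacity so that only first-touch misses are counted.

```python
LINE_SIZE = 64  # bytes; the example cache line size from the text

def line_base(address, line_size=LINE_SIZE):
    """Address of the first byte of the cache line containing `address`.

    Assumes `line_size` is a power of two.
    """
    return address & ~(line_size - 1)

def line_offset(address, line_size=LINE_SIZE):
    """Byte position of `address` within its cache line."""
    return address & (line_size - 1)

def count_misses(addresses, line_size=LINE_SIZE):
    """Count misses for a sequence of reads, assuming an initially empty
    cache of unlimited capacity (an assumption made for illustration)."""
    cached = set()
    misses = 0
    for address in addresses:
        base = line_base(address, line_size)
        if base not in cached:
            # The whole line is fetched on a miss, not just the requested data.
            misses += 1
            cached.add(base)
    return misses
```

Under these assumptions, reading all 64 consecutive byte addresses of one line incurs a single miss, because the first read brings in the entire line and the remaining reads are supplied from the cache.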
In addition, when a memory write instruction is executed, such as a store or push instruction, if the memory write address hits in the cache, the data may be immediately written into the cache line of the cache, thereby allowing the write of the data to system memory to be deferred. Later, the cache will write the cache line to system memory, typically in order to make room for a newer cache line. This operation is commonly referred to as a writeback operation. Still further, some caches also allocate an entry in the cache when a memory write address misses in the cache. That is, the cache performs a writeback operation of an old cache line in an entry of the cache, and reads the new cache line implicated by the write address from system memory into the cache entry formerly occupied by the old cache line. This operation is commonly referred to as a write allocate operation.
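The writeback and write allocate operations described above might be sketched as follows. The class `WriteBackCache`, its parameters, the per-line dirty flag, and the choice to replace the least recently written line are all assumptions made for illustration; the text above does not specify a replacement policy, and this sketch writes back an old line only if it has been modified.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes; the example cache line size from the text

class WriteBackCache:
    """Toy write-back, write-allocate cache (illustrative assumption only)."""

    def __init__(self, memory, num_entries=2):
        self.memory = memory          # bytearray standing in for system memory
        self.num_entries = num_entries
        self.entries = OrderedDict()  # line base -> [line data, dirty flag]
        self.writebacks = 0

    def _allocate(self, base):
        if len(self.entries) >= self.num_entries:
            # Make room: remove the oldest line (assumed replacement policy).
            old_base, (old_data, dirty) = self.entries.popitem(last=False)
            if dirty:
                # Writeback: copy the modified old line to system memory.
                self.memory[old_base:old_base + LINE_SIZE] = old_data
                self.writebacks += 1
        # Read the new line from system memory into the freed entry.
        data = bytearray(self.memory[base:base + LINE_SIZE])
        self.entries[base] = [data, False]

    def store(self, address, value):
        """Model a memory write instruction that writes one byte (0-255)."""
        base = address & ~(LINE_SIZE - 1)
        if base not in self.entries:
            self._allocate(base)      # write allocate on a write miss
        self.entries.move_to_end(base)
        entry = self.entries[base]
        entry[0][address - base] = value  # write into the cached line only
        entry[1] = True                   # system memory update is deferred
```

With a single-entry cache in this sketch, a store to address 5 leaves system memory unchanged, and a later store to address 64 forces the first line to be written back before the new line is allocated.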
As may be observed, an efficiently performing cache may greatly improve the performance of the processor. Additionally, in many cases, cache memories have come to represent a significant proportion of the power consumption of a processor, particularly when the cache memory is large, such as a last-level cache memory in the cache memory hierarchy of the processor. It is also desirable to reduce the amount of power consumed by the cache memory, which is often a competing goal with the goal of improving the performance of the processor.