Caches are commonly used to temporarily store values that might be repeatedly accessed by a processor, in order to speed up processing by avoiding the longer step of loading the values from main memory such as random access memory (RAM).
A cache has many “blocks” which individually store the various instructions and data values. The blocks in any cache are divided into groups of blocks called “sets.” A set is the collection of cache blocks that a given memory block can reside in. For any given memory block, there is a unique set in the cache that the block can be mapped into, according to preset mapping functions. The number of blocks in a set is referred to as the associatively of the cache, e.g., 2-way set associative means that, for any given memory block there are two blocks in the cache that the memory block, can be mapped into; however, several different blocks in main memory can be mapped to any given set. A 1-way set associative cache is direct mapped; that is, there is only one cache block that can contain a particular memory block. A cache is said to be fully associative if a memory block can occupy any cache block, i.e., there is one set, and the address tag is the full address of the memory block.
An exemplary cache line (block) includes an address-tag field, a state-bit field, an inclusivity-bit field, and a value field for storing the actual instruction or data. The state-bit field and inclusivity-bit field are used to maintain cache coherency in a multiprocessor computer system. The address tag is a subset of the full address of the corresponding memory block. A compare match of an incoming effective address with one of the tags within the address-tag field indicates a cache “hit.” The collection of all of the address tags in a cache (and sometimes the state-bit and inclusivity-bit fields) is referred to as a directory, and the collection of all of the value fields is the cache entry array.
When all of the blocks in a set for a given cache are full and that cache receives a request, with a different tag address, whether a “read” or “write,” to a memory location that maps into the full set, the cache must “evict” one of the blocks currently in the set. The cache chooses a block by one of a number of means known to those skilled in the art (least recently used (LRU), random, pseudo-LRU, etc.) to be evicted. If the data in the chosen block is modified, that data is written to the next lowest level in the memory hierarchy which may be another cache (in the case of the L1 or on-board cache) or main memory (in the case of an L2 cache, as depicted in the two-level architecture of FIG. 1). By the principle of inclusion, the lower level of the hierarchy will already have a block available to hold the written modified data. However, if the data in the chosen block is not modified, the block is simply abandoned and not written to the next lowest level in the hierarchy. This process of removing a block from one level of the hierarchy is known as an “eviction.” At the end of this process, the cache no longer holds a copy of the evicted block.
This ratio of available blocks for instruction versus data is not, however, always the most efficient usage of the cache for a particular procedure. Many software applications will perform better when run on a system with split I/D caching, while others perform better when run on a flat, unified cache (given the same total cache space). In the instances where the cache I/D ratio is not particularly close to the actual ratio of instruction and data cache operations, there are again a troubling number of evictions.
A cache replacement algorithm determines which cache block in a given set will be evicted. For example, an 8-way associative cache might use an LRU unit which examines a 7-bit field associated with the set.