In a computer system, the speed at which the central processing unit (CPU) operates depends upon the rate at which instructions and operands (data) are transferred between memory and the CPU. This is particularly true for computer systems that use multiple pages to increase the amount of addressable memory. In an attempt to improve the data transfer rate between memory and the CPU, many modern computer systems include a memory buffer cache.
A cache is a relatively small, random access memory (RAM) used to store a copy of memory data in anticipation of future use by the CPU. A cache may be implemented by one or more dynamic RAM (DRAM) integrated circuits. For very high speed caches, the RAM is usually an integral part of the CPU chip. The data stored in a cache can be transferred to the CPU in substantially less time than data stored in memory. The utility of a cache arises from the fact that a cache can take advantage of the principles of locality of reference, which are well known in computing techniques. These principles indicate that when data stored at one location are accessed, there is a high probability that the same data will be accessed again soon afterwards (temporal locality), and that data stored at physically adjacent locations will also be accessed soon afterwards (spatial locality).
Thus, a cache is typically organized into a plurality of "blocks," wherein each block stores a copy of one or more contiguously addressable bytes of memory data. That is, access to memory data causes an entire block of data, including the referenced data, to be transferred from memory to cache, unless of course the data are already stored in the cache.
During operation of the computer system, when the CPU makes a memory reference, a determination is made whether a copy of the referenced data is also stored in the cache. If a copy is stored in the cache, this is known as a "hit." If the data are not stored in the cache, this is known as a "miss." The hit rate, or conversely the miss rate, is an indicator of the effectiveness of the cache.
In order to access data in the cache, the memory address is translated to a cache address. The portion of the cache address including the most significant bits of the memory address is called the "tag" and the portion including the least significant bits is called the "cache index." The cache index corresponds to the address of the block storing a copy of the referenced data. If each block has more than one byte of data, additional bits are usually also used to address the bytes within a block. The tag is used to uniquely identify blocks having different memory addresses but the same cache index. Therefore, the cache typically includes a data store and a tag store. The data store is used for storing the blocks of data. The tag store, sometimes known as the directory, is used for storing the tags of each of the blocks of data. Both the data store and the tag store are accessed by the cache index. The output of the data store is a block of data, and the output of the tag store is a tag.
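The translation described above can be illustrated with a minimal sketch. The block size (64 bytes), the number of blocks (1024), and the names `split_address`, `BLOCK_BYTES`, and `NUM_BLOCKS` are assumptions chosen for illustration only; they do not come from the text.

```python
# Hypothetical parameters, chosen only for illustration.
BLOCK_BYTES = 64     # bytes per cache block (must be a power of two)
NUM_BLOCKS = 1024    # blocks in the cache (must be a power of two)

OFFSET_BITS = BLOCK_BYTES.bit_length() - 1   # bits addressing a byte within a block
INDEX_BITS = NUM_BLOCKS.bit_length() - 1     # bits forming the cache index


def split_address(addr):
    """Split a memory address into (tag, cache_index, byte_offset)."""
    byte_offset = addr & (BLOCK_BYTES - 1)                  # lowest bits
    cache_index = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1)  # next bits
    tag = addr >> (OFFSET_BITS + INDEX_BITS)                # most significant bits
    return tag, cache_index, byte_offset


# With these parameters the low 16 bits hold the index and byte offset,
# so 0x12345678 splits into tag 0x1234, cache index 345, byte offset 56.
example = split_address(0x12345678)

# Two addresses exactly 64 KiB apart share a cache index but differ in tag.
first = split_address(0x00010040)
second = split_address(0x00020040)
```

Under these assumptions, `first` and `second` have the same cache index and byte offset, and only the tag distinguishes them, which is the role the tag store plays.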
Since the cache address is directly computable from the memory address, such a cache is generally known as a direct-mapped cache. A key attribute of a direct-mapped cache is the short latency time in accessing data stored in the cache. However, in a direct-mapped cache, any attempt to store different blocks of data at the same cache index leads to "thrashing." Thrashing occurs when the CPU successively stores data having different memory addresses in blocks having the same cache index, so that each block repeatedly displaces the other, essentially negating the beneficial effect of the cache and reducing the operating speed of the computer. Thrashing is a well known phenomenon in computer systems, typically due to unavoidable "hot spots" in memory which are referenced at a very high frequency compared to the rest of memory.
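As a rough illustration of thrashing, the following sketch models only the tag store of a direct-mapped cache and counts hits and misses. The class name `DirectMappedCache`, the default sizes, and the two example addresses are hypothetical and are not taken from the text.

```python
class DirectMappedCache:
    """Tag-store-only model of a direct-mapped cache (illustrative sketch)."""

    def __init__(self, num_blocks=1024, block_bytes=64):
        self.num_blocks = num_blocks
        self.block_bytes = block_bytes
        self.tags = [None] * num_blocks   # one tag per cache index
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Reference one address; return True on a hit, False on a miss."""
        block_number = addr // self.block_bytes
        index = block_number % self.num_blocks
        tag = block_number // self.num_blocks
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: the new block displaces whatever was stored at this index.
        self.tags[index] = tag
        self.misses += 1
        return False


# Two hypothetical hot spots that map to the same cache index (index 0).
cache = DirectMappedCache()
for _ in range(10):
    cache.access(0)        # tag 0, index 0
    cache.access(65536)    # tag 1, index 0: evicts the block just loaded
```

In this model every one of the 20 alternating references misses, because each access evicts the block the other address needs next.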
To increase the hit rate of the cache, and to reduce thrashing, it is well known to use multi-way set-associative mapping techniques wherein two or more concurrently addressable RAMs provide a plurality of blocks and tags for a single cache index. That is, in a conventional multi-way set-associative cache, the single cache index is used to concurrently access a plurality of blocks and tags in a set of RAMs. The number of RAMs in the set indicates the "way" number of a cache. For example, if the cache index is used to concurrently access data and tags stored in two RAMs, the cache is a two-way set-associative cache. Similarly, if the cache index is used to concurrently access data and tags stored in four RAMs, the cache is a four-way set-associative cache.
During the operation of a single-index multi-way set-associative cache, a memory access by the CPU causes each of the RAMs to be examined at the corresponding cache index location. The tag is used to distinguish the cache blocks having the same cache index but different memory addresses. If a tag comparison indicates that the desired data are stored in a cache block of one of the RAMs, that RAM is selected and the desired access is completed.
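The tag comparison across the RAMs can be sketched as follows. Hardware examines all ways concurrently; the loop below is only a sequential stand-in. The function name `lookup` and the representation of each tag store as a Python list are assumptions made for this example.

```python
def lookup(tag_stores, cache_index, tag):
    """Return the number of the way whose tag matches, or None on a miss.

    tag_stores is a list with one tag store per way; every tag store is
    read at the same cache index.
    """
    for way_number, tag_store in enumerate(tag_stores):
        if tag_store[cache_index] == tag:
            return way_number   # this RAM is selected for the access
    return None                 # no way holds the block: a miss


# A hypothetical two-way cache with four cache indexes.
tag_stores = [[None] * 4 for _ in range(2)]
tag_stores[0][1] = 7   # way 0 holds the block with tag 7 at index 1
tag_stores[1][1] = 9   # way 1 holds the block with tag 9 at index 1

hit_way = lookup(tag_stores, 1, 9)    # found in way 1
miss = lookup(tag_stores, 1, 5)       # tag 5 is in neither way
```

Because two blocks with different tags can coexist at cache index 1, the two references that would displace each other in a direct-mapped cache can both hit here.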
In case of a miss, a determination is made to select one of the blocks for replacement. Methods used for implementing a replacement strategy for data in a cache are well known in cache design. Typically, the replacement of blocks in a cache is done in a least recently used (LRU) manner. LRU algorithms can be implemented in any number of ways. In general, an LRU algorithm selects particular blocks for replacement in aged order. That is, blocks storing data which were least recently used (LRU) are selected for replacement before blocks storing data which were most recently used (MRU). Here, "used" means any access, read or write, to any data stored in the block. If the data in the block have been modified, that is, the data in the cache differ from the copy of the data in memory, the block to be replaced is first written back to memory before being overwritten by new data. An alternative known method uses a not most recently used (NMRU) algorithm. With an NMRU replacement strategy, the block which is selected for replacement is randomly selected from among the blocks which were not most recently used.
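One of the many possible ways to realize the two replacement strategies is sketched below for a single cache index. The class `CacheSet`, the use of a per-way access counter as a timestamp, and the `writebacks` list standing in for memory are all assumptions for illustration; the sketch requires at least two ways, and its NMRU choice does not give preference to empty ways.

```python
import random


class CacheSet:
    """The blocks sharing one cache index in an N-way cache (tags only)."""

    def __init__(self, num_ways):
        self.tags = [None] * num_ways
        self.dirty = [False] * num_ways    # True if modified since loaded
        self.last_used = [0] * num_ways    # counter value at the latest access
        self.clock = 0
        self.writebacks = []               # tags written back to memory

    def _touch(self, way, write):
        self.clock += 1
        self.last_used[way] = self.clock
        if write:
            self.dirty[way] = True

    def lru_victim(self):
        """Way whose block was used least recently (empty ways come first)."""
        return min(range(len(self.tags)), key=lambda w: self.last_used[w])

    def nmru_victim(self, rng=random):
        """Random way chosen from all ways except the most recently used."""
        mru = max(range(len(self.tags)), key=lambda w: self.last_used[w])
        return rng.choice([w for w in range(len(self.tags)) if w != mru])

    def access(self, tag, write=False, policy="lru"):
        """Read or write the block with this tag; return True on a hit."""
        if tag in self.tags:
            self._touch(self.tags.index(tag), write)
            return True
        way = self.lru_victim() if policy == "lru" else self.nmru_victim()
        if self.dirty[way]:
            # A modified block is written back before being overwritten.
            self.writebacks.append(self.tags[way])
        self.tags[way] = tag
        self.dirty[way] = False
        self._touch(way, write)
        return False


two_way = CacheSet(2)
two_way.access(1, write=True)   # miss, loaded into way 0 and modified
two_way.access(2)               # miss, loaded into way 1
two_way.access(1)               # hit, tag 2 is now least recently used
two_way.access(3)               # miss, replaces tag 2 (clean, no write-back)
two_way.access(4)               # miss, replaces tag 1 (dirty, written back)
```

After this sequence the set holds tags 4 and 3, and only tag 1 has been written back, since it was the only block modified while in the cache.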
A multi-way set-associative cache provides the advantage that there are two or more possible locations for storing data in blocks having the same cache index. This arrangement reduces thrashing due to hot spots in memory, and increases the operating speed of the computer system, presuming that hot spots are uniformly distributed over the blocks of the RAMs.
However, if hot spots are not uniformly distributed, thrashing may persist. For example, a CPU realizes improved operating speed if related data structures are page aligned. Most modern compilers assign related data structures, such as instruction sequences, beginning anew with each page. Also, if there is not enough room at the end of a page to store an entire data structure, the end of the page is left unused, rather than having a data structure split between two pages. Therefore, the low order addresses of a page may be disposed to be accessed at a higher rate than the high order addresses of the page. This biased distribution of memory hot spots will lead to thrashing in a traditional single-index multi-way set-associative cache. In addition, the random distribution of data and instructions may spuriously generate memory hot spots.
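The bias toward low order addresses can be made concrete with a small count. The sketch assumes, purely for illustration, 8192-byte pages, 64-byte blocks, and a cache in which each RAM holds exactly one page worth of blocks, so that the start of every page maps to cache index 0. The function `index_usage` and the per-page fill amounts are hypothetical.

```python
PAGE_BYTES = 8192                              # hypothetical page size
BLOCK_BYTES = 64                               # hypothetical block size
BLOCKS_PER_PAGE = PAGE_BYTES // BLOCK_BYTES    # 128 cache indexes per page


def index_usage(bytes_used_per_page):
    """Count, for each cache index, how many pages place data there.

    Each page is assumed to be filled from its first byte upward, with
    the remainder of the page left unused. Each entry must not exceed
    PAGE_BYTES.
    """
    usage = [0] * BLOCKS_PER_PAGE
    for used in bytes_used_per_page:
        blocks_used = -(-used // BLOCK_BYTES)   # ceiling division
        for index in range(blocks_used):
            usage[index] += 1
    return usage


# Four hypothetical pages, filled to different degrees from the page start.
usage = index_usage([8192, 4096, 1024, 512])
low_index_pressure = usage[0]      # all four pages compete for index 0
high_index_pressure = usage[127]   # only the full page reaches index 127
```

In this example four pages compete for cache index 0 while a single page uses cache index 127, so a two-way cache would still thrash at the low indexes even though the high indexes are nearly idle.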