Most computer systems employ a multilevel hierarchy of memory systems, with fast but limited-capacity memory at the highest level of the hierarchy, proceeding to slower but higher-capacity memory at the lowest level of the hierarchy. Typically, the hierarchy includes a small fast memory called a cache, either physically integrated within a processor integrated circuit or mounted physically close to the processor for speed. There may be separate instruction caches and data caches. There may be multiple levels of caches.
If a processor requests an item from a cache and the item is present in the cache, the event is called a cache hit. If a processor requests an item from a cache and the item is not present in the cache, the event is called a cache miss. In the event of a cache miss, the requested item is retrieved from a lower level of the memory hierarchy. In many processor designs, the time required to access an item for a cache hit is one of the primary limiters for the clock rate of the processor. Therefore, optimization of cache hit timing is critical for performance. There is an ongoing need for improvement in cache hit times for computer processors.
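The hit and miss events described above can be illustrated with a minimal sketch. The class name `TinyCache`, its fields, and the dictionary standing in for the lower level of the hierarchy are hypothetical and chosen only for illustration. For simplicity the sketch assumes unlimited cache capacity, so nothing is ever evicted.

```python
class TinyCache:
    """Toy cache in front of a slower backing store (both hypothetical)."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # stands in for the lower level
        self.lines = {}                     # address -> cached item
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:           # item present: cache hit
            self.hits += 1
            return self.lines[address]
        self.misses += 1                    # item absent: cache miss
        item = self.backing_store[address]  # retrieve from the lower level
        self.lines[address] = item          # keep a copy for later requests
        return item


# Usage: the first request for an address misses, repeats of it hit.
memory = {addr: addr * 10 for addr in range(8)}
cache = TinyCache(memory)
values = [cache.read(a) for a in (3, 3, 5, 3)]
```

In this usage the requests for addresses 3 and 5 each miss once, and the two repeated requests for address 3 hit.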
The minimum amount of memory that can be transferred into or out of a cache is called a line, or sometimes a block. Typically, memory is organized into words (for example, 32 bits per word) and a line comprises multiple words (for example, 16 words per line). Memory may also be divided into pages, with many lines per page.
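The arithmetic implied by the example figures above can be worked through in a short sketch. The word and line sizes come from the examples in the text. The 4096-byte page size and the function name `locate` are assumptions made only for illustration.

```python
BITS_PER_WORD = 32      # example word size from the text
WORDS_PER_LINE = 16     # example line size from the text
PAGE_BYTES = 4096       # assumed page size, not stated in the text

BYTES_PER_WORD = BITS_PER_WORD // 8
BYTES_PER_LINE = BYTES_PER_WORD * WORDS_PER_LINE
LINES_PER_PAGE = PAGE_BYTES // BYTES_PER_LINE


def locate(byte_address):
    """Split a byte address into (page, line within page, word within line)."""
    page, offset_in_page = divmod(byte_address, PAGE_BYTES)
    line_in_page, offset_in_line = divmod(offset_in_page, BYTES_PER_LINE)
    word_in_line = offset_in_line // BYTES_PER_WORD
    return page, line_in_page, word_in_line
```

Under these assumptions a line holds 64 bytes and a page holds 64 lines.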
If a cache stores an entire line address along with the data, any data line can be placed anywhere in the cache. If a line can be placed anywhere in the cache, the cache is said to be fully associative. As a space-saving alternative, the least significant bits of an address may be used to designate one specific location within the cache, with the remaining more significant bits of each address stored along with the data. The number used to designate a location within a cache is commonly called an index, and the remaining bits required to define a physical address for a line are commonly called a tag. In a cache with indexing, a line with a particular address can be placed only at the one place within the cache designated by the index. If a line can appear at only one place in the cache, the cache is said to be direct mapped (and is said to have low associativity). In an alternative design, a cache may be organized into sets, each set containing two or more lines. If a line can be placed in only one of the sets, but anywhere within that set, the cache is said to be set associative.
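The division of a line address into tag and index, and the resulting placement choices, can be sketched as follows. The function names `split_line_address` and `candidate_slots` and the slot numbering are hypothetical. The sketch assumes the number of lines is a power of two and is evenly divisible by the number of lines per set.

```python
def split_line_address(line_address, index_bits):
    """Return (tag, index): low bits pick the location, the rest is stored."""
    index = line_address & ((1 << index_bits) - 1)
    tag = line_address >> index_bits
    return tag, index


def candidate_slots(line_address, num_lines, ways):
    """List the cache slots where a line may be placed.

    ways == 1             -> direct mapped (exactly one slot)
    1 < ways < num_lines  -> set associative (any slot in one set)
    ways == num_lines     -> fully associative (any slot)
    """
    num_sets = num_lines // ways
    set_index = line_address % num_sets   # low bits select the set
    first = set_index * ways              # first slot belonging to that set
    return list(range(first, first + ways))
```

For an 8-line cache, the same line address yields one candidate slot when direct mapped, two when organized as sets of two lines, and all eight when fully associative.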
With direct mapping, when a line is requested, only one line in the cache has matching index bits. Therefore, the data can be retrieved immediately and driven onto a data bus before the system determines whether the rest of the address matches. The data may or may not be valid, but in the usual case where it is valid, the data bits are available on a bus before the system determines validity. With associative caches, it is not known which line corresponds to an address until the full address is compared. That is, in associative caches, the result of tag comparison is used to select which data bits are presented to the processor. In associative caches, the longest delay is often the time required to compare the tag of the requested line to the tags of the lines within the cache, and this tag comparison time must be in series with at least part of the data selection time. There is a need for improvement in the overall time for tag comparison and data selection for associative caches.
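The difference in ordering between the two lookups can be sketched in software, with the caveat that sequential code can only model the dependency between steps, not the hardware timing. The function names and the list-based cache layouts are hypothetical.

```python
def direct_mapped_read(cache, tag, index):
    """cache: list of (stored_tag, data) entries, one per index."""
    stored_tag, data = cache[index]   # data is selected by the index alone
    speculative_data = data           # could be driven onto the bus now
    valid = (stored_tag == tag)       # tag check confirms or rejects it later
    return speculative_data, valid


def set_associative_read(cache, tag, index):
    """cache: list of sets; each set is a list of (stored_tag, data) ways."""
    for stored_tag, data in cache[index]:
        if stored_tag == tag:         # comparison result selects the data
            return data, True
    return None, False                # no way matched: cache miss
```

In the direct mapped function the data is obtained without consulting the tag, and the comparison only determines validity afterward. In the set associative function no data can be returned until a tag comparison has identified the matching way, which is the serial dependency described above.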