Cache memories are small, relatively high speed buffers used in modern data processing systems to temporarily store those portions of main memory which are currently in use. In such a system, each main memory address specified for access is first passed to the cache. If the cache is currently assigned to hold the contents of that address, a "cache hit" occurs and the cache is enabled to complete the access. If this is not the case, a "cache miss" has occurred, and the main memory must be enabled. Typically, the cache can be read- and write-accessed on the order of ten times faster than the main memory. A central processing unit (CPU) associated with a data processing system having a cache thus needs to spend far less time waiting for instructions and operands to be read or written.
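As a rough illustration of why the CPU spends less time waiting, the average access time can be estimated from the fraction of accesses that hit. The sketch below assumes a simple model in which a hit costs one cache access and a miss costs one main memory access; the function name, the 90 percent hit rate, and the time units are illustrative assumptions rather than figures for any particular system.

```python
def effective_access_time(hit_rate, cache_time, memory_time):
    """Average time per access under a simple hit-or-miss model (assumed)."""
    # Hits are served at cache speed; misses are served at main memory speed.
    return hit_rate * cache_time + (1.0 - hit_rate) * memory_time


# Hypothetical numbers: main memory is ten times slower than the cache.
cache_time = 1.0
memory_time = 10.0
avg = effective_access_time(0.9, cache_time, memory_time)
speedup = memory_time / avg  # improvement over having no cache at all
```

With these assumed values the average access costs roughly 1.9 time units instead of 10, so even a modest hit rate removes most of the waiting.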
When a miss occurs, the cache typically assigns the requested miss address to itself, and thereby displaces an old cache address which has not been accessed for some time. This has the effect of reassigning the old address to the main memory. Because of the highly local, repetitive nature of memory references in the vast majority of data processing applications, the use of such a replacement algorithm has been repeatedly proven to result in cache misses five percent or less of the time.
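The replacement behavior described above can be sketched as follows. This is a minimal model that assumes a strict least-recently-used policy and looks up addresses in a Python dictionary rather than in tag hardware; the class name `LRUCache`, its fields, and the sample trace are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Toy cache that displaces the least recently accessed address (assumed policy)."""

    def __init__(self, capacity, backing):
        self.capacity = capacity
        self.backing = backing        # dict standing in for main memory
        self.lines = OrderedDict()    # address -> contents, least recent first
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            # Cache hit: mark the address as most recently used.
            self.hits += 1
            self.lines.move_to_end(address)
            return self.lines[address]
        # Cache miss: make room by displacing the least recently used address.
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)
        value = self.backing[address]
        self.lines[address] = value   # the cache assigns the miss address to itself
        return value


memory = {addr: addr * 2 for addr in range(16)}
cache = LRUCache(capacity=2, backing=memory)
trace = [0, 1, 0, 2, 1]
results = [cache.read(addr) for addr in trace]
```

In this trace, the third access hits because address 0 is still cached, while the access to address 2 displaces address 1, which then misses when it is requested again.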
Rather than being directly addressable by the CPU, an ideal cache operates as a transparent buffer. Thus, in order to locate the contents of an address assigned to the cache, a mechanism is necessary for mapping each main memory address into a cache location. One commonly used scheme is to arrange the cache into so-called tag elements and associated data elements. Each tag element corresponds to one of the main memory addresses, and its associated data element represents the contents of that main memory location.
The portion of the cache which stores the tag elements is referred to as the tag array, while that which stores the data elements is called the data array. Rather than search the tag array sequentially to match the input main memory address, which would be quite slow, most contemporary caches use a portion of the requested main memory address as an index to look up the corresponding tag. In one such approach, called direct mapping, the index is directly fed to the address inputs of the tag array. If the tag element fetched matches the main memory address, a hit occurs, and the corresponding data element is enabled for access.
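A direct-mapped lookup of this kind can be sketched in a few lines. The sketch assumes that the low-order address bits form the index and that the remaining high-order bits are stored as the tag; the class and method names are hypothetical, and the arrays are plain Python lists standing in for the tag and data arrays.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: one tag and one data element per index."""

    def __init__(self, index_bits):
        self.index_bits = index_bits
        size = 1 << index_bits
        self.tag_array = [None] * size
        self.data_array = [None] * size

    def split(self, address):
        # Assumed layout: low bits are the index, the rest is the tag.
        index = address & ((1 << self.index_bits) - 1)
        tag = address >> self.index_bits
        return tag, index

    def lookup(self, address):
        tag, index = self.split(address)
        # The index addresses the tag array directly; a match is a hit.
        if self.tag_array[index] == tag:
            return True, self.data_array[index]
        return False, None

    def fill(self, address, value):
        tag, index = self.split(address)
        self.tag_array[index] = tag
        self.data_array[index] = value


cache = DirectMappedCache(index_bits=2)
cache.fill(0b0110, "A")                  # index 2, tag 1
hit_first = cache.lookup(0b0110)         # same index, same tag
miss_other = cache.lookup(0b1010)        # same index, different tag
cache.fill(0b1010, "B")                  # displaces the earlier entry
miss_after = cache.lookup(0b0110)
```

Note that the two addresses share index 2, so filling the second one displaces the first even though the rest of the cache is empty.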
Direct mapping does not maximize the probability of a hit, however, because only one tag element can be held for each index value. Most caches thus perform what is called a set-associative tag search. In a set-associative cache, a portion of the main memory address is used as the index to address the tag and data arrays, as before. However, the tag and data arrays are arranged so that multiple tags and corresponding data elements are fetched and enabled for each access.
More particularly, for a set-associative cache, indexing is accomplished by dividing the input main memory address into three fields: a tag field, which usually occupies the high order bits; an index field, which occupies the middle order bits; and an optional byte field, which consists of the remaining bits. The index field is used to select one set of tag elements and their associated data elements. The selected tag elements are then compared against the tag field of the input main memory address. If there is a match, a hit occurs. The byte field can be used to select a desired one (or perhaps some sub-unit, such as a single byte) of the data elements associated with the matched tag element.
The maximum allowable number of tags associated with a particular index is called the set size. A cache which retrieves two tag elements per index is said to be two-way set associative, one with four tag elements per index four-way set associative, and so forth.
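The field split and set search described above might be modeled as follows. This sketch assumes the byte field occupies the lowest bits, treats each data element as a short string indexed by the byte field, and displaces the oldest-filled entry when a set is full; the names `split_address` and `SetAssociativeCache`, the field widths, and that replacement choice are all assumptions made for illustration.

```python
def split_address(address, index_bits, byte_bits):
    """Divide an address into (tag, index, byte) fields, high to low order."""
    byte = address & ((1 << byte_bits) - 1)
    index = (address >> byte_bits) & ((1 << index_bits) - 1)
    tag = address >> (byte_bits + index_bits)
    return tag, index, byte


class SetAssociativeCache:
    """Toy cache holding up to `ways` (tag, data element) pairs per index."""

    def __init__(self, ways, index_bits, byte_bits):
        self.ways = ways                  # the set size
        self.index_bits = index_bits
        self.byte_bits = byte_bits
        self.sets = [[] for _ in range(1 << index_bits)]

    def fill(self, address, line):
        tag, index, _ = split_address(address, self.index_bits, self.byte_bits)
        entries = self.sets[index]
        entries[:] = [entry for entry in entries if entry[0] != tag]
        if len(entries) >= self.ways:
            entries.pop(0)                # assumed policy: drop the oldest fill
        entries.append((tag, line))

    def read_byte(self, address):
        tag, index, byte = split_address(address, self.index_bits, self.byte_bits)
        # Compare every tag in the selected set against the address tag field.
        for stored_tag, line in self.sets[index]:
            if stored_tag == tag:
                return True, line[byte]   # byte field selects the sub-unit
        return False, None


fields = split_address(0x35, index_bits=2, byte_bits=2)
cache = SetAssociativeCache(ways=2, index_bits=2, byte_bits=2)
cache.fill(0x10, "abcd")                  # tag 1, index 0
cache.fill(0x20, "efgh")                  # tag 2, index 0: shares the set
first = cache.read_byte(0x11)
second = cache.read_byte(0x22)
absent = cache.read_byte(0x30)
```

Here two addresses with the same index reside in the cache at the same time, which a direct-mapped arrangement cannot do.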
Some implementations of set-associative caches have the tag and data arrays fabricated on the same custom semiconductor integrated circuit chip. Because critical components can then be placed as close together as possible, this reduces the delay between the time the tag array contents are matched and selection of the desired one of the data elements. However, such an arrangement does not lend itself to easy expansion of the set size.
Another approach is to use separate chips for the tag and data arrays. This works quite well for a direct-mapped cache, since the index field can be used to address the tag and data arrays in parallel. While this approach allows expansion of the set size without too much difficulty, for a set-associative cache it unfortunately requires the results of the tag comparison to be available before the data array access can begin. With this approach, the cache access time is the sum of the tag array access time, the tag compare time, and the data array access time, which is significantly slower than the access time of a direct-mapped cache.
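The timing penalty can be made concrete with a small calculation. The serial formula follows the sum stated above; the parallel formula is an assumption that a direct-mapped access completes when the slower of the two concurrent paths finishes. The function names and the nanosecond figures are hypothetical.

```python
def serial_access_time(tag_access, tag_compare, data_access):
    """Data array access starts only after the tag comparison completes."""
    return tag_access + tag_compare + data_access


def parallel_access_time(tag_access, tag_compare, data_access):
    """Assumed model: tag and data arrays are addressed at the same time."""
    return max(tag_access + tag_compare, data_access)


# Hypothetical component delays in nanoseconds.
tag_access, tag_compare, data_access = 10, 3, 10
serial = serial_access_time(tag_access, tag_compare, data_access)
parallel = parallel_access_time(tag_access, tag_compare, data_access)
penalty = serial - parallel
```

Under these assumed delays the serial arrangement takes 23 ns against 13 ns for the parallel one, so the data array access time is added almost in full to every access.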
Thus a set-associative cache having an easily expandable set size while retaining a cache access time on the order of a direct-mapped cache is desirable.
In certain applications, such as for multi-tasking processors with virtual addressing, it is also desirable to support dynamic selection of the number of data sets.