Users of data processing systems such as computers and the like continue to demand greater and greater performance from such systems for handling increasingly complex and difficult tasks. In addition, processor speed has increased much more quickly than the speed of main memory. As a result, cache memories, or caches, are often used in many such systems to increase performance in a relatively cost-effective manner.
A cache is typically a relatively faster memory that is coupled intermediate one or more processors and a relatively slower memory, such as one implemented in volatile or non-volatile memory devices, mass storage devices, and/or external network storage devices, among others. A cache speeds access by maintaining a copy of the information stored at selected memory addresses so that access requests to the selected memory addresses by a processor are handled by the cache. Whenever an access request is received for a memory address not stored in the cache, the cache typically retrieves the information from the memory and forwards the information to the processor. Moreover, if the cache is full, typically the information related to the least recently used memory addresses is returned to the memory to make room for information related to more recently accessed memory addresses.
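By way of illustration only, the behavior described above may be sketched as follows. The class name `LRUCache`, the `capacity` parameter, and the use of a dictionary to stand in for the slower memory are assumptions made for this sketch and do not describe any particular hardware design.

```python
from collections import OrderedDict


class LRUCache:
    """Toy cache placed in front of a slower backing store.

    `memory` is a plain dict standing in for the slower memory
    (an assumption of this sketch, not a model of real hardware).
    """

    def __init__(self, capacity, memory):
        self.capacity = capacity
        self.memory = memory
        self.lines = OrderedDict()  # address -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            # Hit: the request is handled by the cache itself.
            self.hits += 1
            self.lines.move_to_end(address)  # mark as most recently used
            return self.lines[address]
        # Miss: make room if the cache is full, then fetch from memory.
        self.misses += 1
        if len(self.lines) >= self.capacity:
            victim, data = self.lines.popitem(last=False)  # least recently used
            self.memory[victim] = data  # return the victim's data to memory
        data = self.memory[address]
        self.lines[address] = data
        return data
```

A real cache operates on fixed-size cache lines rather than individual addresses; the sketch tracks single addresses only to keep the replacement policy visible.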
The benefits of a cache are maximized whenever the number of access requests to cached memory addresses, known as "cache hits", is maximized relative to the number of access requests to non-cached memory addresses, known as "cache misses". Despite the added overhead that typically occurs as a result of a cache miss, as long as the percentage of cache hits is high, the overall access rate for the system is increased. It is well known that the vast majority of successive memory access requests refer to a relatively small address area, and thus, cache hit rates of well over 95% are common.
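The effect of the hit rate on overall access time may be illustrated with a simple model in which every access pays the cache access time and each miss pays an additional penalty. The function name and the cycle counts used below are hypothetical figures chosen for this sketch, not measurements from any system.

```python
def effective_access_time(hit_rate, hit_time, miss_penalty):
    """Average time per access when every access pays hit_time and each
    miss pays an additional miss_penalty (same time unit throughout)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_time + (1.0 - hit_rate) * miss_penalty


if __name__ == "__main__":
    # Hypothetical figures: 1-cycle cache access, 50-cycle miss penalty.
    for rate in (0.50, 0.95, 0.99):
        print(rate, round(effective_access_time(rate, 1, 50), 2))
```

Under these assumed figures, a 95% hit rate yields an average of roughly 3.5 cycles per access, well below the cost of going to the slower memory on every request.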
A cache directory is typically utilized by a cache controller to access cache lines that store information from given ranges of memory addresses. Such ranges of memory addresses in memory are typically mapped into one of a plurality of sets in a cache, where each set includes a cache directory entry and cache line referred to thereby. In addition, a tag stored in the cache directory entry for a set is used to determine whether there is a cache hit or miss for that set--that is, to verify whether the cache line in the set to which a particular memory address is mapped contains the information corresponding to that memory address.
Typically, both the set and the tag are derived directly from the memory address to reduce access time, as a hardwired mapping of specific address lines to the set and tag may be performed relatively quickly. For example, assuming a memory space of 2^n memory addresses, and thus address lines [n-1:0], a cache line size of 2^m bytes, and 2^p sets, one common cache design maps the m lowest order address lines to a byte select for a given cache line, the next p lowest order address lines to the set, and the remaining address lines to the tag. The tag also includes the bits from the p set lines.
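The mapping of address lines to byte select, set, and tag may be sketched as shown below. The function name `split_address` and the example values of n, m, and p are assumptions for illustration. This sketch assigns only the remaining high-order lines to the tag; a variant in which the tag also carries the p set bits would instead use `address >> m`.

```python
def split_address(address, n, m, p):
    """Split an n-bit address into (tag, set_index, byte_select).

    m lowest-order bits  -> byte select within a 2**m-byte cache line
    next p bits          -> set index among 2**p sets
    remaining n-m-p bits -> tag
    """
    if not 0 <= address < (1 << n):
        raise ValueError("address does not fit in n bits")
    byte_select = address & ((1 << m) - 1)
    set_index = (address >> m) & ((1 << p) - 1)
    tag = address >> (m + p)
    return tag, set_index, byte_select


if __name__ == "__main__":
    # Hypothetical geometry: 32-bit addresses, 128-byte lines, 1024 sets.
    tag, set_index, byte_select = split_address(0x12345678, n=32, m=7, p=10)
    print(hex(tag), set_index, byte_select)
```

Because each field is a fixed slice of the address, the hardware equivalent is simply a wiring of specific address lines, which is why the mapping adds little or no access time.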
Caches may have different degrees of associativity, and are often referred to as being N-way set associative. Each "way" or class represents a separate directory entry and cache line for a given set in the cache directory.
Therefore, in a one-way set associative cache, each memory address is mapped to one directory entry and one cache line in the cache. However, this type of cache is typically prone to "hot spots" where multiple memory addresses from different cache pages that are accessed relatively frequently are mapped to the same directory entry in the cache, resulting in frequent cache misses and lower performance.
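A hot spot of this kind may be reproduced with a small simulation. The function name, the cache geometry, and the access pattern below are hypothetical and chosen only to make the conflict visible.

```python
def direct_mapped_misses(addresses, m, p):
    """Count misses in a one-way set associative (direct-mapped) cache
    with 2**m-byte lines and 2**p sets, starting from an empty cache."""
    tags = {}  # set index -> tag currently held by that set's single entry
    misses = 0
    for address in addresses:
        set_index = (address >> m) & ((1 << p) - 1)
        tag = address >> (m + p)
        if tags.get(set_index) != tag:
            misses += 1
            tags[set_index] = tag  # the new line displaces the old one
    return misses


if __name__ == "__main__":
    # Hypothetical geometry: 64-byte lines, 16 sets. Addresses 0 and 1024
    # share set 0 but carry different tags, so they keep evicting each other.
    print(direct_mapped_misses([0, 1024] * 4, m=6, p=4))
    # Addresses 0 and 64 fall in different sets and coexist.
    print(direct_mapped_misses([0, 64] * 4, m=6, p=4))
```

In the first pattern every access misses even though only two lines are in use, which is the behavior the term "hot spot" describes.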
Multi-way set associative caches, e.g., four-way set associative caches, provide multiple directory entries and cache lines to which a particular memory address may be mapped, thereby decreasing the potential for performance-limiting hot spots. However, when each set includes multiple directory entries, additional processing time is typically required to determine which, if any, of the multiple directory entries in the set references that memory address. Typically, this is performed by either sequentially or concurrently comparing the tag for a given memory address with the tag for each directory entry in the set, and then accessing the cache line referred to by the matching directory entry if and when a match is found. Therefore, while hot spots are reduced in a conventional multi-way set associative cache, the performance gains are at least partially offset by the additional comparison step or steps required to determine the correct directory entry in a set.
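The lookup described above, with a sequential tag comparison across the directory entries of a set, may be sketched as follows. The class name, the least-recently-used replacement within a set, and the `compares` counter are assumptions of this sketch rather than features of any specific design.

```python
class SetAssociativeCache:
    """Toy N-way set associative directory with sequential tag comparison."""

    def __init__(self, ways, m, p):
        self.ways, self.m, self.p = ways, m, p
        # One list of tags per set, least recently used first.
        self.sets = [[] for _ in range(1 << p)]
        self.compares = 0  # number of tag comparisons performed so far

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then filled)."""
        set_index = (address >> self.m) & ((1 << self.p) - 1)
        tag = address >> (self.m + self.p)
        entries = self.sets[set_index]
        for i, stored_tag in enumerate(entries):
            self.compares += 1
            if stored_tag == tag:
                entries.append(entries.pop(i))  # mark as most recently used
                return True
        if len(entries) >= self.ways:
            entries.pop(0)  # evict the least recently used entry in the set
        entries.append(tag)
        return False


if __name__ == "__main__":
    # Hypothetical geometry: four ways, 64-byte lines, 16 sets.
    cache = SetAssociativeCache(ways=4, m=6, p=4)
    print([cache.access(a) for a in [0, 1024] * 4], cache.compares)
```

Two addresses that map to the same set now occupy different ways, so only the first access to each one misses; the `compares` counter shows the extra comparison work that accompanies this benefit.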
Cache performance is also improved by increasing the size of the cache. However, cache memory is often relatively expensive, and its size is often limited by design constraints--particularly if the cache is integrated with a processor on the same integrated circuit device. Internal caches integrated with a processor are typically faster than external caches implemented in separate circuitry. On the other hand, due to design and cost constraints, internal caches are typically much smaller in size than their external counterparts. If a set associative cache is internal, then all sets may often be accessed in parallel; however, this is often not possible with external set associative caches. In addition, internal multi-way set associative caches are often limited in size by the area of the chip.
One cost-effective alternative is to chain together multiple caches of varying speeds, with a relatively smaller, but faster primary cache chained to a relatively larger, but slower secondary cache. For example, some microprocessors implement a relatively small internal level one (L1) cache with an additional internal or external level two (L2) cache coupled intermediate the L1 cache and main memory storage.
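A chained arrangement of this kind may be sketched as a lookup that tries the faster level first and falls back to the slower levels. The function name is hypothetical, plain dictionaries stand in for both caches and for main memory, and eviction is omitted entirely to keep the chaining itself in view.

```python
def chained_read(address, l1, l2, memory):
    """Return (data, source), where source names the level that supplied
    the data. l1, l2, and memory are dicts; capacity limits are ignored."""
    if address in l1:
        return l1[address], "L1"
    if address in l2:
        l1[address] = l2[address]  # copy into the faster primary cache
        return l1[address], "L2"
    data = memory[address]
    l2[address] = data  # fill the secondary cache
    l1[address] = data  # and the primary cache
    return data, "memory"


if __name__ == "__main__":
    memory, l1, l2 = {0x40: "x"}, {}, {}
    print(chained_read(0x40, l1, l2, memory))  # first access goes to memory
    print(chained_read(0x40, l1, l2, memory))  # repeat is served by L1
```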
In addition, some computer system designs utilize virtual addressing, and thus require address translation of a memory access request from a virtual addressing format to a real addressing format for access to a main memory. Moreover, the design of virtual addressed caches is often more complex than counterpart real addressed caches, and thus, real addressing is often used for many cache designs. Therefore, an additional step of translating a memory access request from virtual addressing to real addressing is also required for many cache accesses.
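The translation step may be illustrated with a single-level page table. The 4096-byte page size, the dictionary used as a page table, and the function name are assumptions of this sketch; actual translation mechanisms vary by architecture.

```python
PAGE_BITS = 12  # assumed page size of 2**12 = 4096 bytes


def translate(virtual_address, page_table):
    """Map a virtual address to a real address using a dict that maps
    virtual page numbers to real page numbers. Raises KeyError when the
    virtual page has no mapping."""
    virtual_page = virtual_address >> PAGE_BITS
    offset = virtual_address & ((1 << PAGE_BITS) - 1)
    real_page = page_table[virtual_page]
    return (real_page << PAGE_BITS) | offset


if __name__ == "__main__":
    # Hypothetical mapping: virtual page 3 resides in real page 7.
    print(hex(translate(0x3ABC, {3: 7})))
```

Only the page number changes during translation; the offset within the page passes through unmodified. A real addressed cache must wait for the translated page number before its tag comparison can complete.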
It may be possible to utilize class prediction algorithms to attempt to predict a correct class mapping for a particular memory address. For example, some conventional designs utilize a history array accessed by virtual address bits to control the late select of an internal level one cache without penalizing the cycle time of the cache. However, these designs are not well suited for external caches as they would require additional cycles, additional pins on the processor chip, and/or additional custom external arrays interfaced with the processor chip.
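The general idea of a history array indexed by address bits may be sketched as shown below. This is a hypothetical illustration only: the class name, the choice of index bits, and the update policy are assumptions, and the sketch does not reproduce the conventional designs referred to above.

```python
class ClassPredictor:
    """Toy history array that remembers, per index, the class (way) in
    which the most recent access with that index was found."""

    def __init__(self, index_bits, line_bits=7):
        self.index_bits = index_bits
        self.line_bits = line_bits  # assumed 2**7 = 128-byte cache lines
        self.table = [0] * (1 << index_bits)  # every entry starts at class 0

    def _index(self, virtual_address):
        # Use the address bits just above the byte select as the index.
        return (virtual_address >> self.line_bits) & ((1 << self.index_bits) - 1)

    def predict(self, virtual_address):
        """Return the class to try first for this address."""
        return self.table[self._index(virtual_address)]

    def update(self, virtual_address, actual_class):
        """Record the class in which the line was actually found."""
        self.table[self._index(virtual_address)] = actual_class
```

In use, a cache controller would read the predicted class before the tag comparison completes and call `update` whenever the prediction turns out to be wrong, so that a later access with the same index is predicted correctly.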
Consequently, a significant need continues to exist in the art for a cache design capable of increasing system performance in a data processing system. Specifically, a significant need continues to exist for a cost-effective cache design exhibiting greater hit rates and reduced access times relative to conventional designs.