The present invention relates to cache memory systems, and in particular to predictive accessing of a cache memory.
Small, quickly accessible cache memories have been used to improve the performance of computer systems. The two most prevalent types of cache memories are direct-mapped and set-associative cache memories. In addition, multiple levels of cache memories may be used, with a first level cache being on the same semiconductor chip as the microprocessor, and a second level cache memory being in separate chips, such as SRAM. These cache memories can be unified caches, containing both instructions and data, or separate instruction and data caches may be used. In addition, in systems using address translation, including paging, a small cached portion of the page table is typically kept on the microprocessor chip and is called a translation look-aside buffer (TLB).
Four types of cache are considered here: direct-mapped, fully associative, set-associative, and predictive set-associative. Each is described briefly below.
FIG. 1 illustrates a typical prior art TLB 60 and a direct-mapped cache 62. If the cache is a physical address cache, it will be addressed by a physical address in a register 66. If it is a virtual address cache, it can be addressed directly by a virtual address. FIG. 1 shows a physical address cache. A virtual address in a register 64 has an offset portion and a page portion. The page portion is provided to TLB 60, where it is compared to the tags in the TLB to determine whether it is present; if it is present, a translated physical page is provided to physical address register 66. Register 66 combines the translated page with the offset from the virtual address register 64. This address can then be provided to the cache memory. The TLB is itself a cache, which may be direct-mapped, set-associative or another structure. If there is a miss in the TLB, the full page table may be accessed in external memory, which may be in a level 2 (L2) cache, for instance. The page tables may also be multiple-level page tables, and may be combined with segmentation or other addressing schemes for partitioning the memory space.
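The translation path described above can be sketched in a few lines of Python. This is an illustrative model only, not the circuit of FIG. 1: the 12-bit offset (4 KiB pages), the dictionary-based TLB and page table, and the names `translate`, `tlb` and `page_table` are all assumptions introduced for the example.

```python
PAGE_BITS = 12  # assumption: 4 KiB pages; the text does not give a page size


def translate(virtual_address, tlb, page_table):
    """Return (physical_address, tlb_hit) for one virtual address."""
    page = virtual_address >> PAGE_BITS                 # page portion
    offset = virtual_address & ((1 << PAGE_BITS) - 1)   # offset portion
    tlb_hit = page in tlb                               # compare page against TLB tags
    if not tlb_hit:
        # TLB miss: consult the full page table and keep the translation
        tlb[page] = page_table[page]
    physical_page = tlb[page]
    # Combine the translated page with the untranslated offset
    return (physical_page << PAGE_BITS) | offset, tlb_hit


page_table = {0x12345: 0x00ABC}   # hypothetical single mapping
tlb = {}
first = translate(0x12345678, tlb, page_table)
second = translate(0x12345FFF, tlb, page_table)
print(hex(first[0]), first[1])    # first access to the page misses the TLB
print(hex(second[0]), second[1])  # second access to the same page hits
```

A Python dictionary behaves like an unbounded fully associative structure, so the sketch does not model TLB capacity or replacement.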
In the example of FIG. 1, a direct-mapped cache is shown. The bits of the physical address are provided on a bus 88 to the address inputs of the cache, and directly select the one location where a cache entry may be located, using the least significant bits of the physical address. A tag with the more significant bits is compared to the more significant bits of the physical address in a comparator 70 to see if there is a hit (the address is in the cache). The cache entry is immediately available without waiting for the tag comparison. A miss indication can be used to later invalidate the instruction retrieved from the cache.
As can be seen, the direct-mapped cache always maps a particular range of addresses to a particular physical area in the cache. This can make the cache inefficient when the memory accesses of a particular program are concentrated within one or two ranges that map to the same area of the cache. The tradeoff is that the access time is faster, since the select information is provided at the same time as the physical address.
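A minimal software model of a direct-mapped lookup, assuming a 16-location cache indexed by the four least significant address bits, may make the tradeoff concrete. The class and method names are hypothetical and the sizes are chosen only for illustration.

```python
INDEX_BITS = 4  # assumption: 16 cache locations


class DirectMappedCache:
    def __init__(self):
        self.tags = [None] * (1 << INDEX_BITS)
        self.entries = [None] * (1 << INDEX_BITS)

    @staticmethod
    def split(address):
        index = address & ((1 << INDEX_BITS) - 1)  # least significant bits select the location
        tag = address >> INDEX_BITS                # more significant bits form the tag
        return index, tag

    def read(self, address):
        index, tag = self.split(address)
        entry = self.entries[index]     # entry is available without waiting for the compare
        hit = self.tags[index] == tag   # the compare later confirms or invalidates it
        return entry, hit

    def fill(self, address, value):
        index, tag = self.split(address)
        self.tags[index] = tag
        self.entries[index] = value


cache = DirectMappedCache()
cache.fill(0x13, "a")
r1 = cache.read(0x13)   # same location, matching tag
r2 = cache.read(0x23)   # same location, different tag
cache.fill(0x23, "b")   # displaces the entry for 0x13
r3 = cache.read(0x13)
print(r1, r2, r3)
```

Addresses 0x13 and 0x23 share index 3, so each fill displaces the other. This is the conflict behavior described above for accesses concentrated in ranges that map to the same area.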
FIG. 2 illustrates a four-way set-associative cache which could be used in place of the direct-mapped cache of FIG. 1. Note that a four-way cache was chosen as an illustrative example only, and different numbers could be used for an N-way cache. FIG. 2 shows four sets, each with a separate entry portion 74 and tag portion 72. Data from a particular address range could be stored in any of the four sets, and is not restricted to one particular area of the cache as in a direct-mapped cache. Thus, where a particular program accesses data within a single address region which would be confined to a single area of the direct-mapped cache, that same data may be stored in any one of the four cache sets, improving the chances of a hit. The tradeoff is that each of the tags must be compared (comparator 71) to the higher order bits on lines 76 from the physical address to determine which set is to be used. The comparator 71 output then selects the particular set through multiplexer 70. As can be seen, this makes the access time of the cache slower, since the comparison step must be done prior to enabling the output data through multiplexer 70. Least-recently-used (LRU) information 67 is also stored. The above examples of FIGS. 1 and 2 are only examples of particular prior art implementations, and other physical configurations could be used to implement a direct-mapped or a set-associative cache.
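The four-way lookup can be modeled as follows. This is a sketch under stated assumptions rather than the FIG. 2 hardware: the 16-row size, the list-based LRU ordering, and the names `SetAssociativeCache`, `read` and `fill` are hypothetical. The word "set" is used as in the text above, for one of the four tag-and-entry arrays. The loop stands in for comparisons that hardware would perform in parallel.

```python
NUM_SETS = 4    # four sets, in the sense used in the text
INDEX_BITS = 4  # assumption: 16 rows per set


class SetAssociativeCache:
    def __init__(self):
        rows = 1 << INDEX_BITS
        self.tags = [[None] * rows for _ in range(NUM_SETS)]
        self.entries = [[None] * rows for _ in range(NUM_SETS)]
        # Per row: set numbers ordered from least to most recently used
        self.lru = [list(range(NUM_SETS)) for _ in range(rows)]

    @staticmethod
    def split(address):
        return address & ((1 << INDEX_BITS) - 1), address >> INDEX_BITS

    def _touch(self, index, s):
        self.lru[index].remove(s)
        self.lru[index].append(s)

    def read(self, address):
        index, tag = self.split(address)
        for s in range(NUM_SETS):            # every set's tag is compared
            if self.tags[s][index] == tag:   # the match selects which set to output
                self._touch(index, s)
                return self.entries[s][index], True
        return None, False

    def fill(self, address, value):
        # Assumes the address is not already cached (called after a miss)
        index, tag = self.split(address)
        s = self.lru[index][0]               # replace the least recently used set
        self.tags[s][index] = tag
        self.entries[s][index] = value
        self._touch(index, s)


cache = SetAssociativeCache()
for address, value in [(0x13, "a"), (0x23, "b"), (0x33, "c"), (0x43, "d")]:
    cache.fill(address, value)
all_hit = all(cache.read(a)[1] for a in (0x13, 0x23, 0x33, 0x43))
cache.read(0x13)        # make 0x13 the most recently used
cache.fill(0x53, "e")   # a fifth conflicting address evicts the LRU entry
evicted = cache.read(0x23)
kept = cache.read(0x13)
print(all_hit, evicted, kept)
```

Four addresses that would collide in a direct-mapped cache coexist here. Only the fifth forces a replacement, and the stored LRU ordering decides which entry is displaced.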
A fully associative cache would allow an entry to be anywhere in the cache, not just a designated location in 4 different sets (or N different sets for an N-way set-associative cache). The disadvantage of a fully associative cache is that every tag in the cache must be compared to the address, not just N tags as in an N-way set-associative cache.
Another type of cache structure, which may be called a predictive set-associative cache, is described in U.S. Pat. No. 5,392,414. This cache is logically multi-way, but physically direct-mapped. It is an N-way predictive set-associative cache with an accessing speed comparable to that of a direct-mapped cache. This is accomplished by including, with each entry, prediction information indicating which set the next entry will be located in. Thus, when an entry is retrieved, the prediction information is stored in a latch, and the output of the latch is used to select the set for the next access. If the prediction information is wrong, the access speed penalty for the tag comparison in the comparator as shown in FIG. 2 is incurred in those infrequent cases (a tag comparison is always done to confirm a hit, but does not hold up the access for a hit).
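The prediction mechanism can be illustrated with a small model. It is a sketch only and does not reproduce the design of the cited patent: the training rule (after each hit, the previously accessed entry records the set in which the current entry was found), the sizes, and all names are assumptions made for the example.

```python
NUM_SETS = 4
INDEX_BITS = 4  # assumption: 16 rows per set


class PredictiveCache:
    def __init__(self):
        rows = 1 << INDEX_BITS
        self.tags = [[None] * rows for _ in range(NUM_SETS)]
        self.entries = [[None] * rows for _ in range(NUM_SETS)]
        # Prediction stored with each entry: the set expected for the next access
        self.next_set = [[0] * rows for _ in range(NUM_SETS)]
        self.latch = 0      # latched prediction that selects the set for the next access
        self.last = None    # (set, index) of the previous hit, used for training

    @staticmethod
    def split(address):
        return address & ((1 << INDEX_BITS) - 1), address >> INDEX_BITS

    def fill(self, address, value, s):
        index, tag = self.split(address)
        self.tags[s][index] = tag
        self.entries[s][index] = value

    def read(self, address):
        index, tag = self.split(address)
        s = self.latch                       # predicted set is selected immediately
        fast = self.tags[s][index] == tag    # the tag compare confirms the prediction
        if not fast:
            # Misprediction: pay for a full tag comparison across all sets
            s = next((w for w in range(NUM_SETS)
                      if self.tags[w][index] == tag), None)
            if s is None:
                return None, "miss"
        if self.last is not None:            # assumed training rule (see lead-in)
            prev_set, prev_index = self.last
            self.next_set[prev_set][prev_index] = s
        self.last = (s, index)
        self.latch = self.next_set[s][index]  # latch the prediction for the next access
        return self.entries[s][index], "fast" if fast else "slow"


cache = PredictiveCache()
cache.fill(0x13, "a", s=2)
cache.fill(0x27, "b", s=1)
outcomes = [cache.read(a)[1] for a in (0x13, 0x27, 0x13, 0x27, 0x13)]
print(outcomes)
```

In this model the first pass through a repeating access pattern takes the slow path while the predictions are learned. Later accesses are selected by the latch alone, with the tag comparison serving only as confirmation.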
Many computer systems implement both a first level cache on the microprocessor chip itself, and an (often external) second level cache. In a typical implementation, the first level cache is a set-associative, direct-mapped or fully associative cache, while the external cache is direct-mapped. The logic for controlling the second level cache may be on the microprocessor chip itself, or on a separate memory management chip. This logic would contain the multiplexer and comparison logic, for instance. As can be seen, the multiplexer and comparison logic of the set-associative cache of FIG. 2 requires a large number of data and address lines. For a one megabyte cache, the number of lines required for data, addresses and control could be from 200 to 300 pins for each set of an N-way cache. Thus, a two-way cache could require 400 pins and a four-way cache could require 800 pins. For example, a one megabyte cache may require 15 bits of address, 64, 128 or even 256 bits of data, plus parity, and 25 or more bits for the tag (plus power and ground pins). Obviously, such a pin count is prohibitive with today's packaging technologies, and accordingly level-two caches of any significant size implemented today are done as direct-mapped caches. This is because the direct-mapped cache, as seen in FIG. 1, does not require all the address lines being provided to a comparator.
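The pin-count example can be worked through as simple arithmetic. The sketch below uses the address, data and tag widths given above and assumes one parity bit per data byte, which the text does not specify. Control, power and ground pins are left out, so the totals are lower bounds on the per-set signal count. The function name `signal_pins` is hypothetical.

```python
def signal_pins(address_bits, data_bits, tag_bits, parity_per_byte=1):
    """Address + data + parity + tag lines for one set (no control, power or ground)."""
    parity_bits = (data_bits // 8) * parity_per_byte  # assumption: parity per data byte
    return address_bits + data_bits + parity_bits + tag_bits


# 15 address bits and 25 tag bits, for each data width mentioned in the text
per_set = {width: signal_pins(15, width, 25) for width in (64, 128, 256)}
four_way = {width: 4 * pins for width, pins in per_set.items()}
print(per_set)
print(four_way)
```

Under these assumptions the signal lines alone come to 112, 184 and 328 per set for 64-, 128- and 256-bit data paths, before control, power and ground pins are added, and the total scales linearly with the number of sets.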