This invention relates to computer operation, and more particularly to a method of increasing the speed of tag comparison in the operation of a set-associative cache.
As the development of computer hardware technology continues, CPU performance has far outstripped memory system performance. Caches, which are small, fast memories a fraction of the size of main memory, are used to decrease effective memory access times. A cache stores a copy of recently-used data for rapid access should the data be needed again, and, for many programs, a cache greatly improves the processing speed of the computer system.
In selecting a cache construction for a computer, an important decision is whether to use a direct-mapped or an associative cache. If a data item (having a given address) has only one place it can appear in the cache, the cache is "direct mapped." If the item can be placed anywhere in the cache, the cache is "fully associative." If the item can be placed in only a fixed number (a set) of places in the cache, the cache is "set associative." If there are n locations in a set, the cache is "n-way set associative." Caches of this type are typically two-way or four-way (and sometimes eight-way) set associative. An associative cache has a lower miss rate, but can introduce a performance penalty.
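The placement policies defined above may be illustrated by a brief sketch (not part of the disclosure; the cache size of eight blocks and the 64-byte block size are assumed example parameters):

```python
# Illustrative sketch only: for an assumed cache of 8 blocks with
# 64-byte blocks, list the block locations where an address may reside
# under each placement policy.
NUM_BLOCKS = 8
BLOCK_SIZE = 64

def candidate_blocks(addr, ways):
    """Return the cache block indices where 'addr' may be placed in an
    n-way set-associative cache with 'ways' locations per set.
    ways == 1 is direct mapped; ways == NUM_BLOCKS is fully associative."""
    num_sets = NUM_BLOCKS // ways
    set_index = (addr // BLOCK_SIZE) % num_sets
    return [set_index * ways + w for w in range(ways)]
```

For example, a direct-mapped cache (one way) yields a single candidate location, while an eight-way arrangement of the same eight blocks yields all of them, i.e., full associativity.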
A direct-mapped cache is the simplest and fastest construction, but severely limits the number of cache locations where a particular data item can reside. A direct-mapped cache has only one location where a data item of a given index address may be stored. When two or more heavily used data items map to the same cache location in a direct-mapped cache, and these data items are used by a program in a cyclic manner, as in a loop, cache thrashing occurs. As each data item is used, it displaces its predecessor, causing a relatively slow main memory access. Cache thrashing can severely degrade program run times by forcing many main memory accesses, and it also increases the system interconnect bandwidth required by a system to obtain reasonable performance.
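The thrashing behavior described above can be demonstrated with a small simulation (a sketch under assumed parameters, not part of the disclosure): two addresses that share the same index alternately evict each other, so every access in the loop misses.

```python
# Hedged sketch: thrashing in a direct-mapped cache with 8 sets of
# 64-byte blocks (hypothetical sizes). Two addresses map to the same
# set; used cyclically, as in a loop, every access is a miss.
NUM_SETS = 8
BLOCK_SIZE = 64

def index_of(addr):
    return (addr // BLOCK_SIZE) % NUM_SETS

cache = {}                     # set index -> tag currently resident
misses = 0
a, b = 0x0000, 0x2000          # both map to set 0
for addr in [a, b] * 4:        # cyclic use of the two items
    idx = index_of(addr)
    tag = addr // (BLOCK_SIZE * NUM_SETS)
    if cache.get(idx) != tag:
        misses += 1            # each item displaces its predecessor
        cache[idx] = tag
```

All eight accesses miss, forcing eight main memory accesses; a two-way set-associative cache would hold both items and miss only twice.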
Set-associative cache constructions increase the probability of finding recently-used data in the cache. By providing two or more locations in the cache where a data item having a given index (low-order address) may be stored, cache misses are reduced and thrashing is less frequent. Set-associative caches are inherently slower in operation than direct-mapped caches, however. Since a set-associative cache allows a given address value to be mapped to more than one location in the cache array, two or more addresses from different pages, but with equal index bits, can exist in the cache at the same time. The hardware implementing a direct-mapped cache is faster in operation than that of a set-associative cache, because data can be driven onto the bus while a hit or miss is being determined, whereas in a set-associative cache the correct data cannot be driven onto the bus until a hit is determined. In a set-associative cache, the tag address must be compared with a number of possible matches, and then the corresponding data from two or more banks selected (after the tag compare is completed); this additional step of bank selection necessarily makes the operation slower. Indeed, to allow for this added time, the cycle time of the CPU may have to be slightly increased, affecting the performance of the entire system.
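The two-step lookup just described, tag compare followed by bank select, may be sketched as follows (an illustrative model with assumed names and sizes, not the circuit of the invention):

```python
# Illustrative sketch of a two-way set-associative lookup, assuming
# 4 sets with two banks each (hypothetical sizes).
NUM_SETS = 4

class TwoWayCache:
    def __init__(self):
        # per set, one (tag, data) entry per bank
        self.banks = [[(None, None), (None, None)] for _ in range(NUM_SETS)]

    def lookup(self, index, tag):
        # Step 1: tag compare against every bank in the selected set.
        matches = [b for b, (t, _) in enumerate(self.banks[index]) if t == tag]
        if not matches:
            return None          # miss
        # Step 2: bank select -- possible only after the compare
        # completes, which is why this lookup is inherently slower than
        # a direct-mapped one, where data is driven during the compare.
        bank = matches[0]
        return self.banks[index][bank][1]
```

The serial dependency of step 2 on step 1 is the performance penalty the text refers to: the data multiplexer cannot switch until the comparators have settled.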
High performance computer systems usually employ virtual memory management, which introduces a delay in memory addressing while the virtual address is translated to a physical address. An on-chip cache for a microprocessor chip is thus constrained in its response time by the requirement for translating the addresses, usually by an on-chip translation buffer, before the cache can be accessed to see if it contains the data of a memory reference. Waiting to begin the tag compare until after the translation buffer has produced a page frame address in a virtual memory management system thus places further performance demands upon the compare and bank select operation in a set-associative cache.
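The serial dependency described above, translation first, then tag compare, may be sketched as follows (a hedged model with invented page size and translation-buffer contents, not part of the disclosure):

```python
# Hedged sketch: in a virtual memory system with a physically tagged
# cache, the translation buffer must produce the page frame number
# before the cache tag compare can begin. Sizes are assumptions.
PAGE_SIZE = 4096

def translate_then_lookup(vaddr, tlb, cache_tags):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    pfn = tlb.get(vpn)           # translation buffer lookup comes first
    if pfn is None:
        return "tb_miss"
    paddr = pfn * PAGE_SIZE + offset
    tag = paddr // PAGE_SIZE     # tag compare can only start now
    return "hit" if tag in cache_tags else "miss"
```

The point of the sketch is the ordering: the cache access cannot even begin until the translation buffer output is available, which is the added timing pressure on the compare and bank select path.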
Various circuit improvements have been used to speed up the compare operation in set-associative caches to attempt to keep pace with the improvements in CPU cycle times. The goal is to be able to access the primary cache (in a hierarchical memory) in one CPU cycle, and, in case of a cache miss in the primary cache, complete the access to the secondary cache in one or two cycles. This requires the miss to be detected as early as possible in the access cycle, i.e., as soon as possible after the tag becomes available. The technology of semiconductor manufacturing and circuit design is reaching the level where 100-MHz (and above) clocks are possible, providing 10-nsec CPU cycle times.
In an article by Suzuki et al, "A 19 ns Memory," ISSCC 87, p. 134, a tag memory for a high-speed SRAM cache is disclosed which employs a comparator with CMOS match circuits and an NMOS NOR gate for generating a hit signal. The Suzuki et al compare circuit, however, does not provide an early bank select signal, and the hit signal is data dependent, i.e., its timing differs depending upon the number of matched bits in the tag address. Further, the Suzuki et al circuit, if it were used for an associative cache, would have to be conditioned to a clock edge, since the hit signal is asserted to begin with. Also, since the Suzuki et al cache is direct mapped, no provision is made for detecting multiple hits. A cache having multiple-hit detection is shown in an article by Ooi et al, "Fail-Soft Circuit Design in a Cache Memory Control LSI," ISSCC 87, p. 103, but this circuit requires a separate logic unit to provide this function, and introduces a further delay in the compare operation.