As is known in the art, computer processing systems include a central processing unit which operates on data stored in a memory. Increased computer processing performance is often achieved by including a smaller, faster memory, called a cache, between the central processing unit and the memory for temporary storage of the memory data. The cache reduces the delay associated with memory access by storing subsets of the memory data that can be quickly read and modified by the central processing unit.
Because computer processes commonly reference memory data in contiguous address space, data is generally obtained from memory in blocks. There are a variety of methods used to map blocks of data from memory into the cache. Two typical cache arrangements include direct mapped caches and set associative caches.
In a conventional direct mapped cache, a block of data from memory is mapped into the cache using the lower bits of the memory address. The lower bits of the memory address are generally called the cache index. The upper bits of the memory address of the data block are generally called the `tag` of the block. A tag store, having a number of locations equivalent to the number of blocks in the cache, is used to store the tag of each block of data in the cache.
When a processor requires data from the cache it addresses the cache and the tag store and compares the received tag to the upper bits of the memory address of the required data. If the data is not in the cache, the tag does not match the upper address bits and there is a `miss` in the cache. When there is a `miss`, a memory read is performed to fill the cache with the required data. It is desirable to minimize the number of cache misses in order to avoid the latency incurred by the resulting memory reference.
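The lookup described above can be illustrated with a minimal sketch. The block size, the number of cache locations, and the names `split_address`, `fetch_block`, and `DirectMappedCache` are assumptions made for this example only and do not come from any particular design.

```python
# Hypothetical sizes, chosen only to keep the example small.
BLOCK_BITS = 4   # 16-byte blocks
INDEX_BITS = 3   # 8 block locations in the cache


def split_address(addr):
    """Split a memory address into (tag, cache index)."""
    block_number = addr >> BLOCK_BITS
    index = block_number & ((1 << INDEX_BITS) - 1)  # lower bits: cache index
    tag = block_number >> INDEX_BITS                # upper bits: tag
    return tag, index


def fetch_block(addr):
    """Stand-in for a memory read; returns a label for the block."""
    return f"block {addr >> BLOCK_BITS}"


class DirectMappedCache:
    def __init__(self):
        locations = 1 << INDEX_BITS
        self.tag_store = [None] * locations  # one tag per cache block
        self.data = [None] * locations
        self.misses = 0

    def read(self, addr):
        tag, index = split_address(addr)
        if self.tag_store[index] != tag:
            # Miss: the stored tag does not match the upper address bits,
            # so fill the location from memory, replacing what was there.
            self.misses += 1
            self.tag_store[index] = tag
            self.data[index] = fetch_block(addr)
        return self.data[index]


cache = DirectMappedCache()
for addr in (0x000, 0x080, 0x000, 0x080):  # both map to cache index 0
    cache.read(addr)
print(cache.misses)  # prints 4
```

In this sketch, alternating reads of two addresses that share a cache index miss every time, because each fill overwrites the block that the other address needs.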
Direct mapped caches are advantageous because they provide a cache system with minimal address complexity. Because the addressing scheme is straightforward, the cache is able to quickly return data to the central processing unit. However, one drawback of direct mapped caches is that since there is only one possible location in the cache for data having a common cache index, data may be constantly swapped in and out as memory data having a common cache index is needed by the processor. Such a situation is referred to as `thrashing` and results in a high miss rate and reduced system performance.
Set associative caches serve to reduce the number of misses by providing multiple cache locations for memory data having a common cache index. In set-associative caching, the cache is subdivided into a plurality of `sets`. Each set has an associated tag store for storing the tags of the blocks of data stored in the set. As in direct mapped caching, the location of a particular item within the cache is identified by a cache index usually derived from the lower bits of the memory address.
When the processor wants to fetch data from the cache, the cache index is used to address each of the sets and their associated tag stores. Each set outputs a data item located at the cache index and the data items are generally input to a large multiplexer. The associated tags are each compared against the upper bits of the main memory address to determine if any data item provided by the sets is the required data item. Assuming that the data item to be fetched is in one of the sets of cache, the tag that is output by the tag store associated with the set matches the upper bits of the memory address. Depending on which tag matched, the appropriate select is provided to the multiplexer and the required data is returned to the processor.
Set-associative cache mapping thus provides improved performance over a direct mapped cache by reducing the frequency of cache misses. However, the time required to perform the set comparison gives the set-associative cache memory system a longer latency than the direct mapped cache system.
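The set-associative lookup can be sketched as follows, using the terminology above in which each `set` is a separately indexed array with its own tag store. The two-set configuration, the round-robin replacement pointer, and the names `SetAssociativeCache` and `fetch_block` are assumptions made for illustration; the text does not prescribe a replacement policy. For brevity the sketch is addressed by block number rather than by byte address.

```python
INDEX_BITS = 3  # hypothetical: 8 locations per set


def fetch_block(block_number):
    """Stand-in for a memory read; returns a label for the block."""
    return f"block {block_number}"


class SetAssociativeCache:
    def __init__(self, num_sets=2):
        locations = 1 << INDEX_BITS
        # One tag store and one data array per set.
        self.tags = [[None] * locations for _ in range(num_sets)]
        self.data = [[None] * locations for _ in range(num_sets)]
        # Assumed policy: per-index round-robin choice of the set to refill.
        self.victim = [0] * locations
        self.misses = 0

    def read(self, block_number):
        index = block_number & ((1 << INDEX_BITS) - 1)  # cache index
        tag = block_number >> INDEX_BITS                # upper bits
        # Every set is addressed with the same index; each tag read out
        # is compared against the upper bits of the address.
        matches = [set_tags[index] == tag for set_tags in self.tags]
        if any(matches):
            select = matches.index(True)  # acts as the multiplexer select
            return self.data[select][index]
        # Miss in every set: refill the set chosen by the assumed policy.
        self.misses += 1
        select = self.victim[index]
        self.victim[index] = (select + 1) % len(self.tags)
        self.tags[select][index] = tag
        self.data[select][index] = fetch_block(block_number)
        return self.data[select][index]


cache = SetAssociativeCache()
for block in (0, 8, 0, 8):  # blocks 0 and 8 share cache index 0
    cache.read(block)
print(cache.misses)  # prints 2
```

With two sets, the two blocks that share a cache index are both retained, so only the first read of each block misses.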
Multi-probe caches incorporate the advantages of both set associative cache design and direct mapped cache design. Multi-probe caches reduce the probability of thrashing in a cache by providing multiple locations in the cache where data may be stored. A multi-probe cache uses a direct mapped structure that is accessed sequentially with different addresses until the data is located, where each address in the sequence is generated by applying a hashing function to the previous address of the sequence.
The read of a multi-probe cache operates as follows. First, the cache is accessed using the cache address of the required data. If the data is not at that cache location, a hashing function is applied to the cache address to produce a second, hashed address, and the cache is accessed again using the hashed address. If the tag at the hashed cache location matches the read address, a hit occurs and the data is transmitted to the processor; the data from the hashed cache location is also swapped with the data at the first cache location to provide faster access to the most recently used data on the next lookup. Only if the second access is also unsuccessful does data need to be retrieved from memory.
Thus multi-probe caches emulate the retention capabilities of set-associative caches, with access times similar to those of a direct mapped cache. However, one drawback of the multi-probe cache is that a second cache lookup at a rehash address is attempted after every miss at the first cache address. Where there is a miss at both the first and second cache addresses, the hash-rehash technique replaces potentially useful data at the hashed location. As a result, secondary thrashing occurs: additional memory lookups are required to restore data that was overwritten by rehashed data. Secondary thrashing may consequently reduce the performance of hash-rehash caches to below that of direct mapped caches.
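A two-probe read of this kind can be sketched as follows. Several details are assumptions made for illustration and are not taken from the text: the hashing function (inverting the top bit of the cache index), the use of the full block number as the stored tag so that a block can be recognized at either location, and the fill policy on a double miss (the occupant of the first location is moved to the hashed location and the new block is placed at the first location). The names `rehash`, `fetch_block`, and `MultiProbeCache` are hypothetical.

```python
INDEX_BITS = 3  # hypothetical: 8 cache locations


def rehash(index):
    """Assumed hashing function: invert the top bit of the cache index."""
    return index ^ (1 << (INDEX_BITS - 1))


def fetch_block(block_number):
    """Stand-in for a memory read; returns a label for the block."""
    return f"block {block_number}"


class MultiProbeCache:
    def __init__(self):
        locations = 1 << INDEX_BITS
        self.tags = [None] * locations  # full block number kept as the tag
        self.data = [None] * locations
        self.memory_reads = 0

    def read(self, block_number):
        first = block_number & ((1 << INDEX_BITS) - 1)
        second = rehash(first)
        # First probe: the ordinary direct mapped location.
        if self.tags[first] == block_number:
            return self.data[first]
        # Second probe: the hashed location. On a hit, swap the two
        # locations so the next lookup hits on the first probe.
        if self.tags[second] == block_number:
            self.tags[first], self.tags[second] = (
                self.tags[second], self.tags[first])
            self.data[first], self.data[second] = (
                self.data[second], self.data[first])
            return self.data[first]
        # Both probes missed: read memory. Assumed fill policy: the old
        # first-location block displaces whatever was at the hashed location.
        self.memory_reads += 1
        self.tags[second], self.data[second] = self.tags[first], self.data[first]
        self.tags[first] = block_number
        self.data[first] = fetch_block(block_number)
        return self.data[first]


cache = MultiProbeCache()
for block in (0, 8, 0, 8):  # blocks 0 and 8 share cache index 0
    cache.read(block)
print(cache.memory_reads)  # prints 2
```

In this sketch the two conflicting blocks trade places between the first location and the hashed location instead of evicting each other, so memory is read only once per block.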
One method of reducing the probability of secondary thrashing is described in U.S. patent application "Method and Apparatus for Serialized Set-Prediction", Ser. No. 08/668,316, filed Jun. 26, 1996 (pending) by Macri, et al. (hereinafter referred to as the Macri patent). In Macri, a direct-mapped cache memory was partitioned into a number of banks with each of the banks being addressable by a bank index, where the bank index was a portion of the memory address for the corresponding cache location. A prediction memory, having preferably a larger number of storage locations relative to the number of cache locations, was addressed prior to the cache lookup. Each location of the prediction memory stored some portion of the upper address bits of the memory address of the corresponding cache location. When addressed, the prediction memory provided a predicted bank index, which was appended to the incoming cache address to predict the correct bank in which the data was located.
Thus the Macri design provided a cache with direct-mapped cache access capabilities that additionally allowed for storage of more than one item of memory data having a common cache index. In addition, the use of the prediction store increased the probability of locating the correct cache data on the first access.
One drawback of the above mechanism is that a tradeoff had to be made between the accuracy of the prediction and the size of the prediction store. A large prediction store would operate effectively as an advanced tag lookup, and although it would provide the exact cache location or a miss notification on the first access, the delay associated with accessing an entire tag makes the design infeasible. In addition, because the prediction was based solely on the memory address of the stored data, certain attributes of the stored data were ignored, resulting in a higher miss rate than necessary for that data type.
It would be desirable to provide an improved prediction mechanism which would have increased accuracy with minimal increase to hardware and propagation delay.