1. Technical Field
The present invention relates to computer memory systems, and in particular to a method and apparatus for reducing access latency in set-associative caches.
2. Discussion of Related Art
Cache memory is typically a small, high-speed buffer located between the central processing unit (CPU) and main memory. The cache is used to temporarily hold those contents of main memory believed to be currently in use. Decisions regarding when to replace the contents of the cache are generally based on a least recently used (LRU) algorithm. The LRU algorithm causes the least recently used cache memory locations to be replaced with the contents of main memory that were most recently used. Information in cache memory can be accessed in far less time than information in main memory. Thus, the CPU wastes less time waiting for instructions and/or operands to be fetched and/or stored in cache.
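The LRU replacement policy described above can be illustrated with a minimal sketch. The class name `LRUCache`, its `read` method, and the dictionary standing in for main memory are hypothetical and chosen only for illustration; the toy cache is fully associative for simplicity and does not model any particular hardware.

```python
from collections import OrderedDict


class LRUCache:
    """Toy cache with least-recently-used replacement (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Maps address -> data; entries are ordered from least to most recently used.
        self.lines = OrderedDict()

    def read(self, address, main_memory):
        """Return (data, hit) for the given address."""
        if address in self.lines:
            # Hit: mark this line as the most recently used.
            self.lines.move_to_end(address)
            return self.lines[address], True
        if len(self.lines) >= self.capacity:
            # Miss with a full cache: evict the least recently used line.
            self.lines.popitem(last=False)
        # Fill the line from main memory; it becomes the most recently used.
        self.lines[address] = main_memory[address]
        return self.lines[address], False
```

With a capacity of two lines, reading addresses 0, 1, 0 and then 2 evicts address 1, because address 0 was touched more recently.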
A direct-mapped cache limits the storage of the contents of any particular location in main memory to specific locations in cache. In contrast, an M-way set-associative cache maps the contents of each main memory location into any of M locations in cache. Essentially, the M-way set-associative cache is a combination of M identical direct-mapped caches. However, access to and retrieval from M-way set-associative caches are more complex. During every memory access to the M-way set-associative cache, each of the M identical direct-mapped caches must be searched, and the appropriate data selected and multiplexed to the output if there is a match. If a miss occurs, then a choice must be made among the M possible cache lines as to which cache line must be deleted and rewritten with more recently used contents of main memory.
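The difference in placement freedom can be sketched as follows. The function `candidate_locations` is hypothetical, and the modulo indexing it uses is an assumption (a common choice, not one stated in this description).

```python
def candidate_locations(block_number, num_rows, ways):
    """Return the (row, way) slots in which a main memory block may be stored.

    Assumes the row is chosen as block_number modulo num_rows.
    A direct-mapped cache is the special case ways == 1.
    """
    row = block_number % num_rows
    return [(row, way) for way in range(ways)]


# Two caches with the same capacity of eight blocks:
direct_mapped = candidate_locations(13, num_rows=8, ways=1)
four_way = candidate_locations(13, num_rows=2, ways=4)
```

Under these assumptions the direct-mapped cache offers exactly one slot for block 13, while the 4-way cache offers four, all of which must be searched on an access.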
FIG. 1 illustrates a virtually-tagged 4-way set-associative cache memory of the prior art comprising a cache directory 10, a cache array 12, a directory mux 14 and an array mux 16. The cache directory 10 comprises virtual addresses for each corresponding location in the cache array 12. The cache array 12 stores the contents of the main memory location pointed to by the corresponding location or block in the cache directory 10. A set is defined as a column in the cache array 12 and the corresponding column in the cache directory 10. A congruence class is defined as a row in the cache array 12 and the corresponding row in the cache directory 10. A block or a location is defined as the intersection of a particular set (column) and a particular congruence class (row). A location or block comprises one or more bytes of data.
An address 18 supplied to the cache memory comprises a directory tag 20, a congruence class tag 22 and a block offset 24. The directory tag 20 is used to select the desired set (column) in the cache directory 10 via the directory mux 14. The congruence class tag 22 is used to select the desired congruence class (row) of both the cache directory 10 and the cache array 12. The block offset 24 is used to select the desired byte within the desired block or location. The output of the directory mux 14 is used to select the desired set (column) of the cache array 12 via the array mux 16.
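The three address fields can be extracted with shifts and masks, as in the following sketch. The function name `split_address` and the field widths used in the example are assumptions for illustration; the description does not specify bit widths.

```python
def split_address(address, offset_bits, index_bits):
    """Split an address into (directory tag, congruence class, block offset).

    The lowest offset_bits select the byte within a block, the next
    index_bits select the congruence class (row), and the remaining
    high-order bits form the directory tag.
    """
    offset = address & ((1 << offset_bits) - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset


# Example with assumed widths: 16-byte blocks and 256 congruence classes.
tag, index, offset = split_address(0x12345, offset_bits=4, index_bits=8)
```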
The latency in accessing associative caches is higher than the latency in accessing direct-mapped caches due to the necessity of comparing the address against the tags stored across multiple sets of the cache directory 10. If a match occurs, the set associated with the matching tag is used to select output from the corresponding set of the cache array 12. The output of the cache array 12 is ultimately routed to registers and functional units. The so-called "late select problem" refers to the need for addresses to go through a cache directory 10 lookup and potentially address translation (if a physically-tagged cache is used) before the appropriate set of the cache array 12 can be selected. Thus, the late select problem adversely impacts latency in a set-associative cache.
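A software analogy of the conventional lookup makes the dependency visible: the array output cannot be chosen until the directory compare has finished. The function `conventional_read` and the list-of-rows layout are hypothetical simplifications, not the circuit of FIG. 1.

```python
def conventional_read(directory, array, tag, index):
    """Read with late select: the tag compare must finish before data is selected.

    directory[index] holds one stored tag per set; array[index] holds the
    corresponding block for each set. Returns (data, hit).
    """
    stored_tags = directory[index]   # directory row for this congruence class
    candidates = array[index]        # all M blocks of the row, read in parallel
    for way, stored_tag in enumerate(stored_tags):
        if stored_tag == tag:
            # Only now is the set known, so only now can the array mux select.
            return candidates[way], True
    return None, False


# One congruence class of a 4-way cache (tags and data are made up).
example_directory = [[7, 3, 9, 1]]
example_array = [["a", "b", "c", "d"]]
```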
Therefore, it would be advantageous if set selection information could be made available prior to searching the cache directory and translating the address.
Further details regarding caches can be found in the following references, which are hereby incorporated by reference:
1. U.S. Pat. No. 5,634,119 to Emma et al.
2. Chang, Sheldon S. L. Electrical and Computer Engineering III (1983).
3. Smith, Allan J. Cache Memories, ACM Computing Surveys Vol. 14 (1982).
4. Cekleov M. and Dubois M. Virtual-Address Caches, IEEE Micro (1997).
In accordance with illustrative embodiments of the present invention, a method for reducing access latency in set-associative caches is provided wherein data is read from locations of a memory selectable through at least one selecting cache, the method comprising the steps of generating set selection information, and storing the set selection information in a location that enables the set selection information to be made available for retrieval of data from the memory prior to the arrival of memory select information from the selecting cache.
An apparatus for reducing access latency in set-associative caches is also provided, comprising a storage for storing set selection information; an M-way set-associative cache receiving an address and outputting M sets of data determined by the address; and a multiplexer for multiplexing one of the set selection information and a set-associative address, wherein said set selection information is made available prior to said set-associative address for accessing said data.
An apparatus for reducing power consumption of set-associative caches is further provided, comprising a set selection storage for storing set selection information; an M-way set-associative cache comprising an array and a directory, the directory outputting a set-associative tag portion of an address to the array; and a multiplexer for multiplexing one of said tag portion of an address from said directory and said set selection information for outputting one set of said M sets of data.
Further in accordance with the present invention, a method of increasing the access speed of a set-associative memory using data addresses is provided. The addresses comprise an offset portion, a congruence class index, and a tag portion. The set associative memory comprises an array and a directory. The array stores data, and is partitioned into a plurality of array congruence classes. The array congruence class is partitioned into array sets. The array set comprises a cache line. The cache line comprises a plurality of data. The directory is partitioned into a plurality of directory congruence classes. The directory congruence class is partitioned into directory sets, each comprising a directory entry. The directory entry comprises an address tag and other status information including valid bits, parity, etc. The directory is partitioned such that there is a one-to-one correspondence between the directory entries and the cache lines such that the address tags are associated with one of the cache lines.
Preferably, the method comprises the steps of: accessing contents of sets of a single array congruence class using the congruence class index, the single array congruence class being specified by the congruence class index; accessing contents of sets of a single directory congruence class using the congruence class index, the single directory congruence class being specified by the congruence class index; generating set selection information; utilizing the set selection information to select the sets of the array congruence class; outputting the data from the cache line in the selected set; comparing the tag portion to the address tags of the selected sets of the directory congruence class; comparing the selected set to the set selection information if one of the address tags in the selected congruence class is equal to the tag portion of the address; outputting a first control signal to indicate that the access was unsuccessful, and that the data output from the cache line is invalid, if none of the address tags in the selected congruence class is equal to the tag portion of the address; and outputting a second control signal to indicate that the data from the cache line is invalid if the selected set is not equal to the set selection information.
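The steps above can be sketched in software as follows. This is only an illustration under simplifying assumptions: the function `speculative_read`, the parameter `predicted_set` (standing in for the set selection information), and the list-of-rows layout are hypothetical, and the sequential code does not capture the fact that in hardware the data output and the directory compare proceed concurrently.

```python
def speculative_read(directory, array, tag, index, predicted_set):
    """Select data early using set selection information, then verify it.

    Returns (data, miss, mispredict), where:
      miss       models the first control signal (no address tag matched);
      mispredict models the second control signal (a tag matched, but in a
                 set other than the one given by the set selection information).
    The data is valid only when both signals are False.
    """
    # Output data from the predicted set without waiting for the directory.
    data = array[index][predicted_set]

    # Compare the tag portion against every address tag in the congruence class.
    actual_set = None
    for way, stored_tag in enumerate(directory[index]):
        if stored_tag == tag:
            actual_set = way
            break

    miss = actual_set is None
    mispredict = (not miss) and actual_set != predicted_set
    return data, miss, mispredict


# One congruence class of a 4-way cache (tags and data are made up).
row_tags = [[7, 3, 9, 1]]
row_data = [["a", "b", "c", "d"]]
```

When the set selection information is correct, the data is available as soon as the array is read; the two control signals only cancel the result in the miss and misprediction cases.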
In further accordance with the present invention, an apparatus for reducing access time in a set-associative memory using data addresses is provided. The address comprises an offset portion, a congruence class index, and a tag portion. The set-associative memory comprises an array and a directory, wherein the array comprises data. The array is partitioned into a plurality of array congruence classes. The array congruence class is partitioned into array sets, and the array sets determine set-associativity of the set-associative memory. The array set comprises a cache line, and the cache line comprises a plurality of data. The directory is partitioned into a plurality of directory congruence classes. The directory congruence class is partitioned into directory sets. The directory set comprises a directory entry, and the directory entry comprises an address tag. The directory is partitioned such that there is a one-to-one correspondence between the directory entries and the cache lines such that the address tags in the directory are associated with at least one of the cache lines.
The apparatus for reducing access time in the set-associative memory comprises means for accessing contents of sets of a single array congruence class, the single array congruence class being that congruence class specified by the congruence class index; means for accessing contents of sets of a single directory congruence class, the single directory congruence class being that congruence class specified by the congruence class index; means for generating set selection information; means for selecting one of the sets of the single array congruence class using the set selection information; means for outputting the data from the cache line in the selected set; means for comparing the tag portion of the address to the address tags from the sets of the selected single directory congruence class; means for comparing the set comprising the matching address tag to the set selection information; and means for outputting a control signal indicating success of the data access and validity of the data output from the cache line.