1. Field of the Invention
This invention relates to the field of computer systems, and in particular to an n-way associative cache that uses a deterministic way selection.
2. Description of Related Art
Cache systems are commonly used to reduce the effective delay associated with access to relatively slow memory devices. When a processor requests access to a particular data item in the slower memory, the cache system loads the requested data item into a higher speed memory. Thereafter, subsequent accesses to this same data item are provided via the higher speed memory, thereby avoiding the delay associated with the slower memory. Generally, a "line" of data items that contains the requested data item is loaded from the slower memory into the higher speed memory when the data item is requested, so that any data item within the loaded line can be subsequently provided by the higher speed memory.
The effectiveness of a cache memory access system depends on the likelihood that future data accesses are related to prior data accesses. Generally, the likelihood that a requested data item is contained in the same line of cache as a previously requested data item is substantially higher than zero, and therefore the likelihood of satisfying the request from the higher speed cache memory is correspondingly substantially higher than zero.
Higher speed memory is more costly than slower speed memory, and therefore the amount of available cache memory is generally limited. Cache management schemes are used to determine which data items to remove from the higher speed memory when a new line of data needs to be loaded into the higher speed memory. A commonly used prioritization scheme for retaining data items in the higher speed memory is a "least recently used" (LRU) criterion, wherein the line of the least recently used (i.e. "older") memory access is replaced by the new line, thereby retaining recently used/accessed data items. Other criteria, such as "most often used", may also be used, typically in conjunction with the LRU prioritization scheme.
Associative caches are commonly used to store lines of data items based upon a subset of the address of the requested item. FIG. 1 illustrates a conventional addressing scheme for an associative cache system 100. An address 110, typically from a processor 180, and discussed further below, is logically partitioned into a tag field 111, an index field 112, and a word field 113. The index field 112 provides an index to an associated set of cache lines in a cache memory 120. Each cache line of the set corresponds to a "way", or "section" of the memory 120, and the cache memory 120 is termed an "n-way associative cache". The size of the word field 113, j, corresponds to the size of a data line, 2^j. That is, if there are sixteen words per data line, then the word field 113 will be four bits wide; if there are sixty-four words per data line, then the word field 113 will be six bits wide. Using this power-of-two relationship between the word field 113 and the size of the data line, the tag and index fields uniquely identify each data line in the memory.
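The partitioning of the address 110 into the tag, index, and word fields can be sketched as follows. This is a minimal illustration, not part of the described system: the function name `split_address`, the 8-bit index field, and the sample address are assumptions chosen for the example; only the 4-bit word field (sixteen words per line) comes from the text above.

```python
from typing import Tuple


def split_address(address: int, index_bits: int, word_bits: int) -> Tuple[int, int, int]:
    """Split an address into (tag, index, word) fields, low bits last.

    The word field occupies the lowest word_bits bits, the index field the
    next index_bits bits, and the tag field all remaining upper bits.
    """
    word = address & ((1 << word_bits) - 1)
    index = (address >> word_bits) & ((1 << index_bits) - 1)
    tag = address >> (word_bits + index_bits)
    return tag, index, word


# Sixteen words per line gives a 4-bit word field; the 8-bit index is assumed.
tag, index, word = split_address(0x12345678, index_bits=8, word_bits=4)
```

With these assumed widths, the tag and index together form the line address, and the word field only selects a word within the line.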
When an addressed data item is loaded into the cache memory 120 from a slower memory (not shown), the line of data containing the data item is placed in a select way, the index field defining the location in the selected way for placing the data line. The selection of the way is effected using one of a variety of commonly available algorithms, such as the aforementioned LRU prioritization scheme. When the addressed data item is stored in a particular line area DLine-a, DLine-b, etc. in the cache 120, the tag field 111 is also stored, as illustrated by fields Tag-a 121a, Tag-b 121b, etc. in FIG. 1. The stored tag field 121, in combination with the data line's location within the way, corresponding to the data line's index field 112, uniquely identifies the data line that is stored in the cache 120.
Before an addressed data item is loaded into the cache 120, the cache 120 is checked to determine whether the data item is already located in the cache 120, to potentially avoid having to load the data item from the slower memory. The addressed data item may be located in the cache due to a prior access to this data item, or due to a prior access to a data item within the same line of data DLine-a, DLine-b, etc. as the currently addressed data item. The index field 112 defines the set of n lines in the cache 120 that are associated with this address. Each of the stored tags 121a, 121b, etc. corresponding to each of the stored lines 125a, 125b, etc. in the associated set is compared to the tag field 111 of the addressed data item, via the comparators 130a, 130b, etc. While this comparison is being made, each of the stored data lines 125a, 125b, etc. corresponding to the index field 112 is loaded into a high-speed buffer 140, so as to be available if the data item is currently loaded in the cache 120.
If the addressed data item is currently loaded in one of the ways 120a, 120b, etc. of the cache 120, the corresponding comparator 130a, 130b, etc. asserts a cache-hit signal, thereby identifying the particular way Hit-a, Hit-b, etc. that contains the data line. If a hit is asserted, the appropriate word is retrieved from the corresponding buffer 140, using the word field 113 to select the appropriate word 141a, 141b, etc. from the data line contained in the buffer 140. The retrieved word is forwarded to the processor 180 that provided the address 110. In a conventional embodiment of the cache system 100, the time required to effect the comparison of the tag field 111 to the stored tag fields 121a, 121b, etc., and the subsequent selection of the appropriate word 141a, 141b, etc. when a cache-hit occurs, is substantially less than the delay time corresponding to the slower memory. In this manner, the effective access time to a data item is substantially reduced when the data item is located in the cache 120.
If a cache-hit does not occur, the above described load of the addressed data line from memory into a select way, Way-a 120a, Way-b 120b, etc., of the cache 120 is effected, typically by loading the data line into the least recently used (LRU) way, or other prioritization scheme, as mentioned above.
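The conventional lookup and replacement flow described above can be sketched in software as follows. This is a toy model under stated assumptions, not the circuit of FIG. 1: the class name `NWayCache`, the use of an ordered dictionary to track recency, and the `tag_compares` counter (a rough proxy for comparator activity) are all hypothetical.

```python
from collections import OrderedDict


class NWayCache:
    """Toy n-way set-associative cache with LRU replacement (illustrative only)."""

    def __init__(self, n_ways: int, index_bits: int, word_bits: int):
        self.n_ways = n_ways
        self.index_bits = index_bits
        self.word_bits = word_bits
        # One OrderedDict per set: tag -> way number, ordered oldest to newest use.
        self.sets = [OrderedDict() for _ in range(1 << index_bits)]
        self.tag_compares = 0  # assumed proxy for comparator energy

    def access(self, address: int) -> bool:
        """Return True on a cache-hit, False on a miss (the line is then loaded)."""
        index = (address >> self.word_bits) & ((1 << self.index_bits) - 1)
        tag = address >> (self.word_bits + self.index_bits)
        lines = self.sets[index]
        self.tag_compares += self.n_ways  # all n stored tags are compared at once
        if tag in lines:
            lines.move_to_end(tag)  # mark this line as most recently used
            return True
        if len(lines) == self.n_ways:
            _, way = lines.popitem(last=False)  # evict the LRU line, reuse its way
        else:
            way = len(lines)  # an empty way is still available
        lines[tag] = way
        return False


cache = NWayCache(n_ways=2, index_bits=2, word_bits=2)
results = [cache.access(a) for a in (0x00, 0x10, 0x00, 0x20, 0x00, 0x10)]
```

In this sketch every access charges n tag comparisons, which is the cost that the way-prediction and way-determination schemes discussed later aim to reduce.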
The time required to store words, effectively from the processor 180 to the memory, is similarly accelerated via use of the cache system 100. The presence of the addressed data item in the cache 120 is determined, using the above described comparison process. If the data item is currently located in the cache 120, the new value of the data item from the processor 180 replaces the select word, or words, of the buffer 140, and the buffer 140 is loaded into the data line 125a, 125b, etc. containing the addressed data item. The "modified" field 129 is used to signal that the contents of a cached line have changed. Before a data line is overwritten by a new data line, the modified field 129 is checked, and, if the data line has been modified, the modified data line is stored back into the memory, using the stored tag field 121a, 121b, etc. to identify the location in memory to store the line.
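The role of the modified field 129 during a store and a later replacement can be sketched as follows. The names `Line`, `store_word`, and `replace_line`, and the dictionary standing in for the slower memory, are assumptions made for illustration only.

```python
class Line:
    """One cached data line with its stored tag and modified flag (hypothetical)."""

    def __init__(self, tag, words):
        self.tag = tag
        self.words = list(words)
        self.modified = False


def store_word(line, word_offset, value):
    """Write a word into a cached line and mark the line as modified."""
    line.words[word_offset] = value
    line.modified = True


def replace_line(line, index, new_tag, new_words, memory, index_bits):
    """Overwrite a line, first writing it back to memory if it was modified.

    The write-back location is rebuilt from the stored tag and the line's
    index, mirroring the use of the stored tag field described above.
    """
    if line.modified:
        line_address = (line.tag << index_bits) | index
        memory[line_address] = list(line.words)
    line.tag = new_tag
    line.words = list(new_words)
    line.modified = False


memory = {}  # assumed stand-in for the slower memory, keyed by line address
line = Line(tag=5, words=[0, 0, 0, 0])
store_word(line, 2, 99)
replace_line(line, index=3, new_tag=7, new_words=[1, 1, 1, 1],
             memory=memory, index_bits=4)
```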
Although an n-way associative cache provides an effective means for increasing the effective memory access speed, the simultaneous way-comparison scheme, wherein the tag of the addressed data item is compared to all of the stored tags, consumes energy at a rate that is n-times higher than a one-way associative cache. It is not uncommon for n-way associative caches to be substantially hotter than other areas of an integrated circuit, or printed circuit boards.
To reduce the power consumption of a conventional n-way associative cache, predictive techniques are applied to select a likely way corresponding to a given address. In a conventional embodiment of a way prediction scheme, the likely way is first checked for the addressed data item, and only if that way does not contain the addressed data item, are the remaining ways checked. "A HIGH-PERFORMANCE AND LOW-POWER CACHE ARCHITECTURE WITH SPECULATIVE WAY-SELECTION", by Koji Inoue et al, published in IEICE Trans. Electron., Vol. E83-C, No. 2, February 2000, pages 186-194, and incorporated by reference herein, presents a way-prediction scheme, and a comparison of the energy consumption by a way-prediction scheme to non-predictive schemes. If the prediction success rate is high, the energy savings can be quite substantial, because a reduction in energy by a factor of n is achieved each time the way-prediction is correct.
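The dependence of the savings on the prediction success rate can be illustrated with a simple estimate. This sketch assumes a deliberately simplified cost model that is not taken from the cited paper: a correct prediction accesses one way, and an incorrect prediction ends up accessing all n ways (the predicted way plus the n-1 remaining ways). The function name is hypothetical.

```python
def expected_ways_accessed(n_ways: int, prediction_rate: float) -> float:
    """Average number of ways accessed per request under the assumed model.

    A correct prediction costs 1 way access; a mis-prediction costs n_ways
    (the predicted way, then the remaining n_ways - 1 ways).
    """
    return prediction_rate * 1 + (1.0 - prediction_rate) * n_ways


# Relative to a conventional cache that always accesses all n ways:
relative_energy = expected_ways_accessed(8, 0.9) / 8
```

Under this model, an 8-way cache with a 90% prediction success rate accesses about 1.7 ways per request on average, compared with 8 for the conventional scheme.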
Illustrated in FIG. 1 is an example way prediction table 150 that is used to predict the particular way 120a, 120b, etc. that is associated with an addressed data item. A subset 115 of the data address 110 is used to index the way-prediction table 150, as indicated by the dashed lines in FIG. 1. A variety of schemes may be used to define this subset 115 of the address 110, and to define the algorithm used to provide the contents of the way-prediction table 150. A straightforward embodiment uses the index field 112 as the subset 115 that is used to index the table 150, and the contents of the table 150 correspond to the least recently used (LRU) way, Way-a 120a, Way-b 120b, etc., for each index 112. Alternatively, a subset 115 of the index field 112, or a subset 115 taken from both the tag 111 and the index 112 fields may also be used to provide an index to the way-prediction table 150. The choice of the subset 115 of the address 110 used to index the way-prediction table, and the number, n, of ways, determines the size of the required way-prediction table. In an 8-way associative cache, three bits are required to uniquely identify each of the ways in the way-prediction table, and the number of three-bit entries in the table 150 is determined by the number of unique combinations of the subset 115 of the address used to index the table 150. If ten bits (i=10) are used as the subset 115 to index the table 150, for example, 1024 (2^i) three-bit entries must be supported in the table 150. Note that the length of the table 150 (2^i) is substantially smaller than the number of addressable lines (2^(M-w), where M is the number of bits in the address (typically, 16, 32, or 64), and w is the number of bits, typically 2 to 5, corresponding to the number of words per line, typically 4, 8, 16, or 32). That is, the index to the table 150 is a proper subset 115 of the tag 111 and index 112 fields.
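The table-size arithmetic above can be checked with a short calculation. The function name and return layout are assumptions for illustration; the 8-way, 10-bit example values come from the text.

```python
def way_prediction_table_size(n_ways: int, subset_bits: int):
    """Return (entries, bits_per_entry, total_bits) for a way-prediction table.

    entries is 2**subset_bits; bits_per_entry is the number of bits needed
    to distinguish n_ways ways (3 bits for 8 ways).
    """
    bits_per_entry = (n_ways - 1).bit_length()
    entries = 1 << subset_bits
    return entries, bits_per_entry, entries * bits_per_entry


# 8-way cache indexed by a 10-bit subset of the address.
entries, bits_per_entry, total_bits = way_prediction_table_size(8, 10)

# Number of addressable lines for an assumed 32-bit address and 16 words per line.
addressable_lines = 1 << (32 - 4)
```

With the assumed 32-bit address and 4-bit word field, the 1024-entry table is far shorter than the 2^28 addressable lines, which is why different addresses can share one table entry.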
When an address 110 is requested, the predicted way from the way-prediction table 150 is used to selectively access only the predicted way. All addresses have a corresponding predicted way, but, because the index to the table 150 is a subset of the number of addressable lines, different addresses 110 may each point to the same entry in the table 150. Only one of these different addresses may actually correspond to the data that is stored in the way that is indicated by the content of the table 150 at any given time, hence the term "predicted" way.
For convenience, the subscript p is used hereinafter to designate the predicted way. The stored tag 121p corresponding to the index field 112 in the selected way 120p is provided to the comparator 130p of the selected way 120p, and the corresponding data line 125p is provided to the buffer 140p of the selected way 120p. The Hit-p signal is asserted if the predicted way 120p contains the data line, and the addressed word is provided to the requesting processor 180 from the buffer 140p. If the predicted way 120p does not contain the addressed data item, each of the other, non-predicted, ways are checked for the presence of the addressed data item, using the techniques discussed above for checking an n-way associative cache for an addressed data item.
If none of the ways contains the addressed data item, the data line that contains the addressed data item is loaded from the memory into the cache 120, typically into the least recently used way at the index position. Assuming that the way-prediction table 150 is configured to store the most recently used way, an identification of the way that was used to store the data line is stored into the way-prediction table 150. In this manner, a subsequent request for a data item in the same data line as the currently addressed data item will produce the correct predicted way, and thereby reduce power consumption.
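The predicted-way access, the fallback to the remaining ways, and the update of the way-prediction table can be sketched together as follows. This is a toy model under stated assumptions: the table is indexed by the index field and stores the most recently used way, as assumed in the paragraph above; the class name `WayPredictedCache`, the timestamp-based LRU bookkeeping, and the `ways_probed` counter are hypothetical.

```python
class WayPredictedCache:
    """Toy n-way cache that probes a predicted way before the others."""

    def __init__(self, n_ways: int, index_bits: int, word_bits: int):
        self.n_ways = n_ways
        self.index_bits = index_bits
        self.word_bits = word_bits
        n_sets = 1 << index_bits
        self.tags = [[None] * n_ways for _ in range(n_sets)]   # tags[index][way]
        self.last_use = [[0] * n_ways for _ in range(n_sets)]  # for LRU choice
        self.predicted = [0] * n_sets  # way-prediction table (assumed MRU way)
        self.clock = 0
        self.ways_probed = 0  # assumed proxy for access energy

    def access(self, address: int) -> bool:
        index = (address >> self.word_bits) & ((1 << self.index_bits) - 1)
        tag = address >> (self.word_bits + self.index_bits)
        self.clock += 1
        p = self.predicted[index]
        self.ways_probed += 1  # only the predicted way is accessed first
        if self.tags[index][p] == tag:
            way, hit = p, True
        else:
            # Mis-prediction: the remaining n - 1 ways must now be checked.
            self.ways_probed += self.n_ways - 1
            others = [w for w in range(self.n_ways)
                      if w != p and self.tags[index][w] == tag]
            if others:
                way, hit = others[0], True
            else:
                # Miss in every way: load the line into the LRU way.
                way = min(range(self.n_ways), key=lambda w: self.last_use[index][w])
                self.tags[index][way] = tag
                hit = False
        self.last_use[index][way] = self.clock
        self.predicted[index] = way  # remember the most recently used way
        return hit


cache = WayPredictedCache(n_ways=4, index_bits=2, word_bits=2)
results = [cache.access(a) for a in (0x00, 0x01, 0x10, 0x00)]
```

In the example, the second access falls in the same data line as the first and is satisfied by probing a single way, while the last access hits in a non-predicted way and pays for all four probes.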
Variations on this power-saving scheme may also be used. For example, all of the tags 121a, 121b, etc. may be loaded into the corresponding comparator 130a, 130b, etc., but only the data line 125p of the predicted way 120p may be loaded into the buffer 140p. In this manner, some power savings are achieved by avoiding the loading of all of the data lines 125a, 125b, etc. of the non-predicted ways into the buffers 140a, 140b, etc., while also avoiding the time required to recheck all of the tag fields 121a, 121b, etc. when the predicted way does not contain the addressed data item. If one of the other tag comparators 130 asserts a hit signal, the data line 125 of the corresponding way is loaded into the corresponding buffer 140, and the appropriate word is provided to the processor 180. If none of the other tag comparators 130 assert a hit signal, the addressed data line is loaded from memory into the cache 120, as discussed above.
Note that in a conventional way-prediction scheme as illustrated in FIG. 1, an incorrect way prediction necessitates a reliable means of "canceling" the access to the data line in the mis-predicted way, and other effects of this mis-prediction. Generally, a way-prediction scheme is used in a pipelined architecture, wherein the memory access is initiated prior to the time that the value from the memory is required. The determination of whether the predicted way contains the addressed data item, and the subsequent determination of whether any of the ways contain the addressed data item, is time consuming, and therefore a cache-miss is not immediately determinable. During the time that the cache-hit or cache-miss is being determined, the pipelined process typically effects actions in anticipation of a cache-hit, so as to take advantage of the speed gains provided by a cache-hit. When a cache-miss occurs, some or all of the effects of the cache-hit anticipatory actions must be cancelled. The circuitry and timing constraints required to effect a reliable cancellation of a mis-prediction can be fairly complex. However, the power savings that are achievable by a way-prediction scheme, and the speed gains that are achievable by a high occurrence of cache-hits, generally offset the additional design and production costs associated with the addition of this complex circuitry.
It is an object of this invention to provide a method and system that allows a deterministic way-identification. It is a further object of this invention to provide a method and system that eliminates the need to handle way mis-predictions. It is a further object of this invention to provide a way-determination scheme that is less complex than conventional way-prediction schemes that require mis-prediction cancellation processes. It is a further object of this invention to provide a way-identification scheme that provides an immediate determination of whether an addressed data item is known to be in a way of the cache.
These objects, and others, are achieved by providing a way-determination scheme for an n-way associative cache that is based on the entirety of the line address of the requested data item, thereby eliminating the possibility of a mis-identification of the way that contains the requested data item. A limited number of line addresses, and their assigned ways, are stored in a Way Determination Table. If a requested data address corresponds to one of these line addresses, the assigned way is provided; if not, a 'null' response is provided, indicating that the way is not known for this data address. If an assigned way is provided, the way corresponding to this assigned way is accessed, with the assurance that the accessed way contains the requested data item. If a 'null' response is provided, the n-ways are accessed to determine whether the requested data line is in the cache, as in a conventional n-way associative cache. A 'predicted' way is not provided, thereby eliminating the need to accommodate the possibility of a mis-prediction or mis-speculation. A content-based access is used to determine whether a requested data address has an assigned way, thereby providing for a rapid determination of whether a selective way access can be used, or an n-way access is required.
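The behavior of the Way Determination Table summarized above can be sketched as follows. This is a minimal software analogy, not the claimed implementation: a dictionary keyed by the full line address stands in for the content-based access, `None` stands in for the 'null' response, and the class and method names, the oldest-entry replacement policy, and the `invalidate` method are all assumptions that the text above does not specify.

```python
from collections import OrderedDict
from typing import Optional


class WayDeterminationTable:
    """Maps a limited number of full line addresses to their assigned ways."""

    def __init__(self, capacity: int, word_bits: int):
        self.capacity = capacity
        self.word_bits = word_bits
        self.entries = OrderedDict()  # line address (tag + index) -> assigned way

    def lookup(self, address: int) -> Optional[int]:
        """Return the assigned way, or None (the 'null' response) if unknown."""
        return self.entries.get(address >> self.word_bits)

    def assign(self, address: int, way: int) -> None:
        """Record the way for a line; drops the oldest entry when full (assumed)."""
        line = address >> self.word_bits
        self.entries.pop(line, None)
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)
        self.entries[line] = way

    def invalidate(self, address: int) -> None:
        """Forget a line, e.g. when it is replaced in the cache (assumed step)."""
        self.entries.pop(address >> self.word_bits, None)


wdt = WayDeterminationTable(capacity=2, word_bits=4)
wdt.assign(0x1230, way=3)
same_line = wdt.lookup(0x123F)   # same line address, different word
other_line = wdt.lookup(0x1240)  # different line address, so the way is unknown
```

Because the lookup key is the entire line address rather than a subset of it, two different lines can never share an entry, so a returned way is never a guess. A caller would access only that way when a way is returned and fall back to the conventional n-way access on the 'null' response.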