The present invention relates to a system and method for reducing time and power consumption required for reading data from a cache. Specifically, a comparison method using both mini-tags and full tags is provided for reading data from the cache, and a victim selection method is provided for writing data into the cache, in order to reduce the time and power consumption required for reading data from the cache.
In a computer system, a cache stores data in order to decrease data retrieval times for a processor. More particularly, a cache stores specific subsets of data in high-speed memory. When a processor requests a piece of data, the system checks the cache first to see if the data is stored within the cache. If it is available, the processor can retrieve the data much faster than if the data was stored in other computer readable media such as random access memory, a hard drive, CD ROM, or floppy disk.
One particular type of cache is referred to as a trace cache. A trace cache is responsible for building, caching, and delivering instruction traces to a processor. In one type of trace cache, instructions are stored as blocks of decoded micro-operations (xe2x80x9cmicro-opsxe2x80x9d). These blocks are known as xe2x80x9ctracesxe2x80x9d and are, in this example, the only units of instructions allowed to be accessed by a processor. Traces are distinguishable from one another only by identifying information found at the beginning of each trace known as a xe2x80x9ctrace head.xe2x80x9d Generally, traces may be terminated upon encountering an indirect branch or by reaching one of many preset threshold conditions, such as the number of conditional branches in the trace or the number of total micro-ops in the trace.
A cache may be organized into rows and columns. For example, a known type of trace cache has 256 rows and 16 columns. The 16 columns alternate between those containing data and those containing xe2x80x9ctagsxe2x80x9d identifying the data. Each tag may be, for example, 24 bits. In a prior trace cache addressing scheme, a processor uses a requested address to locate data in the trace cache. In this prior system, bits 3-10 of a 32 bit requested address are referred to as a xe2x80x9cset address,xe2x80x9d which is used to select one of the 256 rows of the trace cache. The remaining bits in this address serve as a requested tag for identifying the data entry sought by the processor.
A flow diagram for the prior technique is shown in FIG. 1. During a read operation using this prior technique, once a particular row in the cache has been selected (step 11), all of the tags in that row are read (step 12), and a tag comparison operation is performed (step 13) to determine whether there is a matching tag in the selected row. If it is determined that none of the tags matches (xe2x80x9chitsxe2x80x9d) the tag of the requested address in step 14, a write operation is performed in step 15. This write operation may include, for example, a pseudo-least recently used (xe2x80x9cpseudo-LRUxe2x80x9d) victim selection, as known in the art. If a tag hits in step 14, the data identified by the tag is read from the trace cache (step 16), is validated (step 17), and the process ends in step 18. Since this process requires that all the tags from the selected row be read from the trace cache and compared before any data is read, it is slow, and a faster process may be more desirable.
A flow diagram for a second prior technique is shown in FIG. 2. During a read operation using this second prior technique, once a particular row in the cache has been selected (step 20), all of the tags in that row, along with their associated data entries, are read in parallel (steps 21 and 22), and a tag comparison operation is performed (step 23). A tag comparison operation is performed (step 23) to determine whether there is a matching tag in the selected row. If it is determined that none of the tags xe2x80x9chitsxe2x80x9d the tag of the requested address in step 24, a write operation, including, e.g., a pseudo-LRU victim selection is performed in step 25. If a tag does hit in step 24, the data identified by the requested address and matching tag is multiplexed out of all the read entries from the selected row using, in this case, an 8-to-1 multiplexer. The data is validated in step 27 and the process terminates in step 28.
In this second prior technique, the reason that all of the data entries for an accessed row must be read from the cache is that the technique does not allow the processor to recognize beforehand which of the data entries is the desired one, so that a read of all of the data entries for that row becomes necessary. Two significant problems afflict this trace cache reading technique, both of which are due, in part, to the large size of each data entry. First, although this technique may be faster than the first technique (since the data is read from the trace cache in parallel to the reading of the tags, as opposed to the first technique where the data is not read until after a tag hit is found) it is still time consuming. Because each data entry in this example is 300 bits long, reading out every data entry for an accessed row (an operation which yields a total of 2,400 bits in this case) is very time consuming. As a corollary to this effect, the excessive amount of time reading out these data entry bits, and the storage (e.g., in a latch) of this high quantity of data, consumes power that could otherwise be applied to other important tasks.