Microprocessors typically employ instruction caches to speed up the retrieval and execution of instructions. The instruction cache acts as a buffer memory between a higher level of memory and the processor. When the processor fetches an instruction, the instruction is copied into the instruction cache so that the processor can access it directly. If a program uses the same instructions frequently, keeping them in the instruction cache increases throughput because fewer slow accesses to higher-level memory are needed.
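The throughput argument above can be made concrete with a toy model. The following Python sketch counts fetch cycles for a short loop executed repeatedly, with and without an instruction cache. The latencies (CACHE_LATENCY, MEMORY_LATENCY), the 4-byte instruction size, and the unbounded, never-evicting cache are illustrative assumptions, not properties of any particular processor.

```python
# Hypothetical latencies in clock cycles, chosen only for illustration.
CACHE_LATENCY = 1
MEMORY_LATENCY = 100


def fetch_cycles(trace: list[int], use_cache: bool) -> int:
    """Total cycles to fetch every address in the trace.

    The cache is modeled as unbounded (no evictions), so an address misses
    only the first time it is fetched.
    """
    cached: set[int] = set()
    cycles = 0
    for addr in trace:
        if use_cache and addr in cached:
            cycles += CACHE_LATENCY        # hit: served from the instruction cache
        else:
            cycles += MEMORY_LATENCY       # miss: go to higher-level memory
            if use_cache:
                cached.add(addr)           # copy the instruction into the cache
    return cycles


# A loop body of 8 instructions (4-byte encodings assumed) executed 100 times.
loop_body = [0x1000 + 4 * i for i in range(8)]
trace = loop_body * 100
print(fetch_cycles(trace, use_cache=False))  # 80000
print(fetch_cycles(trace, use_cache=True))   # 1592
```

With these assumed numbers, only the first pass through the loop pays the memory latency; every later fetch hits in the cache, which is why reuse of the same instructions drives the throughput gain.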
For example, a set-associative instruction cache may include a data array and a tag array. Entries of the data array and the tag array may be combined to form cachelines or words that are organized into sets and ways within the instruction cache. When an address for an instruction fetch is generated, the instruction cache compares the tag field of the address with the tag values currently stored in the corresponding set of the tag array. If a stored tag matches and is valid (i.e., a cache hit), the data is fetched from the location in the data array that corresponds to the desired address. Because the data is retrieved directly from the instruction cache, the fetch is faster than it would be if external memory had to be accessed.
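The tag lookup described above can be sketched as follows. This minimal Python model splits an address into a tag, a set index, and a byte offset, then compares the tag against every valid way of the selected set. The geometry (64-byte lines, 64 sets, 4 ways), the class and method names, and the omission of a replacement policy are hypothetical choices made for illustration.

```python
from typing import Optional

# Hypothetical geometry chosen for illustration only.
LINE_BYTES = 64   # bytes per cacheline -> 6 offset bits
NUM_SETS = 64     # number of sets -> 6 index bits
NUM_WAYS = 4      # 4-way set associative

OFFSET_BITS = LINE_BYTES.bit_length() - 1
INDEX_BITS = NUM_SETS.bit_length() - 1


def split_address(addr: int) -> tuple[int, int, int]:
    """Split a byte address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class SetAssociativeICache:
    def __init__(self) -> None:
        # Tag array: one tag and one valid bit per (set, way).
        self.tags = [[0] * NUM_WAYS for _ in range(NUM_SETS)]
        self.valid = [[False] * NUM_WAYS for _ in range(NUM_SETS)]
        # Data array: one cacheline of bytes per (set, way).
        self.data = [[bytes(LINE_BYTES)] * NUM_WAYS for _ in range(NUM_SETS)]

    def lookup(self, addr: int) -> Optional[int]:
        """Return the byte at addr on a hit, or None on a miss."""
        tag, index, offset = split_address(addr)
        for way in range(NUM_WAYS):
            # A hit requires both a tag match and a set valid bit.
            if self.valid[index][way] and self.tags[index][way] == tag:
                return self.data[index][way][offset]
        return None

    def fill(self, addr: int, way: int, line: bytes) -> None:
        """Install a cacheline in the given way (no replacement policy modeled)."""
        tag, index, _ = split_address(addr)
        self.tags[index][way] = tag
        self.valid[index][way] = True
        self.data[index][way] = line


cache = SetAssociativeICache()
addr = 0x1234_5678
print(split_address(addr))   # (74565, 25, 56)
print(cache.lookup(addr))    # None: miss, the set is still empty
cache.fill(addr, way=2, line=bytes(range(LINE_BYTES)))
print(cache.lookup(addr))    # 56: hit in way 2, byte at offset 56
```

Only the tag bits are stored in the tag array because the index bits are implied by which set is being searched and the offset bits select a byte within the line.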
In one example implementation of an instruction cache, when an address for an instruction fetch is generated, the portions of the tag array and the data array that correspond to the set of cachelines that could match the address are activated in parallel. In other words, all ways in the tag array that correspond to the set are activated, and all locations in the data array that correspond to the set are activated. Once both arrays are activated, the data from the data-array location that corresponds to the instruction address is selected, and the data read from the other activated locations is discarded. By activating all tag and data locations for the set in parallel, an instruction fetch can typically be performed more quickly than with a serial approach, in which all tag locations are activated in one clock cycle to identify the matching data location, and only that single data-array location is then activated to fetch the data in a subsequent clock cycle.
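The difference between the two access orders can be sketched with a toy model of a single set. In this Python sketch each way is a (valid, tag, line) tuple; parallel_fetch reads every tag way and every data way and then selects the matching line, while serial_fetch compares tags first and reads only the matching data way. The cycle counts (one cycle for a parallel access, two for a serial hit) follow the description above; the function names, the tuple layout, and the one-cycle serial miss are hypothetical.

```python
from typing import NamedTuple, Optional

# One way of a set: (valid bit, stored tag, cacheline contents).
Way = tuple[bool, int, str]


class FetchResult(NamedTuple):
    data: Optional[str]   # fetched cacheline, or None on a miss
    cycles: int           # clock cycles spent on the access (toy model)
    data_reads: int       # data-array ways activated
    tag_reads: int        # tag-array ways activated


def parallel_fetch(ways: list[Way], tag: int) -> FetchResult:
    """Activate every tag way and every data way of the set in the same cycle."""
    tag_reads = len(ways)
    # All data ways are read before the tag comparison is known.
    lines = [line for (_, _, line) in ways]
    data_reads = len(lines)
    for way, (valid, stored_tag, _) in enumerate(ways):
        if valid and stored_tag == tag:
            # Select the matching way; the other lines that were read are discarded.
            return FetchResult(lines[way], 1, data_reads, tag_reads)
    return FetchResult(None, 1, data_reads, tag_reads)


def serial_fetch(ways: list[Way], tag: int) -> FetchResult:
    """Cycle 1: compare all tags. Cycle 2: activate only the matching data way."""
    tag_reads = len(ways)
    for valid, stored_tag, line in ways:
        if valid and stored_tag == tag:
            return FetchResult(line, 2, 1, tag_reads)
    # Assumed: on a miss no data way is activated, so the access ends after cycle 1.
    return FetchResult(None, 1, 0, tag_reads)


ways = [(True, 0x10, "A"), (True, 0x22, "B"), (False, 0x33, "C"), (True, 0x44, "D")]
print(parallel_fetch(ways, 0x22))  # FetchResult(data='B', cycles=1, data_reads=4, tag_reads=4)
print(serial_fetch(ways, 0x22))    # FetchResult(data='B', cycles=2, data_reads=1, tag_reads=4)
```

Both functions return the same line on a hit; they differ only in how many data ways are activated and how many cycles the access takes.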
However, both the parallel and the serial approaches to performing an instruction fetch have limitations. For example, in the parallel approach, all data locations in the data array that correspond to the set are activated even though data is fetched from only one of them, so power consumption is increased in exchange for higher processing speed. In other words, the parallel approach is faster, but it also consumes more power. In the serial approach, on the other hand, power consumption is reduced because only one location in the data array is activated. However, the tag-array and data-array accesses must then occur serially over multiple clock cycles, because the tag lookup must first identify which location in the data array to activate. In other words, the serial approach sacrifices processing speed in order to reduce power consumption.
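The trade-off can be summarized with a rough cost model. The sketch below charges each activated tag way and data way a fixed energy and compares one parallel fetch against one serial fetch hit for several associativities. The energy values E_TAG_WAY and E_DATA_WAY are hypothetical, in arbitrary units; real costs depend on the circuit implementation, so only the trend is meaningful.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessCost:
    cycles: int     # latency of one fetch, in clock cycles
    energy: float   # energy of one fetch, in arbitrary units


# Hypothetical per-way read energies in arbitrary units. A data way (a whole
# cacheline) is assumed to cost more to read than a tag way.
E_TAG_WAY = 1.0
E_DATA_WAY = 4.0


def parallel_cost(num_ways: int) -> AccessCost:
    # All tag ways and all data ways of the set are activated in one cycle.
    return AccessCost(cycles=1, energy=num_ways * (E_TAG_WAY + E_DATA_WAY))


def serial_cost(num_ways: int) -> AccessCost:
    # All tag ways in the first cycle, then a single data way in the next one.
    return AccessCost(cycles=2, energy=num_ways * E_TAG_WAY + E_DATA_WAY)


for ways in (2, 4, 8):
    p, s = parallel_cost(ways), serial_cost(ways)
    print(f"{ways}-way: parallel {p.cycles} cycle(s), energy {p.energy:g}; "
          f"serial {s.cycles} cycle(s), energy {s.energy:g}")
# 2-way: parallel 1 cycle(s), energy 10; serial 2 cycle(s), energy 6
# 4-way: parallel 1 cycle(s), energy 20; serial 2 cycle(s), energy 8
# 8-way: parallel 1 cycle(s), energy 40; serial 2 cycle(s), energy 12
```

Under these assumptions the parallel approach always finishes in one cycle, but its energy grows with every added data way, while the serial approach reads only one data way regardless of associativity at the cost of a second cycle.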