Computing systems, including microprocessor-based systems, use a cache in conjunction with a main memory to hold data and/or instructions which are being processed. The cache comprises a memory where the temporary contents needed for processing are maintained, so that the most recently used data from the main memory is located in the cache memory for rapid access by the microprocessor system.
Cache memories are organized as set associative memories comprising sets of individual SRAMs which contain the desired data and which typically share common address lines. Each SRAM is referred to as a way, and in a two-way associative cache, the common address lines are connected to both SRAMs. Lines of multi-byte data are stored in each location of the ways. A line within the set associative cache memory is located by an effective address 20 generated by the microprocessor system. The effective address includes a tag field, a line index field and a byte field. The tag field of the effective address is used to determine which of the ways, if any, contains the data being sought.
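The division of an effective address into its three fields can be illustrated with a minimal sketch. The field widths below (a 5-bit byte field and a 7-bit line index field, with the remaining upper bits forming the tag) and the names `split_address` and `AddressFields` are assumptions chosen only for illustration; the text above does not specify them.

```python
from typing import NamedTuple

# Hypothetical field widths, not taken from the description above.
BYTE_BITS = 5    # assumed 32-byte cache line
INDEX_BITS = 7   # assumed 128 lines per way


class AddressFields(NamedTuple):
    tag: int
    line_index: int
    byte_offset: int


def split_address(addr: int) -> AddressFields:
    """Split an effective address into tag, line index and byte fields."""
    byte_offset = addr & ((1 << BYTE_BITS) - 1)
    line_index = (addr >> BYTE_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BYTE_BITS + INDEX_BITS)
    return AddressFields(tag, line_index, byte_offset)
```

Under these assumed widths, the line index field selects the same row in every way, the byte field selects a byte within the stored line, and the tag is what is later compared against the tag memory.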
Both ways may be simultaneously addressed, and data from one or the other of the ways may be selected by a multiplexer by comparing a tag derived from the address applied to the ways of the associative cache to a tag contained in a tag memory or directory. The tag memory includes a row of tag data corresponding to the same row number of data in a given way. Thus, a comparison between the contents of a row of the tag memory and the tag from the effective address determines which way contains the desired data, and the multiplexer selects the desired data from the identified way.
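The lookup described above can be modeled in software as a rough sketch. The class `TwoWayCache`, its methods, and the use of `None` to mark an empty tag entry are hypothetical names and conventions introduced here; a real cache performs the reads and the comparison in parallel hardware rather than in a loop.

```python
from typing import List, Optional, Tuple


class TwoWayCache:
    """Toy model of a two-way set associative cache with a tag memory."""

    def __init__(self, num_lines: int) -> None:
        # One tag row per line index, holding one tag per way (None = empty).
        self.tags: List[List[Optional[int]]] = [[None, None] for _ in range(num_lines)]
        # Two data arrays (the "ways"), each with one line per index.
        self.ways: List[List[Optional[bytes]]] = [[None] * num_lines for _ in range(2)]

    def fill(self, way: int, line_index: int, tag: int, data: bytes) -> None:
        """Store a line and record its tag in the matching tag memory row."""
        self.tags[line_index][way] = tag
        self.ways[way][line_index] = data

    def lookup(self, tag: int, line_index: int) -> Optional[Tuple[int, bytes]]:
        """Read both ways, then select one by tag comparison (late select)."""
        # Both ways are addressed with the same line index.
        candidates = (self.ways[0][line_index], self.ways[1][line_index])
        for way in (0, 1):
            stored_tag = self.tags[line_index][way]
            if stored_tag is not None and stored_tag == tag:
                # The tag comparison plays the role of the multiplexer select.
                return way, candidates[way]
        return None  # neither way holds the requested line: a cache miss
```

In this model, both data arrays are read on every lookup even though at most one of them supplies the result.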
In small computing systems, power efficiency is more important than it was in earlier applications of set associative cache memories. Associative cache memories provide higher speed data access when both ways are simultaneously addressed and clocked, and a late select command to the multiplexer selects the data from one of the ways. While this provides optimum access speed, power is dissipated in each of the SRAMs of the associative cache even though only one SRAM contains the selected data. This represents a significant waste of operational power, particularly in battery-operated devices such as cellular telephones which may use such microprocessor systems.
To avoid the needless consumption of power by the way which does not contain the desired data, some set associative cache memories have been provided with prediction logic. These systems all predict which way contains the requested data, and enable only the predicted way to produce the data. However, the prediction logic itself consumes power, and its predictions are not guaranteed to be correct. Accordingly, more cache misses occur on a false prediction, with only marginal savings in power consumption.
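The trade-off can be illustrated by counting how many ways are read under each approach. The sketch below assumes a simple most-recently-used predictor per line index and assumes that a false prediction costs one extra read of the other way; neither assumption comes from the systems referred to above, and the function name `count_way_reads` is hypothetical. The power drawn by the prediction logic itself is not modeled.

```python
from typing import Dict, List, Tuple


def count_way_reads(accesses: List[Tuple[int, int]], predict: bool) -> Tuple[int, int]:
    """Count way reads and false predictions for a two-way cache.

    accesses: (line_index, way_that_actually_holds_the_data) pairs.
    Returns (total_way_reads, false_predictions).
    """
    last_way: Dict[int, int] = {}  # assumed MRU predictor state per line index
    way_reads = 0
    false_predictions = 0
    for line_index, actual_way in accesses:
        if not predict:
            way_reads += 2  # both ways are addressed and clocked
            continue
        guess = last_way.get(line_index, 0)  # default guess: way 0
        way_reads += 1  # only the predicted way is enabled
        if guess != actual_way:
            false_predictions += 1
            way_reads += 1  # assumed retry that reads the other way
        last_way[line_index] = actual_way
    return way_reads, false_predictions
```

With this toy model, a trace that frequently alternates between ways loses most of the saving to false predictions, which is consistent with the limitation noted above.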
In order to reduce power consumption, some designs reduce voltage levels or the operating frequency of the access cycle. These techniques have limitations, however, particularly lowering the operating frequency, since providing adequate time to make a set decision and then obtain the required data mandates a reduced maximum frequency of operation.
In a paper entitled “A 600 MHz Single Chip Multiprocessor With 4.8 GB/s Internal Shared Pipelined Bus and 512 kB Internal Memory”, 2003 International Solid-State Circuits Conference, pg. 254, a set associative instruction cache is described having reduced power consumption for normal prefetch cycles. Tag memory access and data memory access are divided into two consecutive cycles, and only one way is activated. During branch conditions, on the other hand, tag memory access and data memory access of both ways are executed in the same cycle to enhance performance. In this way, there are two variations of cache performance, one emphasizing low power and the other high performance. However, the trade-off between power savings and higher access speed is limited to normal prefetch and branch conditions. Further, the access during normal prefetch operations is made over two cycles, which significantly slows down the access process. Accordingly, it would be desirable to have a system which provides high performance, as well as lower power consumption, in a variety of applications. The present invention addresses such a need.