1. Field of the Invention
This invention relates to reducing power consumption, and more particularly to a method and apparatus for reducing the power consumption of caches without performance degradation.
2. Description of the Related Art
In many of today's processing systems, such as notebook computer systems, it is important to reduce the power consumption and energy use. In processing systems, cache memory schemes are used to optimize performance. High performance caches, however, tend to increase power consumption.
The benefits of a cache are greatest when the number of access requests to cached memory addresses, known as "cache hits," is maximized relative to the number of access requests to non-cached memory addresses, known as "cache misses." Despite the added overhead that typically occurs as a result of a cache miss, as long as the percentage of cache hits is high, the overall access rate for the system is increased.
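The trade-off between hits and misses can be illustrated with a simple expected-cost calculation. The following is a minimal sketch, not part of the described system: the function name, the cycle counts, and the model (every access pays the hit cost, and a miss additionally pays a fixed penalty) are assumptions chosen for illustration.

```python
def average_access_time(hit_rate, hit_cycles, miss_penalty_cycles):
    """Expected cycles per access under an assumed simple cost model.

    Every access pays hit_cycles; a miss additionally pays
    miss_penalty_cycles to fetch the data from the next level.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    return hit_cycles + miss_rate * miss_penalty_cycles


# Hypothetical numbers: 2-cycle hit, 100-cycle miss penalty.
for rate in (0.50, 0.90, 0.99):
    print(rate, average_access_time(rate, 2, 100))
```

Under these assumed numbers, raising the hit rate shrinks the contribution of the miss penalty, which is why a high percentage of hits outweighs the overhead of an occasional miss.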
In most computer memory systems, memory hierarchy plays a major role in determining the actual system performance. The high speed memory close to the processor is referred to as level one, or L1, cache, and a cheaper, denser, slower memory is referred to as level two, or L2, cache. This hierarchy may continue for numerous levels. The lowest level memory, level N or LN, is typically main memory, such as random access memory (RAM) or dynamic RAM (DRAM). Distance from the processor refers to the number of processor cycles it takes to get data to the processor from that level of the memory hierarchy. Thus, in a memory hierarchy, the closer to the processor the data resides, the higher the performance.
When data is not found in a higher level of the memory hierarchy and a miss occurs, the data must be accessed from a lower level of the memory hierarchy. Since each successively lower level of the memory hierarchy contains a larger amount of storage, the probability increases that the data will be found there. Equally important for performance is the latency: the number of cycles it takes to transfer the first byte of data to the processor, plus the time to transfer the remaining bytes of the cache line.
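The latency described above can be sketched as a small calculation. This is an illustrative model only: the function name, the fixed transfer width per cycle, and the example numbers are assumptions, not values taken from this description.

```python
def line_fill_cycles(first_byte_latency, line_bytes, bytes_per_cycle):
    """Cycles to deliver a whole cache line to the processor.

    Assumes the first transfer (bytes_per_cycle wide) arrives after
    first_byte_latency cycles and each later transfer takes one cycle.
    """
    if bytes_per_cycle <= 0:
        raise ValueError("bytes_per_cycle must be positive")
    remaining_bytes = max(line_bytes - bytes_per_cycle, 0)
    # Ceiling division: a partial final transfer still costs a full cycle.
    remaining_cycles = -(-remaining_bytes // bytes_per_cycle)
    return first_byte_latency + remaining_cycles


# Hypothetical example: 10-cycle latency, 64-byte line, 8 bytes per cycle.
print(line_fill_cycles(10, 64, 8))
```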
A cache consists of S sets and W ways. Each set contains W cache lines, where W is equal to one or more. Each cache line contains control information and data information. The control information consists of tags, which typically contain an address and coherency bits. The data information consists of a data array. Additionally, each set has control bits that may implement a replacement algorithm, such as least recently used (LRU) or pseudo LRU (PLRU).
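The organization described above can be modeled as a small data structure. The following is a minimal sketch under stated assumptions: the class and field names are hypothetical, a single valid flag stands in for the coherency bits, and the replacement state is a true-LRU ordering rather than PLRU.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    valid: bool = False  # simplified stand-in for the coherency bits
    tag: int = 0         # address bits identifying the cached block
    data: bytes = b""    # contents of the data array for this line


@dataclass
class CacheSet:
    ways: list           # W CacheLine objects
    lru_order: list      # way indices, least recently used first


class Cache:
    def __init__(self, num_sets, num_ways):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.sets = [
            CacheSet([CacheLine() for _ in range(num_ways)],
                     list(range(num_ways)))
            for _ in range(num_sets)
        ]

    def touch(self, set_index, way):
        """Mark a way as most recently used within its set."""
        order = self.sets[set_index].lru_order
        order.remove(way)
        order.append(way)

    def victim(self, set_index):
        """Return the way that LRU replacement would evict next."""
        return self.sets[set_index].lru_order[0]
```

Keeping the replacement state per set, rather than per line, mirrors the statement that each set carries its own control bits.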
A tag is a set of bits attached to a block (a block is the smallest unit that may be copied to or from memory) that define several characteristics, and in particular, the address it is currently mapped to. An example of a format for a data array is illustrated in FIG. 1. In the example of FIG. 1, data array 100 comprises 10 check bits and 128 data bits. An example of a format for a tag array is illustrated in FIG. 2. In the example of FIG. 2, tag array 200 comprises 7 check bits and 26 tag bits.
Caches may have different degrees of associativity, and are often referred to as being N-way set associative. In a one-way set associative cache, each memory address is mapped to one cache line. This type of cache, however, is typically prone to "hot" locations where multiple memory addresses from different cache pages that are accessed relatively frequently are mapped to the same entry in the cache, resulting in frequent cache misses and lower performance. Multi-way set associative caches, such as four-way set associative caches, provide multiple cache lines to which a particular memory address may be mapped.
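The effect of a "hot" location can be demonstrated with a small simulation. This is a sketch under assumptions: the function name, the 16-byte line size, the LRU replacement policy, and the address pattern are all chosen for illustration and are not specified by this description.

```python
def count_misses(addresses, num_sets, num_ways, line_bytes=16):
    """Count misses for an address trace in an N-way cache with LRU replacement."""
    # Each set is a list of resident tags, least recently used first.
    sets = [[] for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line_number = addr // line_bytes
        index = line_number % num_sets   # which set the address maps to
        tag = line_number // num_sets    # identifies the block within that set
        tags = sets[index]
        if tag in tags:
            tags.remove(tag)             # hit: re-appended below as most recent
        else:
            misses += 1
            if len(tags) == num_ways:
                tags.pop(0)              # set is full: evict least recently used
        tags.append(tag)
    return misses


# Two addresses that map to the same set, accessed alternately.
trace = [0, 1024] * 4
print(count_misses(trace, num_sets=64, num_ways=1))  # one-way: the lines evict each other
print(count_misses(trace, num_sets=16, num_ways=4))  # four-way, same total number of lines
```

Both configurations hold 64 lines in total; only the associativity differs, which isolates the conflict effect described above.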
Existing cache schemes compare the tag for a given memory address with the tag for each entry in the set either sequentially (serial schemes) or concurrently (parallel schemes). A serial cache scheme accesses the tag array, performs a tag match, then accesses the data array for the specified cache line only. Accessing the data array of only one cache way lowers the total power consumed by the data cache memory array, since not all data arrays in a set are activated for every cache memory access. Because the tag match must complete before the data array is accessed, however, the serial scheme lengthens the access time and therefore reduces the performance of the cache. A parallel cache accessing scheme is used to enhance the performance of processors, but tends to increase power consumption by speculatively activating all data arrays in parallel.
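The power and latency trade-off between the two schemes can be sketched by counting data-array activations and cycles per access. This is a simplified illustrative model: the function names, the one-cycle defaults for the tag match and the data read, and the use of activation count as a proxy for power are assumptions.

```python
def serial_access(set_tags, tag, tag_cycles=1, data_cycles=1):
    """Tag match first, then read only the matching way's data array."""
    hit_way = set_tags.index(tag) if tag in set_tags else None
    if hit_way is None:
        return None, 0, tag_cycles            # miss: no data array is activated
    return hit_way, 1, tag_cycles + data_cycles


def parallel_access(set_tags, tag, tag_cycles=1, data_cycles=1):
    """Read every way's data array speculatively while the tags are compared."""
    hit_way = set_tags.index(tag) if tag in set_tags else None
    arrays_activated = len(set_tags)          # all ways are read regardless of outcome
    return hit_way, arrays_activated, max(tag_cycles, data_cycles)


# Hypothetical four-way set holding tags 7, 3, 9 and 5.
ways = [7, 3, 9, 5]
print(serial_access(ways, 9))    # (hit way, data arrays activated, cycles)
print(parallel_access(ways, 9))
```

Each function returns the hit way (or None on a miss), the number of data arrays activated, and the cycles taken, so the two schemes can be compared on the same access.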