The present invention relates to computers and, more particularly, to a method for managing a set-associative cache. A major objective of the present invention is to reduce the average power consumed during single-cycle read operations in a set-associative cache that employs parallel reads.
Much of modern progress is associated with the increasing prevalence of computers. In a conventional computer architecture, a data processor manipulates data in accordance with program instructions. The data and instructions are read from, written to, and stored in the computer's "main" memory. Typically, main memory is in the form of random-access memory (RAM) modules.
A processor accesses main memory by asserting an address associated with a memory location. For example, a 32-bit address can select any one of up to 2^32 address locations. In this example, each location holds eight bits, i.e., one "byte" of data; the bytes are arranged in "words" of four bytes each, and the words in "lines" of four words each. In all, there are 2^30 word locations and 2^28 line locations.
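The arithmetic of this addressing example can be checked with a short sketch. The names below (`split_address`, `BYTE_BITS`, and so on) are illustrative and do not come from the text; only the field widths are taken from the example.

```python
# Field widths from the 32-bit example: 4 bytes per word, 4 words per line.
ADDRESS_BITS = 32
BYTES_PER_WORD = 4
WORDS_PER_LINE = 4
BYTE_BITS = 2   # log2(BYTES_PER_WORD)
WORD_BITS = 2   # log2(WORDS_PER_LINE)
LINE_BITS = ADDRESS_BITS - WORD_BITS - BYTE_BITS  # 28-bit line address


def split_address(address):
    """Return (line_address, word_in_line, byte_in_word) for a 32-bit address."""
    byte_in_word = address & 0b11
    word_in_line = (address >> BYTE_BITS) & 0b11
    line_address = address >> (BYTE_BITS + WORD_BITS)
    return line_address, word_in_line, byte_in_word


byte_locations = 2 ** ADDRESS_BITS
word_locations = byte_locations // BYTES_PER_WORD   # 2**30
line_locations = word_locations // WORDS_PER_LINE   # 2**28
```

For instance, `split_address(0x12345678)` separates the address into line address `0x1234567`, word 2 within that line, and byte 0 within that word.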
Accessing main memory tends to be much faster than accessing disk and tape-based memories; nonetheless, even main memory accesses can leave a processor idling while it waits for a request to be fulfilled. To minimize such latencies, a cache can intercept processor requests to main memory and attempt to fulfill them faster than main memory can.
To fulfill processor requests to main memory, caches must contain copies of data stored in main memory. In part to optimize access times, a cache is typically much less capacious than main memory. Accordingly, it can represent only a small fraction of main-memory contents at any given time. To optimize the performance gain achievable by a cache, this small fraction must be selected strategically.
In the event of a cache "miss", i.e., when a request cannot be fulfilled by a cache, the cache fetches an entire line of main memory including the memory location requested by the processor. Addresses near a requested address are more likely than average to be requested in the near future. By fetching and storing an entire line, the cache acquires not only the contents of the requested main-memory location, but also the contents of the main-memory locations that are relatively likely to be requested in the near future.
Where the fetched line is stored within the cache depends on the cache type. A fully-associative cache can store the fetched line in any cache storage location. Typically, any location not containing valid data is given priority as a target storage location for a fetched line. If all cache locations have valid data, the location with the data least likely to be requested in the near term can be selected as the target storage location. For example, the fetched line might be stored in the location with the least recently used data.
The fully-associative cache stores not only the data in the line but also the line address (the most-significant 28 bits of the address) as a "tag" in association with the line of data. The next time the processor asserts a main-memory address, the cache compares the line-address portion of that address with all the tags stored in the cache. If a match is found, the requested data is provided to the processor from the cache.
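The behavior described for a fully-associative cache can be modeled with a minimal sketch. The class name, the `main_memory` mapping, and the list-based bookkeeping are assumptions made for illustration; least-recently-used replacement is the example policy mentioned in the text, and real hardware compares tags concurrently rather than in a loop.

```python
class FullyAssociativeCache:
    """Toy model: a fetched line may occupy any location, tagged by line address."""

    def __init__(self, num_lines, main_memory):
        self.num_lines = num_lines
        self.main_memory = main_memory  # hypothetical: line address -> line data
        self.entries = []               # [tag, line] pairs, least recently used first

    def read_line(self, address):
        """Return (line, hit) for the line containing the 32-bit address."""
        tag = address >> 4  # the 28-bit line address serves as the tag
        # Every stored tag is a candidate, so each one must be compared.
        for position, (stored_tag, line) in enumerate(self.entries):
            if stored_tag == tag:
                # Hit: mark the entry as most recently used.
                self.entries.append(self.entries.pop(position))
                return line, True
        # Miss: fetch the entire line from main memory.
        line = self.main_memory[tag]
        if len(self.entries) >= self.num_lines:
            self.entries.pop(0)  # no free location: evict the least recently used
        self.entries.append([tag, line])
        return line, False
```

With a two-line cache, reading lines 0 and 1, re-reading line 0, and then reading line 2 evicts line 1, since line 0 was used more recently.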
In a fully-associative cache, every cache-memory location must be checked for a tag match. Such an exhaustive match checking process can be time-consuming, making it hard to achieve the access speed gains desired of a cache. Another problem with a fully-associative cache is that the tags consume a relatively large percentage of cache capacity, which is limited to ensure high-speed accesses.
In a direct-mapped cache, each cache storage location is given an index that, for example, might correspond to the least-significant line-address bits. In the 32-bit address example, counting the most-significant address bit as bit 1, a six-bit index might correspond to address bits 23-28. A restriction is imposed that a line fetched from main memory can be stored only at the cache location with an index that matches bits 23-28 of the requested address. Since those six bits are known, only the first 22 bits are needed as a tag. Thus, less cache capacity is devoted to tags. Also, when the processor asserts an address, only one cache location (the one with an index matching the corresponding bits of the address asserted by the processor) needs to be examined to determine whether or not the request can be fulfilled from the cache.
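The tag and index split in this example can be sketched as follows. The sketch assumes that bits are numbered from the most-significant end, so that bits 23-28 are the six line-address bits just above the four bits that select a byte within a line; the function and constant names are illustrative.

```python
OFFSET_BITS = 4   # selects one of 16 bytes within a line
INDEX_BITS = 6    # address bits 23-28, counted from the most-significant bit
TAG_BITS = 32 - INDEX_BITS - OFFSET_BITS  # the first 22 bits


def split_direct_mapped(address):
    """Return (tag, index, offset) for a 32-bit address."""
    offset = address & ((1 << OFFSET_BITS) - 1)
    index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


tag, index, offset = split_direct_mapped(0x12345678)
# The three fields reassemble into the original address.
reassembled = (tag << 10) | (index << 4) | offset
```

Because the index is implied by where a line is stored, only the 22-bit tag has to be kept alongside the line.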
In a direct-mapped cache, a line fetched in response to a cache miss must be stored at the one location having an index matching the index portion of the read address. Previously written data at that location is overwritten. If the overwritten data is subsequently requested, it must be fetched from main memory. Thus, a direct-mapped cache can force the overwriting of data that may be likely to be requested in the near future. The lack of flexibility in choosing the data to be overwritten limits the effectiveness of a direct-mapped cache.
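The forced overwriting can be illustrated with a small sketch in which two addresses share an index but differ in tag. The class and the two addresses are hypothetical and chosen only to produce the conflict; line data is omitted because only the tags matter here.

```python
NUM_INDEXES = 64  # six-bit index, as in the example above


class DirectMappedCache:
    """Toy model that tracks only tags, which is enough to show conflicts."""

    def __init__(self):
        self.tags = [None] * NUM_INDEXES  # None marks a location with no valid data
        self.misses = 0

    def access(self, address):
        """Return True on a hit; on a miss, overwrite the one eligible location."""
        index = (address >> 4) & 0x3F
        tag = address >> 10
        if self.tags[index] == tag:
            return True
        self.misses += 1
        self.tags[index] = tag  # no choice of victim: the old line is lost
        return False


cache = DirectMappedCache()
a = 0x00000400  # tag 1, index 0
b = 0x00000800  # tag 2, index 0
results = [cache.access(address) for address in (a, b, a, b)]
```

Every access in the alternating sequence misses, because each fetch overwrites the line the next access needs, even though the other 63 locations are empty.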
A set-associative cache has memory divided into two or more direct-mapped sets. Each index is associated with one memory location in each set. Thus, in a four-way set-associative cache, there are four cache locations with the same index and, thus, four choices of location to overwrite when a line is stored in the cache. This allows better replacement strategies than are available for direct-mapped caches. Still, the number of locations that must be checked (one per set) to determine whether a requested location is represented in the cache is quite limited, and the number of bits that need to be compared is reduced by the length of the index. Thus, set-associative caches combine some of the replacement-strategy flexibility of a fully-associative cache with much of the speed advantage of a direct-mapped cache.
The index portion of an asserted address identifies one cache-line location within each cache set. The tag portion of the asserted address can be compared with the tags at the identified cache-line locations to determine whether there is a hit (i.e., tag match) and, if so, in what set the hit occurs. If there is a hit, the least-significant address bits are checked for the requested location within the line; the data at that location is then provided to the processor to fulfill the read request.
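The lookup just described can be sketched as a serial tag-then-data read. The sketch follows the text's usage, in which a "set" is one direct-mapped bank; the four sets and six-bit index are carried over from the earlier examples, and the class and method names are illustrative.

```python
NUM_SETS = 4      # four-way example from the text
NUM_INDEXES = 64  # assumed six-bit index, as in the direct-mapped example


def split(address):
    """Return (tag, index, word) for a 32-bit address."""
    word = (address >> 2) & 0x3
    index = (address >> 4) & 0x3F
    tag = address >> 10
    return tag, index, word


class SetAssociativeCache:
    def __init__(self):
        # One tag array and one data array per set; None marks an empty location.
        self.tags = [[None] * NUM_INDEXES for _ in range(NUM_SETS)]
        self.lines = [[None] * NUM_INDEXES for _ in range(NUM_SETS)]

    def store_line(self, set_number, address, line):
        tag, index, _ = split(address)
        self.tags[set_number][index] = tag
        self.lines[set_number][index] = line

    def find_hit(self, address):
        """Compare the tag at the indexed location of each set; return the hit set."""
        tag, index, _ = split(address)
        for set_number in range(NUM_SETS):
            if self.tags[set_number][index] == tag:
                return set_number
        return None

    def read_word(self, address):
        """Tag check first, then a data access to the matching set only."""
        set_number = self.find_hit(address)
        if set_number is None:
            return None  # miss
        _, index, word = split(address)
        return self.lines[set_number][index][word]
```

Only four tag comparisons are needed per read, one per set, regardless of how many lines the cache holds.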
A read operation can be hastened by starting the data access before a tag match is determined. While checking the relevant tags for a match, the appropriately indexed data locations within each set are accessed in parallel. By the time a match is determined, data from all four sets are ready for transmission. The match is used, e.g., as the control input to a multiplexer, to select the data actually transmitted. If there is no match, none of the data is transmitted.
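A parallel tag-and-data read can be modeled in software as shown below. This is only a sketch: Python evaluates the two steps one after the other, whereas the hardware described performs them concurrently. The function name and the `tags`/`lines` array layout (one list per set, indexed by cache index) are assumptions.

```python
def parallel_read(tags, lines, address):
    """Return the requested word, or None on a miss.

    tags[s][i] and lines[s][i] hold the tag and line at index i of set s.
    """
    word = (address >> 2) & 0x3
    index = (address >> 4) & 0x3F
    tag = address >> 10
    num_sets = len(tags)

    # Data access: the indexed line of every set is read, match or not.
    candidates = [lines[s][index] for s in range(num_sets)]
    # Tag check: in hardware this runs at the same time as the data access.
    match = [tags[s][index] == tag for s in range(num_sets)]

    # Multiplexer: the match result selects which candidate is transmitted.
    for s in range(num_sets):
        if match[s]:
            return candidates[s][word]
    return None  # no match: none of the data is transmitted
```

Note that `candidates` is filled for all sets before the match is known, which is the source of the extra data accesses in this scheme.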
The parallel read operation is much faster since the data is accessed at the same time as the match operation is conducted rather than after. For example, a parallel "tag-and-data" read operation might consume only one memory cycle, while a serial "tag-then-data" read operation might require two cycles. Alternatively, if the serial read operation consumes only one cycle, the parallel read operation permits a shorter cycle, allowing for more processor operations per unit of time.
The gains of the parallel tag-and-data reads are not without some cost. The data accesses to the sets that do not provide the requested data consume additional power, which can tax power sources and is dissipated as extra heat. The heat can fatigue, impair, and damage the incorporating integrated circuit and proximal components. Accordingly, larger batteries or power supplies and more substantial heat removal provisions may be required. What is needed is a cache-management method that achieves the speed advantages of parallel reads but with reduced power consumption.
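The extra data accesses can be counted with a short sketch. The function name and the figures used (four sets, 1000 reads, 900 hits) are purely illustrative and are not taken from the text; access counts stand in for power only as a rough proxy.

```python
def data_array_accesses(num_sets, reads, hits, parallel):
    """Count data-array accesses for a batch of read operations.

    A parallel read accesses the indexed line in every set on every read.
    A serial tag-then-data read accesses one set, and only on a hit.
    """
    if not 0 <= hits <= reads:
        raise ValueError("hits must be between 0 and reads")
    if parallel:
        return num_sets * reads
    return hits


# Illustrative figures only: four sets, 1000 reads, 900 of them hits.
parallel_count = data_array_accesses(4, 1000, 900, parallel=True)
serial_count = data_array_accesses(4, 1000, 900, parallel=False)
unused_accesses = parallel_count - serial_count
```

Under these assumed figures, the parallel scheme performs 4000 data accesses where 900 would deliver the same data, so 3100 accesses return data that is never transmitted.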