1. Field of the Invention
This invention relates to the field of microprocessors and, more particularly, to data caches within microprocessors.
2. Description of the Related Art
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. Superpipelined microprocessor designs, on the other hand, divide instruction execution into a large number of subtasks which can be performed quickly, and assign a pipeline stage to each subtask. An extremely short clock cycle is the goal of superpipelined designs. By overlapping the execution of many instructions within the pipeline, superpipelined microprocessors attempt to achieve high performance. Many microprocessor designs employ a combination of superscalar and superpipelined techniques to achieve performance goals.
As used herein, the term "clock cycle" refers to an interval of time accorded to various stages of a pipeline within the microprocessor. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively. Generally, a pipeline comprises a plurality of pipeline stages. Each pipeline stage is configured to perform an operation assigned to that stage upon a value while other pipeline stages independently operate upon other values. When a value exits the pipeline, the function formed by the combined operations of the pipeline stages has been completed upon that value. For example, an "instruction processing pipeline" is a pipeline employed to process instructions in a pipelined fashion. Although the instruction processing pipeline may be divided into any number of stages at which portions of instruction processing are performed, instruction processing generally comprises fetching the instruction, decoding the instruction, executing the instruction, and storing the execution results in the destination identified by the instruction.
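As a non-limiting illustration of the pipeline behavior described above, the following Python sketch models each storage device as a list entry that captures a new value once per clock cycle. The function name `run_pipeline`, the three example stage operations, and the use of `None` to mark an empty stage are assumptions made for illustration only and do not correspond to any particular microprocessor.

```python
def run_pipeline(stages, inputs):
    """Advance values through `stages`, one clock cycle per loop iteration.

    `stages` is a list of one-argument functions (hypothetical stage operations).
    Returns the values that exit the pipeline and the number of cycles used.
    """
    regs = [None] * len(stages)  # storage device at the output of each stage
    results = []
    cycles = 0
    # After the last input, len(stages) - 1 empty cycles drain the pipeline.
    feed = list(inputs) + [None] * (len(stages) - 1)
    for incoming in feed:
        # Each stage operates on the value captured by the previous stage
        # in the prior cycle; all stages work on different values at once.
        stage_inputs = [incoming] + regs[:-1]
        regs = [None if v is None else op(v)
                for op, v in zip(stages, stage_inputs)]
        cycles += 1
        if regs[-1] is not None:
            results.append(regs[-1])  # value exits the pipeline
    return results, cycles


# Hypothetical three-stage pipeline computing (x + 1) * 2 - 3.
example_stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
example_results, example_cycles = run_pipeline(example_stages, [1, 2, 3, 4])
```

In this sketch, four values pass through three stages in six cycles rather than twelve, because the stages overlap their work on different values.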
A short clock cycle (i.e. a high frequency of operation) is a goal of microprocessor designs employing superscalar, superpipelined, or both superscalar and superpipelined techniques. A first microprocessor having a higher frequency than a second microprocessor is more likely, when employed in a given computer system, to achieve high performance. High performance computer systems may be more desirable than low performance computer systems in many situations.
The clock cycle time achieved by a microprocessor is determined in large part by the pipeline stage which exhibits the longest "path" (e.g. number of logic levels between the storage devices which delimit the pipeline stage). The longest path is often referred to as the "critical path". The amount of time which expires between application of a signal at the input to the critical path and a corresponding output appearing at the output of the critical path limits the clock cycle at which the microprocessor can operate.
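The relationship between the critical path and the clock cycle can be sketched as follows. The stage names and delay values are hypothetical, and the sketch ignores effects such as setup time and clock skew; it is intended only to show that the slowest stage bounds the clock cycle.

```python
def min_clock_period_ns(stage_delays_ns):
    """The clock cycle can be no shorter than the slowest stage's path delay."""
    return max(stage_delays_ns.values())


def critical_stage(stage_delays_ns):
    """Name of the stage containing the critical path."""
    return max(stage_delays_ns, key=stage_delays_ns.get)


# Hypothetical path delays, in nanoseconds, for four pipeline stages.
example_delays = {
    "fetch": 1.8,
    "decode": 1.5,
    "cache_access": 2.5,
    "writeback": 1.2,
}

example_period_ns = min_clock_period_ns(example_delays)
example_max_freq_mhz = 1000.0 / example_period_ns  # period in ns -> MHz
```

With these assumed values, shortening any stage other than `cache_access` would leave the achievable clock cycle unchanged.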
A path which is often one of the critical paths in a microprocessor is the cache access path. Microprocessors often employ caches to reduce the amount of external memory bandwidth needed to support high performance of the microprocessor. The cache provides the microprocessor with rapid access to a subset of the data stored in main memory. The subset of data is typically the data most recently accessed by the microprocessor. Generally, a cache stores data in units of cache lines. If a datum requested by the processor is not stored within the cache (a "cache miss"), then the cache line including the requested datum is transferred from memory into the cache. A previously stored cache line may be discarded in order to allocate storage for the missing cache line. Generally, cache lines are aligned to a boundary defined by the number of bytes within the cache line. For example, a 32 byte cache line is aligned to a 32 byte boundary within the memory space. The first cache line within the memory space includes the bytes from address zero through address 31 (in decimal), and the second cache line begins at address 32. In other words, a thirty-two byte boundary lies between addresses 31 and 32. While a requested datum being absent from the cache at the time of the request is referred to as a cache miss, finding a requested datum stored within the cache at the time of the request is referred to as a "cache hit."
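The cache line alignment and the hit and miss behavior described above may be sketched as follows, using the 32 byte cache line from the example. The class `TinyCache`, its capacity parameter, and the choice to discard the least recently used line are assumptions made for illustration; the description above does not specify a replacement policy or a cache organization.

```python
from collections import OrderedDict

LINE_SIZE = 32  # bytes per cache line, as in the example above


def line_base(address, line_size=LINE_SIZE):
    """Address of the first byte of the cache line containing `address`."""
    return address & ~(line_size - 1)


def line_offset(address, line_size=LINE_SIZE):
    """Position of `address` within its cache line."""
    return address & (line_size - 1)


class TinyCache:
    """Hypothetical cache model holding a fixed number of whole cache lines."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()  # line base address -> bytes of that line

    def access(self, address, memory):
        """Return ("hit" or "miss", byte value) for a one-byte read."""
        base = line_base(address)
        if base in self.lines:
            self.lines.move_to_end(base)  # mark as most recently accessed
            return "hit", self.lines[base][line_offset(address)]
        if len(self.lines) >= self.capacity:
            # Assumed policy: discard the least recently accessed line.
            self.lines.popitem(last=False)
        # Transfer the whole line containing the datum from memory.
        self.lines[base] = memory[base:base + LINE_SIZE]
        return "miss", self.lines[base][line_offset(address)]


example_memory = bytes(range(256))  # stand-in for main memory
example_cache = TinyCache(capacity_lines=2)
```

In this sketch, addresses 0 through 31 map to the line at base address 0, while address 32 maps to the next line, so a read of address 31 following a read of address 5 is a hit and a subsequent read of address 32 is a miss.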
Typically, the cache access path includes logic for calculating the address to be accessed and a multiplexor for selecting between the calculated address and other address sources. In particular, the multiplexor may select a fill address corresponding to a previous cache miss when the cache line is returned from memory. While the cache access path may occur over several clock cycles, it is important for performance to reduce the cache access path to as few clock cycles as possible while still maintaining a short clock cycle. It is therefore desirable to reduce the cache access path such that it is not a critical path without increasing the number of clock cycles over which the cache access path operates.
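The address selection described above may be sketched as follows. Computing the address as a base plus a displacement, the 32 bit address width, and the names `calculate_address`, `select_cache_address`, and `fill_valid` are assumptions made for illustration; the description above states only that a calculated address and a fill address are among the sources presented to the multiplexor.

```python
def calculate_address(base, displacement, width_bits=32):
    """Hypothetical address calculation: base plus displacement, wrapped
    to the assumed address width."""
    return (base + displacement) & ((1 << width_bits) - 1)


def select_cache_address(calculated_address, fill_address, fill_valid):
    """Two-input multiplexor: choose the fill address when a cache line for a
    previous miss is being returned from memory, else the calculated address."""
    return fill_address if fill_valid else calculated_address


# Normal access: no fill is pending, so the calculated address reaches the cache.
normal = select_cache_address(calculate_address(0x1000, 0x24), 0x8000, False)

# A line for an earlier miss has returned: the fill address is selected instead.
filling = select_cache_address(calculate_address(0x1000, 0x24), 0x8000, True)
```

Because the address calculation and the multiplexor operate in series ahead of the cache array, each contributes delay to the cache access path.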