1. Field of the Invention
The present invention relates to the field of microprocessors and, more particularly, to caching of instructions within microprocessors.
2. Description of the Related Art
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time accorded to various stages of an instruction processing pipeline within the microprocessor. Storage devices (e.g., registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively. The term "instruction processing pipeline" is used herein to refer to the logic circuits employed to process instructions in a pipelined fashion. Although the pipeline may be divided into any number of stages at which portions of instruction processing are performed, instruction processing generally comprises fetching the instruction, decoding the instruction, executing the instruction, and storing the execution results in the destination identified by the instruction.
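By way of illustration only, the overlap of the four generic processing steps named above may be sketched as follows. The sketch assumes a hypothetical single-issue pipeline with exactly four stages and no stalls, so that instruction i occupies stage s during clock cycle i + s; the stage names, the function name `pipeline_schedule`, and the single-issue simplification are assumptions made for the example and do not describe any particular microprocessor.

```python
# Hypothetical four-stage pipeline; the stage names are illustrative assumptions.
STAGES = ("fetch", "decode", "execute", "writeback")


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return one dict per clock cycle mapping stage name -> instruction index.

    Assumes one instruction enters the pipeline per clock cycle and that no
    stage ever stalls, so instruction i is in stage s during cycle i + s.
    """
    total_cycles = num_instructions + len(stages) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for stage_index, stage_name in enumerate(stages):
            instruction = cycle - stage_index
            if 0 <= instruction < num_instructions:
                row[stage_name] = instruction
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(3)):
        print(cycle, row)
```

In this model, each added stage lengthens the time a given instruction spends in the pipeline by one clock cycle, while the steady-state rate of one completed instruction per cycle is unchanged.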
An important feature of a superscalar microprocessor (and a superpipelined microprocessor as well) is its branch prediction mechanism. The branch prediction mechanism indicates a predicted direction (taken or not-taken) for a branch instruction, allowing subsequent instruction fetch to continue with the predicted instruction stream indicated by the branch prediction. The predicted instruction stream includes instructions immediately subsequent to the branch instruction in memory if the branch instruction is predicted not-taken, or the instructions at the target address of the branch instruction if the branch instruction is predicted taken. Instructions from the predicted instruction stream may be speculatively executed prior to execution of the branch instruction, and in any case are placed into the instruction processing pipeline prior to execution of the branch instruction. If the predicted instruction stream is correct, then the number of instructions executed per clock cycle is advantageously increased. However, if the predicted instruction stream is incorrect (i.e., one or more branch instructions are predicted incorrectly), then the instructions from the incorrectly predicted instruction stream are discarded from the instruction processing pipeline and the number of instructions executed per clock cycle is decreased.
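The selection of the predicted instruction stream, and the discarding of that stream upon misprediction, may be sketched as follows. This is a minimal model under stated assumptions: the function names `predicted_next_fetch` and `resolve_branch`, and the representation of speculative instructions as a list, are hypothetical and introduced only for the example. How the predicted direction itself is produced is outside the scope of the sketch.

```python
def predicted_next_fetch(branch_address, branch_length, target_address, predicted_taken):
    """Return the address at which fetching continues after a branch.

    Predicted taken: fetch from the branch target address.
    Predicted not-taken: fetch the instruction immediately subsequent to the
    branch in memory (branch address plus branch length in bytes).
    """
    if predicted_taken:
        return target_address
    return branch_address + branch_length


def resolve_branch(predicted_taken, actual_taken, speculative_instructions):
    """Return the speculative instructions that survive branch execution.

    A correct prediction retains every instruction already placed in the
    pipeline; an incorrect prediction discards all of them.
    """
    if predicted_taken == actual_taken:
        return list(speculative_instructions)
    return []


if __name__ == "__main__":
    print(hex(predicted_next_fetch(0x100, 2, 0x200, predicted_taken=True)))
    print(hex(predicted_next_fetch(0x100, 2, 0x200, predicted_taken=False)))
    print(resolve_branch(True, False, ["add", "load"]))
```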
When branch misprediction occurs, the desired instruction stream is typically fetched from the instruction cache and conveyed through the pipeline of the microprocessor. The number of clock cycles that it takes the newly fetched instructions to propagate to the pipeline stage where the misprediction was originally detected is known as the branch misprediction penalty. The branch misprediction penalty increases when the desired instruction stream is not located in the instruction cache.
Typically, instruction fetching occurs early in the pipeline and branch misprediction is detected toward the end of the pipeline (upon instruction execution). Thus, the branch misprediction penalty tends to increase with the number of pipeline stages. Because more clock cycles are lost for each mispredicted branch, the relative impact of branch misprediction on performance generally increases as well.
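The relationship between the branch misprediction penalty and overall performance may be illustrated with a common first-order estimate, in which the average number of clock cycles per instruction equals a base value plus the product of the branch frequency, the misprediction rate, and the penalty. The sketch below assumes this simplified model; the function names and all numeric values are hypothetical and chosen only for illustration.

```python
def cycles_per_instruction(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """First-order estimate of average clock cycles per instruction.

    base_cpi        -- cycles per instruction with perfect prediction (assumed)
    branch_fraction -- fraction of executed instructions that are branches
    mispredict_rate -- fraction of branches that are predicted incorrectly
    penalty_cycles  -- branch misprediction penalty in clock cycles
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


def misprediction_share(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Fraction of total execution time attributable to mispredictions."""
    total = cycles_per_instruction(base_cpi, branch_fraction, mispredict_rate, penalty_cycles)
    return (total - base_cpi) / total


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 10% of them mispredicted.
    for penalty in (3, 6, 12):
        cpi = cycles_per_instruction(0.5, 0.2, 0.1, penalty)
        share = misprediction_share(0.5, 0.2, 0.1, penalty)
        print(penalty, round(cpi, 3), round(share, 3))
```

Under this model, holding the other parameters fixed, a larger penalty raises both the cycles per instruction and the share of execution time lost to misprediction.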
In microprocessors executing fixed-length instruction sets, instructions begin at regular intervals within an instruction cache line. This greatly simplifies the logic necessary to route instructions from a fetched cache line to the decode units and functional units. Instructions fetched as a result of branch misprediction then have fewer pipeline stages to traverse in order to reach the execute stage of the pipeline (and thus recover from the effects of the incorrectly predicted branch).
Microprocessors executing a variable-length instruction set (e.g., the x86 instruction set), however, may exhibit high branch misprediction penalties due to the increased complexity of the pipeline between the instruction fetch and execute stages. Unlike fixed-length instructions, variable-length instructions appear at irregular intervals within an instruction cache line. Accordingly, additional logic is employed for determination of instruction length and alignment of instructions for dispatch to one or more decode units. This translates to more pipeline stages between instruction fetch and execute, and thus to a higher branch misprediction penalty.
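The difference between locating fixed-length and variable-length instructions within a cache line may be sketched as follows. The variable-length encoding used here is a hypothetical toy format, in which the low two bits of an instruction's first byte give its length minus one; it is not the x86 encoding and is assumed solely so that the example is self-contained. The function names are likewise illustrative.

```python
def fixed_length_starts(line_size, instruction_length):
    """Start offsets of fixed-length instructions in a cache line.

    Every offset is known from the line size alone, without examining
    any instruction bytes.
    """
    return list(range(0, line_size, instruction_length))


def toy_instruction_length(first_byte):
    """Length in bytes (1 to 4) under a hypothetical toy encoding, not x86."""
    return (first_byte & 0x3) + 1


def variable_length_starts(line):
    """Start offsets of variable-length instructions in a cache line.

    Each start depends on the length of the preceding instruction, so the
    line is scanned sequentially from its first byte.
    """
    starts = []
    offset = 0
    while offset < len(line):
        starts.append(offset)
        offset += toy_instruction_length(line[offset])
    return starts


if __name__ == "__main__":
    print(fixed_length_starts(16, 4))
    line = bytes([0x02, 0xAA, 0xBB, 0x00, 0x01, 0xCC, 0x03, 0x01, 0x02, 0x03])
    print(variable_length_starts(line))
```

In the fixed-length case every start offset can be computed independently, whereas in the variable-length case each offset depends on decoding the previous instruction's length. This dependency is the source of the additional length-determination and alignment logic described above.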