1. Field of the Invention
The present invention generally relates to microprocessors, and particularly relates to managing instruction flushing in a microprocessor's instruction pipeline.
2. Relevant Background
Microprocessors find use in a wide variety of products, ranging from high-end computational systems, where processing power represents a paramount design consideration, to low-end embedded systems, where cost, size, and power consumption comprise the primary design considerations. Processors targeted for battery-powered portable devices, such as music players, palmtop computers, Personal Digital Assistants (PDAs), and the like, represent a particularly complex mix of competing design considerations. On the one hand, processor performance must be sufficient to support the device's intended functionality and provide a satisfactory user “experience.” On the other hand, low processor power consumption permits the use of reasonably sized battery systems while still achieving acceptable battery life.
The above mix of design tradeoffs has resulted in numerous processor performance and efficiency advancements. For example, modern pipelined processors, such as those based on a Reduced Instruction Set Computer (RISC) architecture, oftentimes employ branch prediction methods to prevent instruction pipeline “stalls.” With an instruction pipeline, different aspects of sequential instruction processing generally occur in different stages of the pipeline. For example, a given instruction pipeline may include successively arranged fetch, decode, issue, and execute stages. Each stage generally operates on a different instruction, or instructions, at each instruction clock cycle. For example, as the execution of one instruction is being completed in the execute stage, other instructions are being fetched, decoded, issued, etc. Staged execution allows the pipelined processor to execute, on average, one instruction per clock cycle.
However, maintaining that one-instruction-per-clock-cycle average depends on keeping the pipeline full of instructions. In turn, keeping the pipeline full of instructions means that the pipelined processor generally cannot afford to stop program instruction fetching while determining whether a given program branch will or will not be taken. That is, the processor generally must make a guess (a prediction) about whether a given program branch will be taken or not taken. If the prediction is “taken,” then instruction fetching continues from the branch target address. If the prediction is “not taken,” then instruction fetching continues from the next instruction address after the branch instruction.
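The fetch-address choice described above can be illustrated with a minimal sketch. The function name, the parameter names, and the fixed 4-byte instruction width are assumptions made only for illustration; they are not taken from any particular processor.

```python
# Hypothetical illustration: choosing the next fetch address from a branch prediction.
INSTRUCTION_SIZE = 4  # assumed fixed instruction width in bytes (illustrative only)


def next_fetch_address(branch_address, branch_target, predicted_taken):
    """Return the address from which fetching continues after a predicted branch."""
    if predicted_taken:
        # Predicted "taken": continue fetching at the branch target address.
        return branch_target
    # Predicted "not taken": continue at the next sequential instruction address.
    return branch_address + INSTRUCTION_SIZE
```

Under these assumptions, a branch at address 0x100 with target 0x200 continues fetching from 0x200 when predicted taken and from 0x104 when predicted not taken. In either case, the fetched instructions are speculative until the branch is actually resolved.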
In either case, the instructions fetched into the pipeline subsequent to such a prediction will be the “wrong” instructions if that prediction was incorrect. The pipeline may have multiple predictions outstanding at any given time, i.e., it may have multiple undetermined branch instructions in-flight within various ones of its pipeline stages. Thus, any given one of the instructions in-flight within the pipeline may depend on one or more of the outstanding branch predictions, or may not depend on any of them.
Such possibilities introduce a processing complexity in the context of branch mispredictions. Generally, at least some of the in-flight instructions will be dependent on at least one of the outstanding branch predictions, and therefore should be flushed from the instruction pipeline responsive to detecting a corresponding branch misprediction. The challenge arises from the difficulty in accurately identifying or tracking the branch prediction dependencies of the in-flight instructions, particularly because some instructions may be executed out of the original program order.
For example, a given instruction may have to wait on data because of a cache miss and, rather than stalling the pipeline while the data is retrieved from external memory, execution of that instruction may be suspended while the pipeline continues processing other in-flight instructions. More generally, executing instructions out of program order represents one of the processing performance advantages of superscalar instruction pipelines comprising parallel sets of pipeline stages. Such superscalar pipelines may have large numbers of in-flight instructions, with many of them executing out of program order.
Thus, selectively flushing only the instructions dependent on a particular branch misprediction represents a potentially significant challenge in terms of being able to accurately identify such dependencies without introducing too much tracking complexity. Of course, the alternative to selectively flushing instructions is flushing all instructions from the pipeline when a branch misprediction is detected, without regard to whether individual ones of those instructions actually depend on the mispredicted branch instruction. The downside of that approach is the performance and efficiency loss associated with flushing valid instructions from the pipeline that have already been fetched and at least partially processed.
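The tradeoff between the two flushing approaches can be made concrete with a small sketch. It assumes, purely for illustration, that each in-flight instruction carries an explicit set of the outstanding branch predictions it depends on; the class, the function names, and the branch labels "B1" and "B2" are hypothetical, and obtaining such dependency information cheaply in hardware is exactly the difficulty described above.

```python
# Hypothetical illustration: full flush versus selective flush on a misprediction.
from dataclasses import dataclass


@dataclass(frozen=True)
class InFlight:
    name: str
    # Labels of the unresolved branch predictions this instruction depends on.
    depends_on: frozenset = frozenset()


def flush_all(pipeline):
    """Full flush: discard every in-flight instruction, dependent or not."""
    return []


def flush_selective(pipeline, mispredicted_branch):
    """Selective flush: keep only instructions not dependent on the mispredicted branch."""
    return [i for i in pipeline if mispredicted_branch not in i.depends_on]


# Assumed program order: load r1, branch B1, add r2, branch B2, mul r3.
# The list order is arbitrary, since instructions may be in flight out of program order.
pipeline = [
    InFlight("mul r3", frozenset({"B1", "B2"})),
    InFlight("load r1"),
    InFlight("add r2", frozenset({"B1"})),
]

survivors = [i.name for i in flush_selective(pipeline, "B2")]
```

In this sketch, a misprediction of B2 removes only "mul r3", whereas a full flush would also discard "load r1" and "add r2", which are valid and would have to be fetched and processed again.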