The present invention relates generally to processors and microprocessors, and more specifically to instruction level operations of pipelined or superpipelined microprocessors.
Pipelined and superpipelined computer architectures provide broad advantages of increases in instruction throughput by overlapping the execution of multiple instructions, utilizing more of the processor at one time than traditional non-pipelined architectures. Where non-pipelined architectures execute all instructions serially, with the beginning of the next instruction following the completion of the previous instruction, pipelined architectures process instructions with varying degrees of parallel type processing. Pipelined architectures are well known in the art, and the benefits, advantages, and general structure of such architectures will not be described further herein.
Today""s processors, such as the Pentium(copyright) line of processors available from Intel Corp., use a pipelined or superpipelined architecture. As the size of components used in computer processors has decreased, and the speed of computing processors has increased, architecture level operations and structures of processors have become more and more important.
Pipelined architectures are subject to certain hazards due to the dynamic nature of pipeline structures and techniques. These hazards include structural hazards arising when the processor hardware cannot support all the combinations of instructions in the pipeline structure, data hazards arising when an instruction depends on the results of a yet to be executed instruction, and control hazards arising from branches and other instructions that change the program counter.
In pipelined and superpipelined architectures, a large and costly potential control hazard is a branch hazard or branch penalty. A branch is a machine instruction that, when taken, switches the central processing unit (CPU) to another location in the program (in memory). The instruction following a non-branch instruction or a non-taken branch will be the next sequential instruction down the pipeline. If the instruction is a taken branch, however, the target of the branch may not be the next sequential instruction. When an instruction constitutes a branch, the target of that branch must be ascertained to effectively continue with execution of the instruction.
Branch prediction schemes, or branch target predictors, may be used to help ascertain the target of a branch. A mispredicted branch may cause a penalty of multiple clock cycles due to flushing the pipeline, refetching, and restarting an instruction. Superpipelined architectures can have branch penalties of six or seven clock cycles, which if encountered often can effectively eliminate the benefits of pipelining. The deeper the pipeline, the worse the potential branch penalty. In CPU instruction execution, branch prediction entails predicting the outcome of a branch so that the target instructions may be executed in parallel with the current instructions. The target of a branch instruction in a typical architecture is not known at the time the branch is encountered. The target of the branch is only decoded later down the pipeline.
In order to more effectively predict whether a branch will be taken or not taken, numerous schemes for prediction have been implemented. Some such schemes for resolving a branch instruction include predicting that the branch is always taken or always not taken. These types of schemes are known as static prediction schemes or static predictors. The rigid nature of static prediction schemes does not account for changing conditions in the execution environment. For example, a prediction that a branch is not taken leads to fetching and decoding of instructions sequentially following the branch instruction. If the branch is not taken, this scheme works well with no delay. If, however, the branch is taken, then the instructions being fetched and decoded must be discarded. The determination of whether a branch is taken may not be made for several stages down the pipeline.
Another type of branch prediction scheme is a dynamic predictive model which may be implemented using a branch target buffer (BTB), a branch prediction buffer, a branch history table, or the like. A BTB keeps a history for the execution of branches, and uses the history to predict the behavior of the branch with high accuracy. The BTB sends prediction information to the fetch module to process. Whenever the BTB finds a branch predicted to be taken, it directs the fetch unit to fetch the target of the branch. If the branch target prediction is correct, the potential branch penalty is eliminated. Further detail about prediction configurations for a BTB will not be discussed herein.
In decoding instructions in a processor, delay is to be avoided when at all possible. Any delay in the interpretation or execution of an instruction can lead to later delays in processor speed, efficiency, and operation. Branch penalties pose a significant potential time savings opportunity in a processor. An increase in branch prediction accuracy from 95% to 97% could lead to an overall speed increase in the processor of 20%.
The align stage of a processor determines where in a fetched set of data the instruction to be executed begins and ends. A pointer in the align stage is adjusted to point to the next instruction to be executed in the processor. An instruction in a computer architecture scheme is traditionally decoded further down the pipeline than in the align stage. When the instruction is a branch or potential branch, it is important to know as soon as possible the details concerning the branch. Specifically, the target of the branch is very important. The sooner the target of the branch is known, the sooner the target can be fetched.
For one embodiment, a method for reducing the branch penalty in a microprocessor includes detecting whether an instruction is a branch, and predecoding the detected branch instruction to determine if the branch is predicted to be taken.
For one embodiment, the target of the branch is delivered to the microprocessor length stage if the branch is predicted taken by the branch target predictor. If the branch is predicted not to be taken, instruction flow continues sequentially with the next instruction.
For another embodiment, an apparatus for reducing branch penalty in a microprocessor includes a branch predecode module for determining whether a branch is predicted taken, and an align stage in data communication relation with the branch predecode module and receiving information indicating whether the branch is predicted taken therefrom.