1. Field of the Invention
The present invention generally relates to a pipelined, superscalar microprocessor. More particularly, the invention relates to branch prediction in a pipelined microprocessor. Still more particularly, the invention relates to combining static and dynamic branch prediction techniques.
2. Background of the Invention
A microprocessor comprises the logic, typically a semiconductor device, which executes software. Microprocessors thus fetch software instructions from memory and execute them. Each instruction generally undergoes several stages of processing. For example, the instruction must be fetched and decoded to determine the type of instruction (add, multiply, memory write, etc.). Then, the instruction is scheduled, executed and finally retired. Each stage of processing may take multiple clock cycles. It has been recognized that the next instruction to be executed by a processor can be fetched and entered into the processor's pipeline before the previous instruction is retired. For example, while one instruction is being scheduled, the next instruction can be fetched and decoded. Moreover, as the pipeline increases in length, the processor can have more instructions at various stages of processing.
The instructions that a computer programmer writes to implement a particular software program include a variety of different types of instructions. One type of instruction is generically referred to as a “conditional branch” instruction. This instruction includes a condition that is checked and can either be true or false. For example, the condition might be to check whether a certain error condition exists. The error condition either exists or not. If the error condition currently exists, the condition is true, otherwise the condition is false (i.e., the condition does not exist). Consequently, one set of instructions is executed if the condition is true, and another set of instructions is executed if the condition is false.
Each instruction is stored at a unique address in memory. Typically, if a conditional branch instruction checks a condition that turns out to be false, then program execution follows to the next instruction following the conditional branch instruction. If the condition is true, however, program execution generally jumps to a different instruction and the processor continues executing from that instruction. Thus, the branch is either “taken” or “not taken” depending on whether the condition is true or not. If the condition is true, the branch is taken and the processor's instruction pointer is reloaded with a different address from the branch instruction to continue execution. If the condition is false, the branch is not taken and the instruction pointer is simply incremented so that the processor continues execution with the instruction immediately following the conditional branch instruction.
In a pipelined architecture, instructions may be fetched to enter the pipeline before a previously fetched conditional branch instruction is actually executed. Accordingly, pipelined processors include branch prediction logic that predicts the outcome of branch instructions before the branch instructions are actually executed. The branch predictor logic thus predicts whether the branch is likely to be taken or not, and thus which instructions are to be fetched following the fetching of a conditional branch instruction. The branch predictor merely predicts the future outcome of the conditional branch instruction; the true outcome will not be accurately known until the branch instruction is actually executed. If the branch predictor turns out to have made the correct prediction, then instructions that must be executed are already in the pipeline. If the prediction turns out to have been inaccurate, then the incorrect instructions that had been fetched must be thrown out and the correct instructions fetched. Performance suffers on mispredictions and increases on correct predictions. Choosing a branch prediction scheme that results in correct predictions much more often than mispredictions will result in the performance increase gained from correct predictions outweighing the performance hit on mispredictions.
Many processors use “dynamic” branch prediction techniques, which means that the predictions are made in real time by the processor's branch predictor. Most dynamic branch predictors predict the future behavior of branches using their past behavior (i.e., whether the branches had previously been actually taken or not). Simple branch prediction schemes use either the past behavior of the branch being predicted, the behavior of neighboring branches, or a combination of the two techniques.
Most simple branch predictors include a table of counters. The table typically includes multiple entries and each entry includes a prediction as to whether a conditional branch instruction will be taken or not. Once a conditional branch instruction is fetched, that instruction is used to point to (“index”) one of the entries in the table. Various branch prediction schemes differ in the way this table is indexed. On encountering a conditional branch instruction in program flow, the table of counters is indexed for the given branch. The most significant bit of the counter at the indexed entry is used as the prediction for the branch. The counter is updated (“trained”) once the outcome of the branch is known. Multi-level branch predictors have multiple tables where the final prediction is determined after a series of lookups with each lookup using the outcome of the previous lookup as the index. Hybrid branch predictors combine two or more simple branch predictors. A “meta-predictor” or “chooser” is used to select among the predictions from the component predictors. The training of a hybrid predictor may involve updating all of the component predictors or only a subset of the component predictors. Further, the training may depend on whether the prediction was correct or incorrect.
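The table-of-counters scheme described above can be illustrated with a minimal sketch. The class name, the default table size, the two-bit counter width, the initial counter value, and the use of the low-order address bits as the index are all assumptions chosen for illustration; actual predictors differ in each of these choices.

```python
class CounterTablePredictor:
    """Illustrative table of saturating counters (hypothetical, not a specific design)."""

    def __init__(self, num_entries=1024, counter_bits=2):
        # Assumption: num_entries is a power of two so a bit mask can act as the index function.
        self.num_entries = num_entries
        self.max_value = (1 << counter_bits) - 1
        self.msb_shift = counter_bits - 1
        # Assumption: every counter starts just below the taken threshold ("weakly not taken").
        self.table = [(1 << (counter_bits - 1)) - 1] * num_entries

    def index(self, branch_address):
        # Assumed indexing scheme: low-order bits of the branch address.
        return branch_address & (self.num_entries - 1)

    def predict(self, branch_address):
        # The most significant bit of the indexed counter is the prediction.
        counter = self.table[self.index(branch_address)]
        return bool(counter >> self.msb_shift)

    def train(self, branch_address, taken):
        # Once the outcome is known, move the counter toward it, saturating at the limits.
        i = self.index(branch_address)
        if taken:
            self.table[i] = min(self.table[i] + 1, self.max_value)
        else:
            self.table[i] = max(self.table[i] - 1, 0)
```

In this sketch a branch that has been taken several times in a row keeps its "taken" prediction after a single not-taken outcome, because the counter must cross the midpoint before the most significant bit changes.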
Depending on the indexing scheme and the size of the table of counters in a simple branch predictor, multiple branches in a program may share the same entry in the table of counters. This phenomenon is commonly known as “aliasing” and various branches are said to “collide” with one another. If two colliding branches behave the same way, the collision may in fact be “constructive” as the two branches drive the shared counter value in the same direction resulting in correct predictions for both colliding branches. On the other hand, if the two colliding branches behave differently, they will try to push the shared counter in different directions causing an increased number of mispredictions. Unfortunately, it has been shown that collisions in dynamic branch predictors are more likely to be destructive than constructive. The preferred embodiment of the present invention advantageously reduces the likelihood of destructive collisions between branches.
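The difference between destructive and constructive collisions can be seen in a small simulation. The following is a sketch only: the branch addresses, the table sizes, the modulo indexing, and the two-bit counters starting at the weakly-not-taken value are assumptions made for the example, not details taken from any particular processor.

```python
def count_mispredictions(outcomes_by_branch, num_entries):
    """Interleave the given branches through a table of 2-bit counters.

    outcomes_by_branch maps a (hypothetical) branch address to its list of
    actual outcomes. Returns the number of mispredictions per branch.
    """
    table = [1] * num_entries  # assumed initial state: weakly not taken
    misses = {addr: 0 for addr in outcomes_by_branch}
    length = len(next(iter(outcomes_by_branch.values())))
    for step in range(length):
        for addr, outcomes in outcomes_by_branch.items():
            i = addr % num_entries          # assumed indexing scheme
            taken = outcomes[step]
            predicted_taken = table[i] >= 2  # most significant bit of a 2-bit counter
            if predicted_taken != taken:
                misses[addr] += 1
            # Train the (possibly shared) counter with the real outcome.
            table[i] = min(table[i] + 1, 3) if taken else max(table[i] - 1, 0)
    return misses


always_taken = [True] * 8
never_taken = [False] * 8

# Addresses 0x10 and 0x20 share entry 0 in a 16-entry table but not in a 64-entry table.
destructive = count_mispredictions({0x10: always_taken, 0x20: never_taken}, num_entries=16)
separated = count_mispredictions({0x10: always_taken, 0x20: never_taken}, num_entries=64)
constructive = count_mispredictions({0x10: always_taken, 0x20: always_taken}, num_entries=16)
```

Under these assumptions the two opposite-behaving branches mispredict on every execution when they share a counter, while the same pair in the larger table, or a like-behaving pair in the small table, mispredicts only during warm-up.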
There are several approaches, however, to reducing the destructive aliasing problem noted above. First, the number of entries in the predictor table can be increased, possibly causing branches that would have collided to index to different entries in the table. Second, an indexing scheme can be chosen that best distributes the available counters among different combinations of branch address and history. Third, conditional branch instructions can be separated into different classes with each class using a different prediction scheme. As such, branches in two different classes cannot interfere with one another.
One technique that has been suggested for implementing the third approach is to use a static prediction technique for some conditional branches and a dynamic prediction technique for other branches. Static branch prediction uses the results of pre-run-time analysis of the software. Static prediction uses the knowledge of program structure or profiles from previous runs of a program to accurately predict the run-time outcome of branches. Certain types of conditional branch instructions fairly consistently have the same outcome (take the branch or do not take the branch). For example, conditional branches that check for error conditions generally result in the take or do not take outcome associated with there not being an error. By contrast, dynamic branch prediction is performed during run-time while the program is executing and is performed each time the branch instruction is fetched.
One variation on the idea of combining static and dynamic branch prediction schemes was suggested in a Ph.D. dissertation entitled “Static Methods in Branch Prediction” by Donald Lindsay, Department of Computer Science, University of Colorado, 1998. Lindsay proposed modifying conditional branch instructions to include information as to whether the processor should use its own dynamic prediction logic or use static prediction. If static prediction was dictated by the instruction, then the prediction itself was encoded into the branch instruction.
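A scheme of the kind described above, in which the branch instruction itself selects between static and dynamic prediction, might be sketched as follows. This is an illustration under stated assumptions and not a rendering of the dissertation's actual encoding: the field names `use_static` and `static_taken`, the two-bit counter table, the modulo indexing, and the choice not to train the table on statically predicted branches are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConditionalBranch:
    address: int
    use_static: bool             # hypothetical hint bit: bypass the dynamic predictor
    static_taken: bool = False   # hypothetical hint bit: the pre-run-time prediction


class StaticOrDynamicPredictor:
    """Illustrative predictor that honors per-instruction static hints."""

    def __init__(self, num_entries=1024):
        self.num_entries = num_entries
        self.table = [1] * num_entries  # 2-bit counters, assumed to start weakly not taken

    def predict(self, branch):
        if branch.use_static:
            # The prediction is read directly from the instruction.
            return branch.static_taken
        return self.table[branch.address % self.num_entries] >= 2

    def train(self, branch, taken):
        # Assumption: statically predicted branches never update the table,
        # so they cannot collide with dynamically predicted branches.
        if branch.use_static:
            return
        i = branch.address % self.num_entries
        self.table[i] = min(self.table[i] + 1, 3) if taken else max(self.table[i] - 1, 0)
```

The sketch also makes the encoding cost visible: each conditional branch needs at least two spare bits, one to select the prediction mode and one to carry the static prediction.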
While theoretically adequate, Lindsay's approach may not be possible to implement in an existing processor architecture in which the conditional branch instructions have no extra bits in which to encode the dynamic or static prediction choice and, for static prediction, the prediction itself. Thus, an improvement to Lindsay's proposed combination of static and dynamic branch prediction is needed.