1. Field of the Invention
This invention is related to the field of superscalar microprocessors and, more particularly, to the classification of conditional branches in branch prediction.
2. Description of the Related Art
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time accorded to various stages of an instruction processing pipeline within the microprocessor. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively. The term "instruction processing pipeline" is used herein to refer to the logic circuits employed to process instructions in a pipelined fashion. Although the pipeline may be divided into any number of stages at which portions of instruction processing are performed, instruction processing generally comprises fetching the instruction, decoding the instruction, executing the instruction, and storing the execution results in the destination identified by the instruction.
An important feature of a superscalar microprocessor (and a superpipelined microprocessor as well) is its branch prediction mechanism. The branch prediction mechanism indicates a predicted direction (taken or not taken) for a branch instruction, allowing subsequent instruction fetching to continue within the predicted instruction stream indicated by the branch prediction. A branch instruction is an instruction which causes subsequent instructions to be fetched from one of at least two addresses: a sequential address identifying an instruction stream beginning with instructions which directly follow the branch instruction; and a target address identifying an instruction stream beginning at an arbitrary location in memory. Unconditional branch instructions always branch to the target address, while conditional branch instructions may select either the sequential or the target address based on the outcome of a prior instruction. Instructions from the predicted instruction stream may be speculatively executed prior to execution of the branch instruction, and in any case are placed into the instruction processing pipeline prior to execution of the branch instruction. If the predicted instruction stream is correct, then the number of instructions executed per clock cycle is advantageously increased. However, if the predicted instruction stream is incorrect (i.e. one or more branch instructions are predicted incorrectly), then the instructions from the incorrectly predicted instruction stream are discarded from the instruction processing pipeline and the number of instructions executed per clock cycle is decreased.
In order to be effective, the branch prediction mechanism must be highly accurate such that the predicted instruction stream is correct as often as possible. Typically, increasing the accuracy of the branch prediction mechanism is achieved by increasing the complexity of the branch prediction mechanism. Among the methods used to predict branches are local branch prediction and global branch prediction. Local branch prediction involves making a prediction based on the behavior of a particular branch the past few times it was executed. Local branch prediction is effective for branches exhibiting repetitive patterns. On the other hand, global branch prediction involves making a branch prediction based on the history of the last few branches to have been executed. Global branch prediction is useful when the behavior of a branch is related to the behavior of the prior executed branches.
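By way of illustration only, the distinction between local and global branch prediction may be sketched as follows. The class names, the use of two-bit saturating counters, the history lengths, and the dictionary-based tables are assumptions made for the example and are not taken from the description above; an actual implementation may differ in every one of these respects.

```python
class LocalPredictor:
    """Predicts a branch from that branch's own recent outcomes."""

    def __init__(self, history_bits=2):
        self.mask = (1 << history_bits) - 1
        self.histories = {}  # branch address -> its own last few outcomes
        self.counters = {}   # (address, own history) -> 2-bit counter, 0..3

    def predict(self, addr):
        key = (addr, self.histories.get(addr, 0))
        return self.counters.get(key, 1) >= 2  # 2 or 3 means "taken"

    def update(self, addr, taken):
        hist = self.histories.get(addr, 0)
        count = self.counters.get((addr, hist), 1)
        self.counters[(addr, hist)] = min(3, count + 1) if taken else max(0, count - 1)
        # Shift this branch's outcome into its private history.
        self.histories[addr] = ((hist << 1) | int(taken)) & self.mask


class GlobalPredictor:
    """Predicts a branch from the outcomes of the last few branches executed."""

    def __init__(self, history_bits=4):
        self.mask = (1 << history_bits) - 1
        self.history = 0     # one history register shared by all branches
        self.counters = {}   # (address, global history) -> 2-bit counter, 0..3

    def predict(self, addr):
        return self.counters.get((addr, self.history), 1) >= 2

    def update(self, addr, taken):
        key = (addr, self.history)
        count = self.counters.get(key, 1)
        self.counters[key] = min(3, count + 1) if taken else max(0, count - 1)
        # Every executed branch shifts its outcome into the shared history.
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

In this sketch, a branch that alternates between taken and not taken is learned by the local predictor after a few executions, while a branch whose outcome follows that of the branch executed just before it is learned by the global predictor, since the shared history records the earlier branch's outcome.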
One problem with global branch prediction schemes is that they do not account for branches that do not require a global history for prediction. Typically, all conditional branches participate in global history counter training. While some branches may be conditional, they may in fact exhibit static behavior by always being either taken or not taken. Such branches do not need a global history for prediction, yet they contend with other conditional branches for history counter training. Consequently, the global prediction is in effect polluted by the training of branches which behave in a static manner.
The problems outlined above are in large part solved by a microprocessor and method as described herein. When a conditional branch is initially detected, it is classified as local and predicted not taken. If the branch is then actually taken, its prediction is changed to taken. If the branch is subsequently not taken, its classification is changed to global; it then uses global branch prediction and participates in global history counter training. Advantageously, branches which exhibit static behavior need not participate in global history counter training. Instead, branches which are never taken may remain classified as local and predicted not taken, and branches which are always taken may remain classified as local and predicted taken.
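The classification sequence just described may be sketched, purely as an illustration, as a small state update. The names `BranchEntry` and `train` and the string-valued classification are hypothetical and chosen for the example; the sketch covers only the local-to-global transitions and leaves global history counter training aside.

```python
from dataclasses import dataclass


@dataclass
class BranchEntry:
    # A newly detected conditional branch starts as local, predicted not taken.
    classification: str = "local"
    predicted_taken: bool = False


def train(entry: BranchEntry, taken: bool) -> BranchEntry:
    """Update one branch's entry with its actual outcome."""
    if entry.classification == "global":
        return entry  # global branches are trained elsewhere (not shown)
    if not entry.predicted_taken and taken:
        # First misprediction: stay local, switch the prediction to taken.
        entry.predicted_taken = True
    elif entry.predicted_taken and not taken:
        # Second misprediction: the branch is not static, reclassify it.
        entry.classification = "global"
    return entry
```

Under these assumptions, a branch that is never taken stays local and predicted not taken, a branch that is always taken settles as local and predicted taken after one misprediction, and only a branch that has gone both ways is promoted to global.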
Broadly speaking, a branch prediction mechanism is contemplated comprising a local branch prediction storage, a global branch prediction storage, a branch target storage and a selection device. The local branch prediction storage receives a fetch address corresponding to a contiguous group of instructions and conveys a local branch prediction. The global branch prediction storage receives a fetch address and a global history which together form an index for selecting a global prediction from the global branch prediction storage. The branch target storage also receives a fetch address and is configured to store branch target addresses and classification indicators for classifying branches. Each classification indicator is initially set to indicate the branch is local, but may be updated. Finally, a selection device is included for selecting either the local branch prediction or the global branch prediction in response to the classification conveyed by the classification indicator.
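The arrangement of the three storages and the selection device may be pictured with the following sketch. It is illustrative only: the class and method names are hypothetical, the index is formed here by exclusive-OR of the fetch address and the global history (the description above says only that the two form an index), and the two-bit counters and table size are likewise assumptions.

```python
class BranchPredictionMechanism:
    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.local_storage = {}    # fetch address -> local prediction (True = taken)
        self.global_storage = [1] * (1 << index_bits)  # assumed 2-bit counters
        self.target_storage = {}   # fetch address -> (target address, classification)
        self.global_history = 0

    def install(self, fetch_addr, target):
        """Record a newly detected branch: local, predicted not taken."""
        self.target_storage[fetch_addr] = (target, "local")
        self.local_storage[fetch_addr] = False

    def reclassify(self, fetch_addr, classification):
        target, _ = self.target_storage[fetch_addr]
        self.target_storage[fetch_addr] = (target, classification)

    def predict(self, fetch_addr):
        """Return (predicted taken, target address) for a fetch address."""
        local_pred = self.local_storage.get(fetch_addr, False)
        # Assumed index function: fetch address XOR global history.
        index = (fetch_addr ^ self.global_history) & self.mask
        global_pred = self.global_storage[index] >= 2
        target, classification = self.target_storage.get(fetch_addr, (None, "local"))
        # Selection device: the classification indicator picks the source.
        taken = global_pred if classification == "global" else local_pred
        return taken, target
```

All three storages are consulted with the same fetch address, and the classification indicator read from the branch target storage acts as the select input of what would, in hardware, presumably be a multiplexer.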
Also contemplated is a method comprising detecting that an instruction is a conditional branch. Upon such detection, a local branch prediction corresponding to the conditional branch is initialized to indicate the branch is predicted not taken. If the branch is then mispredicted, the local branch prediction is updated to indicate the branch is now predicted taken. In addition, the branch is classified as local. Finally, if the branch is again mispredicted, the branch classification is updated to indicate the branch is classified as global.