1. Field of the Invention
This invention is related to the field of processors and, more particularly, to branch prediction mechanisms in processors.
2. Description of the Related Art
Superscalar processors may achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term xe2x80x9cclock cyclexe2x80x9d refers to an interval of time accorded to various stages of an instruction processing pipeline within the processor. On the other hand, superpipelined processors may achieve high performance by implementing numerous stages in the instruction processing pipeline and overlapping execution of a large number of instructions using the numerous stages.
An important feature of a superscalar or superpipelined processor is its branch prediction mechanism. The branch prediction mechanism indicates a predicted direction (taken or not-taken) for a branch instruction and/or a predicted target address, allowing subsequent instruction fetching to continue within the predicted instruction stream indicated by the branch prediction. Instructions from the predicted instruction stream may be speculatively executed prior to execution of the branch instruction, and/or may be placed into the instruction processing pipeline prior to execution of the branch instruction. If the predicted instruction stream is correct, then the average number of instructions executed per clock cycle is advantageously increased. However, if the predicted instruction stream is incorrect (i.e. one or more branch instructions are predicted incorrectly), then the instructions from the incorrectly predicted instruction stream are discarded from the instruction processing pipeline and the number of instructions executed per clock cycle is decreased.
A branch instruction is an instruction which causes subsequent instructions to be fetched from one of at least two addresses: a sequential address identifying an instruction stream beginning with instructions which directly follow the branch instruction; and a target address identifying an instruction stream beginning at an arbitrary location in memory. The target address may be generated from: (i) either the instruction address of the branch instruction or the instruction sequential to the branch instruction (where the instruction address is the memory address at which the instruction is stored, and is often referred to as the program counter address, or PC); and/or (ii) one or more operands of the instruction. Unconditional branch instructions always branch to the target address, while conditional branch instructions may select either the sequential or the target address based on the outcome of a prior instruction. Branch instructions may also be categorized as direct or indirect. Direct branch instructions generate a target address from at most a displacement encoded into the instruction and the instruction address, and thus do not require an operand fetch to generate the target address. Indirect branch instructions require at least one operand fetch (e.g. from a register or a memory location) to generate the target address.
Accurately predicting indirect branch instructions has become increasingly important. Indirect branch instructions are typically more prevalent in object-oriented programming styles (e.g. Java, C++, etc.). For example, class member functions are typically called using indirect branch instructions.
When predicting indirect branch instructions, the target address is predicted since the target address cannot be calculated without fetching the operands of the branch instruction. Since the operands are in registers or memory locations, the operands may be changed between various executions of a particular indirect branch instruction and thus the target address of the particular indirect branch instruction may change from execution to execution. The target address resulting from an execution of the particular indirect branch instruction may be correlated to the previously encountered branch instructions (in other words, the target address may be correlated with the instructions executed prior to execution of the particular indirect branch). An indirect branch predictor, designed with cost of implementation and accuracy of prediction as design goals and taking into account the correlation that may exist between the target address of the particular indirect branch instruction and previously encountered branch instructions, is desired.
The problems outlined above are in large part solved by an indirect branch target predictor as described herein. The indirect branch predictor includes a buffer storing branch target addresses corresponding to previously executed indirect branch instructions. The buffer is indexed with an index derived from history information corresponding to previously predicted indirect branch instructions (e.g. a portion of the predicted target address corresponding to previously predicted indirect branch instructions may be used) and from the PC of the particular indirect branch instruction being predicted. The target address from the indexed entry may be used as the prediction for the particular indirect branch instruction. In one embodiment, the buffer may be tagless, thereby reducing cost by eliminating storage for the tags. Additionally, the buffer may be direct mapped in one embodiment, which may reduce power consumption during access to the buffer. In various embodiments, the indirect branch target predictor may generate the index to the buffer using one or more techniques to improve the accuracy of the prediction.
A first index generation technique involves offsetting the history information from the various previously predicted indirect branch instructions. In other words, bits in the same bit position within the history information corresponding to each previously predicted indirect branch instruction affect different bits of the generated index. By offsetting the history information, the indirect branch target predictor may more accurately reflect the order in which the previously predicted indirect branch instructions occur. Prediction accuracy may be increased for cases in which the order of the previously predicted branch instructions affects the target address generated by a particular indirect branch instruction.
A second index generation technique involves weighting the history information based on the age of the previously predicted indirect branch instructions. The number of bits of history information corresponding to more recently predicted indirect branch instructions used in generating the index may be greater than the number of bits of history information corresponding to less recently predicted indirect branch instructions. Prediction accuracy may be improved for those cases in which the correct prediction is more closely correlated to the more recently predicted indirect branch instructions than to the less recently predicted indirect branch instructions.
A third index generation technique involves reversing the bit order of the PC of the particular indirect branch instruction being predicted. In other words, the most significant bits of the portion of the PC used in generating the index may be used in generating least significant bits of the index, while most significant bits of the history information may be used in generating the most significant bits of the index. For code which exhibits locality, the most significant bits of the PC may be relatively stable at any given point in time. The least significant bits of the PC change for each byte and thus may be a quasi-tag for the indirect branch instruction. The most significant bits of the PC are combined with the most recent bits of the history information, and may thus preserve the most recent history information since the most significant bits of the PC are not changing very frequently. The least significant bits of the PC are combined with the least recent bits of the history information. Accordingly, entries of the buffer indexed by one indirect branch instruction may be more likely to be different than the entries of the buffer indexed by another indirect branch instruction (which may be likely to have different least significant bits). Prediction accuracy may be increased due to the more likely allocation of different entries to different indirect branch instructions.
While the above indexing techniques are described for indirect branch prediction, these techniques may be used for indexing any sort of prediction buffer.
In one embodiment, the indirect branch predictor may include a second buffer indexed by a portion of the PC of the particular indirect branch instruction being predicted. The second buffer may store target addresses corresponding to previously executed indirect branch instructions. Additionally, the second buffer may further store prediction selection information which may be used to select a target address prediction from one of the buffer and the second buffer. In the event of a misprediction, both buffers may be updated and the prediction selection information may be changed to select the opposite buffer to the buffer currently selected by the prediction selection information. Since no history information is used to index the second buffer, a single entry may be allocable to each indirect branch instruction. The second buffer may accurately predict indirect branch instructions whose target addresses are relatively fixed, using the buffer indexed by history information to predict the indirect branch instructions having more frequently changing indirect targets. Since the second buffer may filter the relatively fixed target indirect branch instructions, additional history information may be used to index the buffer (possibly causing a particular indirect branch instruction to occupy more entries in the buffer). Prediction accuracy may thereby be increased. In one embodiment, the second buffer may be tagless and/or direct mapped.