I. Field of the Invention
The present invention relates to the field of computer systems. More specifically, the present invention relates to microprocessors, in particular to the prediction of branch instructions.
II. Background Information
Microprocessors (or xe2x80x9cprocessorsxe2x80x9d) execute a series of program instructions, each instruction having an address. Typically instructions are executed in sequence, with branch instructions causing out of sequence execution by causing the processor to branch to an instruction. Pipelined processors generally process instructions in a sequence of stages, such as fetch, decode, execute, and retire, forming a pipeline. Different aspects of different instructions are processed at the same time by different stages forming the pipeline. While one instruction is being fetched from memory, another is being decoded, another is being executed, etc.
When it is known whether or not an instruction being processed in the pipeline will cause a branch, and to what address the instruction will cause a branch (the xe2x80x9cbranch targetxe2x80x9d), the branch is resolved. Branch instructions typically are not resolved until after the execution stage. When a branch is resolved, if the fetch unit has not fetched the proper branch target, the instructions fetched and placed in the pipeline subsequent to that branch instruction must be flushed, i.e. removed, from the pipeline. Thus, a certain amount of processing effort, taking a certain amount of time, is wasted. In order for a pipelined processor to operate efficiently, the instruction fetch unit at the head of the pipeline must continually provide the pipeline with instructions to process. If it can be determined with reasonable accuracy, soon after an instruction is fetched, whether or not the instruction will cause a branch, and to what address the instruction will cause a branch, such inefficiencies can be avoided. If, when a branch instruction is fetched, instead of fetching the instruction subsequent to a branch instruction or a predicted target address, no fetch occurs, the processor stalls and a xe2x80x9cbubblexe2x80x9d is created. The delay in fetching the next instruction will cause each stage of the pipeline to be idle for a period of time.
Mechanisms exist in processors for using the address of an instruction to predict if an instruction is likely to be a branch, and if so, the likely outcome, early in the pipeline sequence. These mechanisms take a portion of the instruction address, possibly in combination with a representation of the history of the recent state of the processor, and use this to access a table. A table may be implemented in any number of manners; for example in a cache, buffer or memory, or by other methods. Entries in the table provide information such as whether or not the instruction is likely to be a branch, the likely target address for the branch, and whether or not the branch will be taken. If the instruction is predicted to be a taken branch the likely target address can be provided to the fetch unit, which fetches the instruction and, if the prediction is correct, prevents a stall. If the prediction is incorrect a stall will occur; thus branch prediction mechanisms are only worthwhile if they predict target addresses with some amount of accuracy. Branch prediction mechanisms are costly in terms of processor resources. The more resources devoted to a branch prediction mechanism, the more accurate the mechanism can be.
Branches may be classified based on two independent characterizations. A branch instruction may be conditional or unconditional, and may be direct or indirect. An unconditional branch instruction always causes a branch. A conditional branch instruction-either branches to a target address or continues to the instruction following the branch instruction (xe2x80x9cfalls throughxe2x80x9d) depending on a condition (e.g., the non-zero status of an operand). A direct branch always branches to the same target (if the branch is taken), whereas the target of an indirect branch is determined after some calculation and is thus not known until the branch instruction is executed.
Branch prediction mechanisms may be caches containing as entries predicted branch targets. Such mechanisms may be formed from set associative caches, which store information in a plurality of lines, each line having a plurality of entries called ways. Each way is indexed by an associated tag. An n-way set associative cache has n ways per line. An index and tag are used to access an entry. The index accesses a line in the cache. The tag is then matched to one of the n tags in the line. If a tag matches a xe2x80x9chitxe2x80x9dresults and the entry corresponding to the tag is returned; otherwise a xe2x80x9cmissxe2x80x9d occurs and no result is returned.
A branch target buffer (xe2x80x9cBTBxe2x80x9d) is a cache containing as its entries branch prediction information. A BTB may contain combine branch information on whether or not branches are predicted to be taken with information on predicted targets; other systems may use separate buffers for such sets of information. BTBs can be implemented in various ways. In one known implementation a BTB is a set associative cache. Each way stores a predicted target address, a taken/not taken prediction, and information on the predicted type of branch (e.g., direct or indirect). The BTB is indented by a portion of the address of the instruction for which a branch prediction is desired (the xe2x80x9cbranch addressxe2x80x9d), and the tag is formed another portion of the branch address. When used herein, xe2x80x9cbranch addressxe2x80x9d may refer to the addresses of actual branch instructions as well as to those of instructions where it is not known whether or not the instruction is a branch, but for which a prediction is desired.
Other known implementations may index a BTB by the result of an exclusive- or (xe2x80x9cXORxe2x80x9d) operation on a portion of the instruction address and a path history register. The XOR operation (represented herein by xe2x80x9c⊕xe2x80x9d) produces a 0 if both of its inputs are either 1 or 0 and produces a 1 if one input is 1 and the other is 0. History registers and path history registers are registers containing, in some form, the history of the last several branches. A history register records information on whether or not branches were taken. For example, entries in a history register may be 1 for a branch that is taken and 0 for a branch that is not taken. A path history register records information on the addresses of branch instructions or targets and information on whether or not branches were taken. A history register may be global (recording history for all branches) or may be particular to each of a number of branch addresses for which a prediction is desired.
In one existing branch prediction method, a target cache contains target addresses of indirect branch targets. The cache is indexed by a calculation involving a branch address and a register. The register is formed by shifting into the register a small number of bits from the target address of previous branches. In such a scheme a BTB may be accessed in parallel with the target cache to determine the type of branchxe2x80x94i.e., indirect, conditional, or other types.
One known prediction method uses a path information register (xe2x80x9cPIRxe2x80x9d) recording information on conditional direct branches only. The PIR is formed by XORing the PIR itself with a number of bits from the address of the target of the current conditional direct branch instruction, shifting the result left one bit, and adding as the rightmost bit an indication of the branch outcome (taken or not taken). The PIR is used to index a BTB storing binary predictions (taken or not taken) for conditional direct branches.
A tradeoff occurs between devoting resources to branch prediction mechanisms and their accuracy. It is costly to implement branch prediction mechanisms. However, the less resources devoted to such mechanisms the less accurate they are. Branch prediction accuracy suffers when less information is stored as branch history, less information is used to index prediction tables and less information is stored in prediction tables. It is desirable to have an accurate yet efficient method to provide branch prediction for indirect branches. Furthermore, branch prediction methods which are accurate and efficient for and appropriate for conditional branches may not be accurate and efficient for and appropriate for indirect branches. It is desirable to have a branch prediction mechanism that is efficient and accurate for a indirect branches as well as direct branches.
A system and method for predicting a branch target for a current instruction in a microprocessor, the system comprising a cache storing indirect branch instructions and a path register. The path register is updated on certain branches by an XOR operation on the path register and the branch instruction, followed by the addition of one or more bits to the register. The cache is indexed by performing an operation on a portion of the current instruction address and the path register; the entry returned, if any, may be used to predict the target of the current instruction.