1. Field of the Invention
The present invention generally relates to prediction of conditional branch instructions in a microprocessor. More particularly, the invention relates to branch prediction in a processor that fetches more than one block of instructions at a time. Still more particularly, the invention relates to branch prediction with bank-interleaved prediction arrays.
2. Background of the Invention
A microprocessor comprises the logic, typically a semiconductor device, that executes software. A microprocessor thus fetches software instructions from memory and executes them. Each instruction generally undergoes several stages of processing. For example, the instruction must be fetched and then decoded to determine its type (add, multiply, memory write, etc.). The instruction is then scheduled, executed and finally retired. Each stage of processing may take multiple clock cycles. It has been recognized that the next instruction to be executed can be fetched and entered into the processor's pipeline before the previous instruction is retired. Accordingly, some processors are designed with pipelined architectures that permit multiple instructions to be at various stages of processing at any one point in time. For example, while one instruction is being scheduled, the next instruction can be fetched and decoded. Moreover, as pipelines grow longer with advances in microprocessor design, the processor can have more instructions at various stages of processing at once.
A computer programmer has a variety of instruction types available when writing software. One type of instruction is generically referred to as a “conditional branch” instruction. This instruction checks a condition that evaluates to either true or false. For example, the instruction might check whether a certain error condition exists: if the error currently exists, the condition is true; otherwise, the condition is false. Consequently, one set of instructions is executed if the condition is true, and another set of instructions is executed if the condition is false.
Each instruction is stored at a unique address in memory. Typically, if a conditional branch instruction checks a condition that turns out to be false, program execution proceeds to the instruction immediately following the conditional branch instruction. If the condition is true, however, program execution generally jumps (also called “branches”) to a different instruction, and the processor continues executing from that instruction. Thus, the branch is either “taken” or “not taken” depending on whether the condition is true. If the condition is true, the branch is taken and the processor's instruction pointer (which holds the address of the instruction to be executed) is reloaded with the branch's target address to continue execution. If the condition is false, the branch is not taken and the instruction pointer is simply incremented so that the processor continues execution with the instruction immediately following the conditional branch instruction.
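The taken/not-taken choice of the next instruction address can be sketched as follows. This is a minimal illustration, not part of the described invention: the function name `next_instruction_pointer` is hypothetical, and it assumes the branch's target address and the branch instruction's size in bytes are already known.

```python
def next_instruction_pointer(ip: int, instr_size: int, taken: bool, target: int) -> int:
    """Address of the next instruction after a conditional branch located at `ip`.

    `instr_size` (the branch instruction's length in bytes) and `target`
    (the branch destination) are assumed inputs for this sketch.
    """
    if taken:
        # Condition true: the instruction pointer is reloaded with the branch target.
        return target
    # Condition false: advance past the branch to the immediately following instruction.
    return ip + instr_size
```

For a 4-byte branch at `0x1000` with target `0x2000`, the not-taken call `next_instruction_pointer(0x1000, 4, False, 0x2000)` returns `0x1004`, and the taken call returns `0x2000`.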
In a pipelined architecture, instructions may be fetched into the pipeline before a previously fetched conditional branch instruction has actually executed. Accordingly, pipelined processors include branch prediction logic that predicts the outcome of branch instructions before they execute. The branch predictor thus predicts whether each branch is likely to be taken, and therefore which instructions should be fetched after a conditional branch instruction. The prediction is only an estimate of the branch's future outcome; the true outcome is not known until the branch instruction actually executes. If the prediction is correct, the instructions that must be executed are already in the pipeline. If the prediction is incorrect, the wrongly fetched instructions must be discarded and the correct instructions fetched, so performance suffers on every misprediction. A branch prediction scheme that predicts correctly much more often than it mispredicts therefore yields higher performance.
Superscalar processors execute increasingly many instructions in parallel and therefore must fetch increasingly many instructions in parallel. Some processors fetch multiple blocks of instructions at a time, where a block is a group of two or more instructions. The blocks fetched together may or may not be contiguous in memory. Each block may contain one or more conditional branch instructions that must be predicted. Accordingly, there is a need to predict multiple branch instructions generally simultaneously (i.e., in the same clock cycle).
Most simple branch predictors include a table of counters. The prediction table typically includes multiple entries, each holding a prediction as to whether a conditional branch instruction will be taken. When a conditional branch instruction is fetched, its address is used to generate an index value, which is then combined with history information about past branch outcomes. The resulting value selects one of the entries in the prediction table. In other words, on encountering a conditional branch instruction in the program flow, the table of counters is indexed for that branch. The most significant bit of the counter at the indexed entry is often used as the prediction: a “1” may mean that the branch should be taken, while a “0” may mean that it should not. The counter is updated (“trained”) once the outcome of the branch is known.
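The counter-table scheme described above can be sketched as follows. Several details are assumptions not stated in the text: 2-bit saturating counters, XOR (as in the well-known gshare predictor) as the function that combines the address-derived index with history, a global history register of `history_bits` bits, and 4-byte aligned instructions. The class `CounterTablePredictor` is hypothetical.

```python
class CounterTablePredictor:
    """Table of 2-bit saturating counters indexed by branch address combined with history.

    Assumptions for this sketch: XOR combining (gshare-style), a global history
    register, and 4-byte aligned instructions.
    """

    def __init__(self, index_bits: int = 10, history_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        self.counters = [1] * (1 << index_bits)  # every entry starts weakly not-taken (01)
        self.history = 0  # past outcomes, most recent in bit 0 (1 = taken)

    def _index(self, address: int) -> int:
        # Derive an index from the address (dropping the assumed 2 alignment bits),
        # then combine it with the branch history.
        return ((address >> 2) ^ self.history) & self.mask

    def predict(self, address: int) -> bool:
        # The most significant bit of the 2-bit counter is the prediction: 1 = taken.
        return (self.counters[self._index(address)] >> 1) == 1

    def train(self, address: int, taken: bool) -> None:
        # Update the same entry the prediction used, then record the actual outcome.
        i = self._index(address)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 3)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & self.history_mask
```

A fresh predictor predicts not taken; after a branch at `0x400` is trained as taken several times in a row, `predict(0x400)` returns `True`. Training each entry only after the outcome is known mirrors the text's point that the counter is updated once the branch resolves.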
Branch prediction schemes differ in how a conditional branch instruction indexes the prediction table. For example, it has been suggested that, for processors that fetch multiple blocks of instructions at a time, the indexing scheme should use a single function that takes into account the position of the branch instruction within the fetched block. That is, the location in the predictor table of the prediction associated with a particular branch depends on the branch's position in the block of instructions containing it. This type of indexing scheme leads to interference in the predictor tables: two or more different branches may index the same table entry, causing those branches to be mispredicted.
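The interference problem can be illustrated with a small model. Both functions are hypothetical: `position_index` is one assumed way of folding a branch's slot position into the table index (the text does not specify the function), and `count_mispredictions` compares two branches with opposite behavior when they share one 2-bit counter versus when each has its own.

```python
from typing import Sequence


def position_index(block_address: int, position: int, index_bits: int = 8) -> int:
    """Hypothetical index function that adds the branch's slot position to the block address."""
    return ((block_address >> 4) + position) & ((1 << index_bits) - 1)


def count_mispredictions(outcomes_a: Sequence[bool], outcomes_b: Sequence[bool], shared: bool) -> int:
    """Interleave branches A and B; count mispredictions with one shared or two private counters."""
    counters = {"a": 1, "b": 1}  # 2-bit counters, starting weakly not-taken
    misses = 0
    for a_taken, b_taken in zip(outcomes_a, outcomes_b):
        for name, taken in (("a", a_taken), ("b", b_taken)):
            key = "a" if shared else name  # when shared, both branches use A's entry
            predicted = counters[key] >= 2
            misses += predicted != taken
            counters[key] = min(counters[key] + 1, 3) if taken else max(counters[key] - 1, 0)
    return misses
```

Under this assumed function, a branch in slot 1 of the block at `0x1000` and a branch in slot 0 of the block at `0x1010` both map to entry 1. If one branch is always taken and the other never taken, ten rounds produce 20 mispredictions with the shared counter but only 1 with private counters, because each branch keeps pushing the shared counter back across the taken/not-taken threshold.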
Another suggested indexing technique uses information associated with one fetched block of instructions to predict the branches in the next fetched block. With this technique, the accessed entry in the predictor table does not depend on the position of the fetch block within the group of blocks fetched in parallel. However, the predictor tables must then have multiple read ports to manage conflicts when the same entry in the table is accessed to predict two different branch instructions in the same cycle. That is, the memory used to store the prediction table must be dual-ported. Multiple read ports undesirably add considerable complexity and significantly reduce the useful storage capacity of the prediction table.
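Why the port count matters can be shown with a deliberately simplified timing model. A single-ported array can deliver only one read per cycle, so two predictions needed in the same cycle must either be serialized or served by a second read port. The function `cycles_needed` is hypothetical, and the model assumes every lookup of a fetch group targets the same table, excess reads spill into extra cycles, and each fetch group occupies at least one cycle.

```python
import math
from typing import Sequence


def cycles_needed(lookups_per_group: Sequence[int], read_ports: int) -> int:
    """Cycles to serve every fetch group's prediction-table lookups.

    Simplified model (an assumption): a table with `read_ports` ports serves at
    most that many reads per cycle; reads beyond that are serialized into extra
    cycles, and each fetch group takes at least one cycle.
    """
    if read_ports < 1:
        raise ValueError("read_ports must be at least 1")
    return sum(max(1, math.ceil(n / read_ports)) for n in lookups_per_group)
```

For five fetch groups that each need two predictions, `cycles_needed([2] * 5, read_ports=1)` gives 10 cycles, while a dual-ported table (`read_ports=2`) needs only 5. This is the throughput that the dual-ported design buys at the cost of complexity and storage capacity.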
Accordingly, an improved branch prediction indexing scheme is needed for processors that can simultaneously fetch multiple blocks of instructions.