A. Field of the Invention
The invention generally relates to computer architecture, and, more particularly, to branch prediction.
B. Description of the Related Art
Modern high performance computer processors typically employ pipelining to increase performance. “Pipelining” refers to a processing technique in which multiple sequential instructions are executed in an overlapping manner. A general description of pipelining can be found in “Computer Organization and Design” by David A. Patterson and John L. Hennessy (2d ed. 1998, pp. 436-516).
FIG. 1 shows the timing of instruction processing in a conventional five-stage pipeline processor architecture. With such an architecture, the processor can simultaneously process different stages of up to five successive instructions. The five stages shown in FIG. 1 are: IF (instruction fetch), ID (instruction decode), EX (execute instruction), MEM (memory access), and WB (write back to register).
For example, at clock cycle 1, the processor fetches instruction I1. At clock cycle 2, the processor decodes instruction I1 and fetches instruction I2. In the same manner, the processor continues to process instructions as they are received; by clock cycle 5, the processor writes back the result of instruction I1, accesses memory for instruction I2, executes instruction I3, decodes instruction I4, and fetches instruction I5. In contrast, a non-pipelined architecture would complete processing of an entire instruction (e.g., instruction I1) before beginning to process the next instruction (e.g., instruction I2).
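By way of illustration only, the overlap described above can be sketched in a few lines of Python. The function name `pipeline_schedule` and its parameters are hypothetical and are not part of the described architecture; the sketch assumes an ideal pipeline with no stalls, in which instruction Ik enters the IF stage at clock cycle k.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_schedule(num_instructions, num_cycles):
    """Return {cycle: {stage: instruction}} for an ideal, stall-free pipeline."""
    schedule = {}
    for cycle in range(1, num_cycles + 1):
        active = {}
        for stage_index, stage in enumerate(STAGES):
            # Instruction Ik enters IF at cycle k and advances one stage per cycle.
            instr = cycle - stage_index
            if 1 <= instr <= num_instructions:
                active[stage] = f"I{instr}"
        schedule[cycle] = active
    return schedule

schedule = pipeline_schedule(num_instructions=5, num_cycles=5)
for cycle, active in schedule.items():
    print(f"cycle {cycle}: {active}")
```

At clock cycle 5 the sketch places I1 in WB, I2 in MEM, I3 in EX, I4 in ID, and I5 in IF, matching the timing described for FIG. 1.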
When program flow is perfectly sequential, a pipelined architecture can achieve significant performance advantages over a non-pipelined architecture. In actual programs, however, approximately twenty percent of program instructions are branches. Branch instructions cause a program to deviate from a sequential flow. Consequently, the instruction to be executed (the target of the branch) may not be the next instruction in the fetch sequence.
A processor may recognize that an instruction is a branch instruction in the IF stage (the first stage of the five-stage pipeline). For conditional branch instructions, however, the processor typically cannot determine whether the branch should be taken until it reaches the EX stage (the third stage of the five-stage pipeline). By this time, the processor has already fetched and begun processing the next two instructions. The processing of those two instructions is wasted and inefficient if the branch instruction redirects program flow to another location.
Referring to FIG. 1, if instruction I1 is a conditional branch instruction that redirects flow to instruction I6, the processor does not recognize this until clock cycle 3 (EX), when the processor is executing instruction I1. By this time, the processor has already fetched instruction I2 during clock cycle 2, and decoded instruction I2 and fetched instruction I3 during clock cycle 3. This processing of instructions I2 and I3 is wasted, however, because branch instruction I1 causes flow to skip to instruction I6, with no further processing of instructions I2 or I3. Moreover, the branching causes a stall in the pipeline while the correct instruction (I6) is fetched. These inefficiencies are exacerbated in deeper pipelines and superscalar processors, because a branch takes longer to resolve and more instructions are fetched in the meantime.
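The cost of such redirections can be estimated with a simplified model, sketched below for illustration only. The function names and the fifty percent redirect fraction are hypothetical assumptions; the twenty percent branch fraction and the resolution in the third stage are taken from the description above. The model assumes an otherwise ideal pipeline (one cycle per instruction) and counts only the fetch slots wasted behind a redirecting branch.

```python
def wasted_fetch_slots(resolve_stage):
    """Instructions fetched behind a branch before it resolves (stage 1 = IF)."""
    return resolve_stage - 1

def effective_cpi(branch_fraction, redirect_fraction, resolve_stage):
    """Average cycles per instruction, assuming a base of 1 and no prediction."""
    penalty = wasted_fetch_slots(resolve_stage)
    return 1.0 + branch_fraction * redirect_fraction * penalty

# 20% branches (from the text); the 50% redirect fraction is a hypothetical value.
five_stage = effective_cpi(0.20, 0.50, resolve_stage=3)
# Hypothetical deeper pipeline in which branches resolve in the sixth stage.
deeper = effective_cpi(0.20, 0.50, resolve_stage=6)
print(five_stage, deeper)
```

Under these assumptions, moving branch resolution from the third stage to a later stage increases the average cost per instruction, which is consistent with the observation that deeper pipelines suffer more from branches.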
One approach to solving this problem, called branch prediction, involves making accurate, educated determinations about whether an instruction will result in a branch to another location. Branch prediction is premised on the assumption that, under similar circumstances, the outcome of a conditional branch will likely be the same as prior outcomes. Because branch prediction can be implemented in the IF stage of processing, there is no wasted instruction processing if the result of the conditional branch is always predicted correctly.
Conventional branch prediction techniques include correlation-based schemes and global branch history with index sharing (“gshare”). Although these techniques are somewhat effective, the frequency of erroneous prediction using these techniques may be unacceptable. There remains, therefore, a need for a branch prediction scheme that reduces the frequency of erroneous prediction.
In accordance with the invention, as embodied and broadly described herein, a method of predicting whether a branch will be taken involves reading bits from a local history table and concatenating them with bits from a global history register. The result of the concatenation is combined with bits from the instruction address by performing an exclusive-OR operation. The result of the exclusive-OR operation is used to read a branch prediction table.
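For illustration only, the index computation of the method may be sketched as follows. The function name `hybrid_index`, the bit widths, and the placement of the local history bits in the upper positions of the concatenation are assumptions made for this sketch and are not specified above; the table contents are likewise hypothetical.

```python
def hybrid_index(local_bits, global_bits, address_bits, num_global_bits):
    """Concatenate local and global history bits, then XOR with address bits."""
    # Concatenation: local history in the upper bits, global history in the
    # lower bits (this ordering is an assumption of the sketch).
    concatenated = (local_bits << num_global_bits) | global_bits
    # Exclusive-OR with bits taken from the instruction address.
    return concatenated ^ address_bits

# Hypothetical widths: 4 local bits + 4 global bits give an 8-bit table index.
branch_prediction_table = [False] * (1 << 8)  # True would mean "predict taken"

index = hybrid_index(
    local_bits=0b1011,
    global_bits=0b0110,
    address_bits=0b11001010,
    num_global_bits=4,
)
prediction = branch_prediction_table[index]
print(index, prediction)
```

In this example the concatenation yields 0b10110110, and the exclusive-OR with 0b11001010 selects entry 0b01111100 (124) of the table.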
In accordance with the invention, an apparatus for predicting whether a branch will be taken comprises a local history table and a global history register. The local history table and the global history register are connected to inputs of a concatenating circuit. The output of the concatenating circuit is connected to one input of an exclusive-OR circuit, with an instruction address source being connected to another input. The output of the exclusive-OR circuit is connected to an input of a branch prediction table.
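A software model of such an apparatus might be organized as sketched below, for illustration only. The class name `HybridPredictor`, the table sizes, the use of low-order address bits to select a local history entry, the two-bit saturating counters, and the entire update policy are assumptions of this sketch; the description above specifies only how the components are connected for reading a prediction.

```python
class HybridPredictor:
    """Local history table + global history register -> concatenate -> XOR -> table."""

    def __init__(self, local_bits=4, global_bits=4, local_entries=16):
        self.local_bits = local_bits
        self.global_bits = global_bits
        self.index_bits = local_bits + global_bits
        self.local_history_table = [0] * local_entries
        self.global_history_register = 0
        # Assumption: 2-bit saturating counters, initially weakly not-taken (1).
        self.branch_prediction_table = [1] * (1 << self.index_bits)

    def _slot(self, address):
        # Assumption: low-order address bits select the local history entry.
        return address % len(self.local_history_table)

    def _index(self, address):
        local = self.local_history_table[self._slot(address)]
        concatenated = (local << self.global_bits) | self.global_history_register
        address_bits = address & ((1 << self.index_bits) - 1)
        return concatenated ^ address_bits

    def predict(self, address):
        """Return True if the branch at this address is predicted taken."""
        return self.branch_prediction_table[self._index(address)] >= 2

    def update(self, address, taken):
        """Assumed training step: adjust the counter, then shift in the outcome."""
        i = self._index(address)
        counter = self.branch_prediction_table[i]
        self.branch_prediction_table[i] = min(3, counter + 1) if taken else max(0, counter - 1)
        slot = self._slot(address)
        local_mask = (1 << self.local_bits) - 1
        self.local_history_table[slot] = ((self.local_history_table[slot] << 1) | int(taken)) & local_mask
        global_mask = (1 << self.global_bits) - 1
        self.global_history_register = ((self.global_history_register << 1) | int(taken)) & global_mask

predictor = HybridPredictor()
first_guess = predictor.predict(0x40)
for _ in range(20):  # a hypothetical branch at 0x40 that is always taken
    predictor.update(0x40, taken=True)
trained_guess = predictor.predict(0x40)
print(first_guess, trained_guess)
```

In this model, the `_index` method plays the role of the concatenating circuit and the exclusive-OR circuit, and its result is the input to the branch prediction table.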
It is to be understood that both the foregoing general description and following detailed description are intended only to exemplify and explain the invention as claimed.