Many microprocessors employ a technique known as hardware pipelining to increase instruction throughput by processing several instructions concurrently, each in a different phase of execution. To maximize instruction execution efficiency, it is desirable to keep the instruction execution pipeline full (with an instruction being processed in each pipeline stage) as often as possible, so that the pipeline produces useful output every clock cycle. However, when program flow control is transferred to another section of software code, instructions that were speculatively fetched and processed may turn out to be instructions that should not have been executed. In that case, the output of the pipeline for those instructions is not useful.
Exceptions and program flow control instructions, such as branch instructions, are examples of how program flow control can be changed. Branch instructions, which may be conditional or unconditional and may transfer program flow control to a preceding or a subsequent code section, are used in the frequently encountered situations where a change in program flow control is desired.
A conditional branch instruction determines instruction flow based on the resolution of a specified condition. "If A>B, then branch to instruction X" is an example of a conditional branch instruction. In this case, if A>B, program flow control branches to a code section beginning with instruction X, also referred to as the target code section. If A is not greater than B, the instructions that sequentially follow the branch instruction in the program flow, referred to as the sequential code section, are executed. To execute such a conditional branch instruction, the condition of the branch instruction must be checked before the next instruction can be determined. Consequently, the performance of a microprocessor including a central processing unit (CPU) may be degraded, because its pipeline requires instructions to be fetched quickly, before that condition is known.
To solve the aforementioned problem, many microprocessors adopt a branch predictor (or branch prediction logic), which predicts the outcome of a branch instruction according to a predetermined branch prediction approach before the condition of the branch instruction has been checked. Instructions are then speculatively fetched from either the target code section or the sequential code section, as indicated by the branch predictor, so that a pipeline stall can be avoided. However, when a branch is mispredicted, many instructions from the incorrect code section may be in various stages of processing in the instruction execution pipeline. On encountering such a misprediction, the instructions following the mispredicted conditional branch instruction in the pipeline (or multiple pipelines) are flushed, and instructions from the correct code section are fetched. Flushing the pipeline creates bubbles, or gaps, in the pipeline, and several clock cycles may be required before the next useful instruction completes execution and the instruction execution pipeline again produces useful output. In other words, an incorrect guess stalls the pipeline until it is refilled with valid instructions. This delay is called the mispredicted branch penalty.
Various kinds of branch predictors are used to reduce the misprediction rate described above. Among them, the two-level branch predictor is likely to become more common. Intel Corporation's P6 processor was the first to use a two-level branch prediction algorithm to improve accuracy. This algorithm, first published by Tse-Yu Yeh and Yale Patt, has the potential to push prediction accuracy well beyond the 90% level achieved by the best processors today.
FIG. 1 is a schematic diagram illustrating the structure of a conventional two-level branch predictor. Such a branch predictor is illustrated, for example, in FIG. 2 of “New Algorithm Improves Branch Prediction” by Linley Gwennap, Microprocessor Report, Mar. 27, 1995, pp. 17-21.
Referring to FIG. 1, the two-level branch predictor is composed of a branch history register (BHR) 10 and a pattern history table (PHT) 20. The branch history register 10 records the outcomes of the most recent k conditional branches. For example, a 1 stored in the branch history register 10 may denote that a branch was taken, and a 0 may denote that a branch was not taken. The sequence of outcomes of these k conditional branches is called a pattern.
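As a rough illustration of the branch history register 10, the following Python sketch models it as a k-bit shift register in which the newest outcome enters at the least significant bit. The class name `BranchHistoryRegister`, the all-zero reset value, and the choice of k = 6 are assumptions for illustration only; the 1 = taken / 0 = not taken encoding follows the example above.

```python
class BranchHistoryRegister:
    """Hypothetical model of a k-bit branch history register.

    Bit value 1 = taken, 0 = not taken; the newest outcome is the LSB.
    """

    def __init__(self, k: int) -> None:
        self.k = k
        self.value = 0  # assumed reset state: all "not taken"

    def record(self, taken: bool) -> None:
        # Shift older outcomes left, insert the newest as the LSB,
        # and drop the oldest bit that moves past position k - 1.
        mask = (1 << self.k) - 1
        self.value = ((self.value << 1) | int(taken)) & mask

    def pattern(self) -> str:
        # Render the history as a k-character bit string, oldest outcome first.
        return format(self.value, f"0{self.k}b")


bhr = BranchHistoryRegister(k=6)
for outcome in [True, True, True, False, True, False]:
    bhr.record(outcome)
print(bhr.pattern())  # 111010
```

Because the register holds only k bits, recording a seventh outcome discards the oldest one, so the pattern always reflects exactly the k most recent conditional branches.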
The pattern history table 20 records pattern history bits Sc, which are used to predict, for each pattern, the conditional branch of the branch instruction to be executed. For example, when the branch history register 10 holds the pattern 111010, the corresponding entry of the pattern history table 20 holds the value 10, and the two-level branch predictor predicts a conditional branch I(Sc) in response to that entry. The instruction following the branch instruction is then fetched according to the predicted conditional branch I(Sc). According to the Gwennap paper referenced above, the predicted conditional branch I(Sc) is determined by the most significant bit (MSB) of the pattern history bits Sc stored in the pattern history table 20.
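The table lookup and MSB-based prediction can be sketched in Python as follows, assuming 2-bit pattern history entries (consistent with the entry 10 in the example) and a table indexed directly by the k-bit history value. The function name `predict`, the constants `COUNTER_BITS` and `K`, and the initial table contents are hypothetical. Under the 1 = taken encoding, the entry 10 has an MSB of 1, so the branch following pattern 111010 is predicted taken.

```python
from typing import List

COUNTER_BITS = 2  # assumption: 2-bit entries, matching the example value 10
K = 6             # assumption: history length k, matching the pattern 111010


def predict(pht: List[int], history: int) -> bool:
    """Return the predicted branch I(Sc): True (taken) if the MSB of Sc is 1."""
    sc = pht[history]  # the BHR pattern selects one PHT entry
    return ((sc >> (COUNTER_BITS - 1)) & 1) == 1


pht = [0b01] * (1 << K)  # hypothetical initial contents: every entry holds 01
pht[0b111010] = 0b10     # the entry selected by pattern 111010 holds 10

print(predict(pht, 0b111010))  # True: MSB of 10 is 1, predict taken
print(predict(pht, 0b000000))  # False: MSB of 01 is 0, predict not taken
```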
Let Rc denote the real conditional branch of the branch instruction. If the predicted conditional branch I(Sc) differs from the real conditional branch Rc, a prediction miss has occurred. In this case, the instructions executed following the mispredicted conditional branch I(Sc) are withdrawn.
Both the contents of the branch history register 10 and the pattern history bits Sc stored in the pattern history table 20 are then updated according to the real conditional branch Rc, as follows. The bits of the branch history register 10 are shifted to the left, and a bit corresponding to the real conditional branch Rc is stored as the least significant bit (LSB). At the same time, the pattern history bits Sc stored in the pattern history table 20 are updated in response to the real conditional branch Rc: if Rc is 1, denoting taken, Sc is incremented by 1, and if Rc is 0, denoting not taken, Sc is decremented by 1. The pattern history bits Sc can be implemented as an up/down saturating counter, as shown in “A Study of Branch Prediction Strategies” by J. Smith, May 1981, pp. 135-148. If Sc is already at its minimum value and Rc is 0 (not taken), the saturating counter holds Sc at its minimum value; likewise, if Sc is already at its maximum value and Rc is 1 (taken), the counter holds Sc at its maximum value.
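The misprediction check and the update step described above can be sketched in Python as follows. The sketch assumes k = 6, 2-bit saturating counters (range 0 to 3), and a table indexed directly by the history value; the names `is_misprediction`, `update`, `K`, `COUNTER_BITS`, `SC_MAX`, and `HISTORY_MASK` are hypothetical. It also assumes that the entry updated is the one selected by the history used for the prediction, that is, before the history is shifted.

```python
from typing import List

K = 6                             # history length k (assumption)
COUNTER_BITS = 2                  # width of each pattern history entry Sc (assumption)
SC_MAX = (1 << COUNTER_BITS) - 1  # largest counter value, 3 for 2 bits
HISTORY_MASK = (1 << K) - 1       # keeps the BHR at k bits


def is_misprediction(sc: int, rc: int) -> bool:
    """Compare the predicted branch I(Sc), the MSB of Sc, with the real branch Rc."""
    predicted = (sc >> (COUNTER_BITS - 1)) & 1
    return predicted != rc


def update(pht: List[int], bhr: int, rc: int) -> int:
    """Update the PHT entry in place and return the new BHR value.

    rc is the real conditional branch Rc: 1 = taken, 0 = not taken.
    """
    sc = pht[bhr]  # entry selected by the history used for the prediction
    if rc == 1:
        pht[bhr] = min(sc + 1, SC_MAX)  # increment, saturating at the maximum
    else:
        pht[bhr] = max(sc - 1, 0)       # decrement, saturating at the minimum
    # Shift the history left and store Rc as the new least significant bit.
    return ((bhr << 1) | rc) & HISTORY_MASK


pht = [0b10] * (1 << K)  # hypothetical start: every entry holds 10
bhr = 0b111010

# The entry 10 predicts taken, but the branch is actually not taken.
print(is_misprediction(pht[bhr], 0))  # True
bhr = update(pht, bhr, 0)
print(format(pht[0b111010], "02b"), format(bhr, "06b"))  # 01 110100
```

With this update rule, a counter that already holds 0 stays at 0 on a not-taken outcome, and one that holds 3 stays at 3 on a taken outcome, which is the saturating behavior described above.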
Although branch prediction accuracy may be improved or tuned by using different branch prediction algorithms, mispredictions still occur. By the time a misprediction is identified, many instructions from the incorrect code section may be in various stages of processing in the instruction execution pipeline.
An example of a solution to the foregoing misprediction penalty is disclosed in U.S. Pat. No. 5,860,017 to Sharangpani et al., issued on Jan. 12, 1999, entitled “Processor and Method for Speculatively Executing Instructions from Multiple Instruction Streams Indicated by a Branch Instruction.” That patent identifies conditional branch instructions that, relative to other conditional branch instructions, have a relatively high likelihood of being mispredicted. Once a conditional branch instruction is identified as unlikely to be predicted accurately, the processor fetches and decodes instructions from both the target and the sequential instruction streams indicated by that branch instruction. However, because the processor fetches both the target and the sequential instruction streams, the method proposed by Sharangpani et al. may degrade performance through resource conflicts and may incur high hardware cost. Therefore, there is a need for a branch predictor that processes branch instructions efficiently by reducing prediction misses, with a comparatively simple circuit configuration and low hardware cost.