1. Field of the Invention
The present invention relates generally to the field of processors and in particular to a method of improving branch prediction by proactively managing the contents of a branch target address cache.
2. Background
Microprocessors perform computational tasks in a wide variety of applications. Improved processor performance is almost always desirable, to allow for faster operation and/or increased functionality through software changes. In many embedded applications, such as portable electronic devices, conserving power is also a goal in processor design and implementation.
Many modern processors employ a pipelined architecture, where sequential instructions, each having multiple execution steps, are overlapped in execution. For improved performance, the instructions should flow continuously through the pipeline. Any situation that causes instructions to stall in the pipeline can detrimentally influence performance. If instructions are flushed from the pipeline and subsequently re-fetched, both performance and power consumption suffer.
Most programs include conditional branch instructions, the actual branching behavior of which is not known until the instruction is evaluated deep in the pipeline. To avoid the stall that would result from waiting for actual evaluation of the branch instruction, modern processors may employ some form of branch prediction, whereby the branching behavior of conditional branch instructions is predicted early in the pipeline. Based on the predicted branch evaluation, the processor speculatively fetches (prefetches) and executes instructions from a predicted address: either the branch target address (if the branch is predicted to be taken) or the next sequential address after the branch instruction (if the branch is predicted not to be taken). Whether a conditional branch instruction is to be taken or not to be taken is referred to as the direction of the branch. The direction of the branch may be determined both at prediction time and at actual branch resolution time. When the actual branch behavior is determined, if the branch was mispredicted, the speculatively fetched instructions must be flushed from the pipeline, and new instructions fetched from the correct next address. Prefetching instructions in response to an erroneous branch prediction can adversely impact processor performance and power consumption. Consequently, improving the accuracy of branch prediction is an important processor design goal.
One known form of branch prediction includes partitioning branch prediction into two predictors: an initial branch target address cache (BTAC) and a branch history table (BHT). The BTAC, also known as a branch target buffer (BTB), is indexed by an instruction fetch address and contains the next fetched address, also referred to as the branch target, corresponding to the instruction fetch address. Entries are added to a conventional BTAC after a branch instruction has passed through the processor pipeline and its branch has been taken. If the conventional BTAC is full, entries are conventionally removed from the BTAC using standard cache replacement algorithms (such as round robin or least-recently used) when the next entry is being added.
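By way of illustration only, the conventional allocation and replacement behavior described above may be sketched as follows. The class name Btac, the method name allocate_on_taken, the capacity of two entries, and the addresses are hypothetical and chosen for the example; the sketch assumes a fully associative structure and tracks recency on allocation only.

```python
from collections import OrderedDict


class Btac:
    """Toy fully associative BTAC with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # fetch address -> branch target address

    def allocate_on_taken(self, fetch_addr, target_addr):
        # Conventional policy: an entry is added only after a branch
        # has passed through the pipeline and has been taken.
        if fetch_addr in self.entries:
            self.entries.move_to_end(fetch_addr)  # refresh recency
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)      # evict least recently used
        self.entries[fetch_addr] = target_addr


btac = Btac(capacity=2)
btac.allocate_on_taken(0x100, 0x400)
btac.allocate_on_taken(0x200, 0x800)
btac.allocate_on_taken(0x300, 0xC00)  # table is full, so 0x100 is evicted
```

In this sketch an entry leaves the BTAC only when a newer entry displaces it, which is the conventional behavior the paragraph describes.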
BTACs, in general, are often embodied as a highly associative cache design and accessed early in the fetch pipeline. If the fetch address matches a BTAC entry (a BTAC hit), the corresponding next fetch address or target address is fetched in the next cycle. This match and subsequent fetching of the target address is referred to as an implicit taken branch prediction. If there is no match (a BTAC miss), the next sequentially incremented address is fetched in the next cycle. This no-match situation is also referred to as an implicit not-taken prediction.
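The implicit prediction made on a BTAC hit or miss may be illustrated with the following minimal sketch. The function name next_fetch_address, the use of a plain dictionary as the BTAC, and the fixed 4-byte instruction width are assumptions made for the example.

```python
INSTRUCTION_BYTES = 4  # assumed fixed instruction width


def next_fetch_address(btac, fetch_addr):
    """Return (next fetch address, implicit prediction) for one fetch cycle."""
    if fetch_addr in btac:
        # BTAC hit: implicit taken prediction, fetch the stored target.
        return btac[fetch_addr], "taken"
    # BTAC miss: implicit not-taken prediction, fetch sequentially.
    return fetch_addr + INSTRUCTION_BYTES, "not taken"


btac = {0x100: 0x400}
hit = next_fetch_address(btac, 0x100)
miss = next_fetch_address(btac, 0x104)
```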
BTACs, in general, are utilized in conjunction with a more accurate individual branch direction predictor such as the branch history table (BHT), also known as a pattern history table (PHT). Conventional BHTs are accessed later in the pipeline than a conventional BTAC. As a result, additional information may be available with which to make a better prediction. A conventional BHT may contain a set of saturating predicted direction counters to produce a more accurate taken/not-taken decision for individual branch instructions. For example, each saturating predicted direction counter may comprise a 2-bit counter that assumes one of four states, each assigned a weighted prediction value, such as:
11—Strongly predicted taken
10—Weakly predicted taken
01—Weakly predicted not taken
00—Strongly predicted not taken
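The four states listed above may be modeled, by way of example, as an integer that saturates at 0 and 3. The function names update_counter and predicts_taken are hypothetical; the update rule shown (increment on taken, decrement on not taken) is the common convention for such counters and is assumed here.

```python
def update_counter(state, taken):
    """Move a 2-bit counter one step toward the observed outcome, saturating."""
    return min(state + 1, 0b11) if taken else max(state - 1, 0b00)


def predicts_taken(state):
    """States 10 and 11 predict taken; states 00 and 01 predict not taken."""
    return state >= 0b10


state = 0b11                                # strongly predicted taken
state = update_counter(state, taken=False)  # 0b10, weakly predicted taken
state = update_counter(state, taken=False)  # 0b01, weakly predicted not taken
```

A single not-taken outcome does not flip a strongly taken counter, which is the purpose of the weak states.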
BHTs, in general, are conventionally indexed by bits stored in a branch history register (BHR). The output of a conventional BHT is a taken or not taken decision which results in either fetching the target address of the branch instruction or the next sequential address in the next cycle. The BHT is commonly updated with branch outcome information as it becomes known.
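A BHT indexed by a BHR may be sketched as below. The class name Bht, the 2-bit history length, the initial counter state (weakly not taken), and the choice to index by the BHR alone are assumptions made for illustration; actual designs may combine the history with address bits.

```python
class Bht:
    """Toy BHT: a table of 2-bit counters indexed by a branch history register."""

    def __init__(self, history_bits):
        self.history_bits = history_bits
        self.bhr = 0                                  # recent outcomes, newest in bit 0
        self.counters = [0b01] * (1 << history_bits)  # assumed initial state

    def predict(self):
        return self.counters[self.bhr] >= 0b10        # True means taken

    def update(self, taken):
        # Train the counter selected by the current history...
        c = self.counters[self.bhr]
        self.counters[self.bhr] = min(c + 1, 3) if taken else max(c - 1, 0)
        # ...then shift the actual outcome into the history register.
        mask = (1 << self.history_bits) - 1
        self.bhr = ((self.bhr << 1) | int(taken)) & mask


bht = Bht(history_bits=2)
for i in range(8):
    bht.update(taken=(i % 2 == 0))  # alternating taken / not taken
next_prediction = bht.predict()     # the pattern continues with "taken"
```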
Utilizing a conventional BHT, a processor may override an earlier implicit prediction made by a BTAC. For example, a BTAC may hit (implicitly predicting a taken branch), but the BHT may override the BTAC implicit prediction with a not taken prediction. Conversely, following a BTAC miss, the BHT may override the BTAC miss with a taken prediction provided the target address is now known at this point in the processor pipeline.
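The two override cases just described may be expressed, purely as an illustrative sketch, as a small decision function. The function name resolve_prediction and its boolean parameters are hypothetical.

```python
def resolve_prediction(btac_hit, bht_taken, target_known):
    """Return (final predicted direction, whether the BHT overrode the BTAC)."""
    if btac_hit and not bht_taken:
        # BTAC implicitly predicted taken; BHT overrides with not taken.
        return "not taken", True
    if not btac_hit and bht_taken and target_known:
        # BTAC implicitly predicted not taken; BHT overrides with taken,
        # which is possible only because the target address is now known.
        return "taken", True
    # No override: the implicit BTAC prediction stands.
    return ("taken" if btac_hit else "not taken"), False


hit_overridden = resolve_prediction(btac_hit=True, bht_taken=False, target_known=True)
miss_overridden = resolve_prediction(btac_hit=False, bht_taken=True, target_known=True)
miss_kept = resolve_prediction(btac_hit=False, bht_taken=True, target_known=False)
```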
When a BHT overrides a BTAC prediction, cycles are wasted in flushing the processor pipeline. Such overrides can occur repeatedly when a similar branch instruction is subsequently processed by the pipeline. For example, if the BTAC implicitly predicts taken because a match is found in the BTAC, instructions from the target address (taken branch) begin to be fetched into the processor pipeline. If the BHT subsequently overrides the BTAC prediction by deciding that the branch should not be taken, all instructions fetched from the target address onward have to be flushed from the pipeline. In this conventional branch prediction technique, this cycle potentially repeats itself each time the same branch instruction is subsequently fetched. This problem of repeating branch prediction conflicts on subsequent fetching of the same conditional branch instruction is referred to herein as the multiple flush cycle problem. In a conventional approach, the multiple flush cycle problem may continue to exist for a conditional branch instruction until the BTAC is updated. Therefore, it is recognized that apparatus and methods are needed to proactively manage the BTAC and reduce the probability of the occurrence of the multiple flush cycle problem.
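The multiple flush cycle problem may be illustrated with the following toy simulation. The function name count_flushes, the addresses, and in particular the remove_on_override option are hypothetical: removing the entry on the first conflict is shown only as one conceivable way of updating the BTAC, assumed for contrast, and is not asserted to be the technique of any particular embodiment.

```python
def count_flushes(num_fetches, remove_on_override):
    """Count flushes for one branch that the BHT always predicts not taken."""
    branch_addr = 0x100
    btac = {branch_addr: 0x400}   # stale entry implying a taken prediction
    bht_predicts_taken = False    # BHT consistently disagrees with the BTAC
    flushes = 0
    for _ in range(num_fetches):
        btac_hit = branch_addr in btac
        if btac_hit and not bht_predicts_taken:
            flushes += 1          # instructions from the target path are discarded
            if remove_on_override:
                del btac[branch_addr]  # hypothetical proactive BTAC update
    return flushes


conventional = count_flushes(5, remove_on_override=False)
managed = count_flushes(5, remove_on_override=True)
```

In this sketch the conventional case flushes on every one of the five fetches, whereas the assumed management policy flushes only on the first.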