1. Field of Invention
The present invention relates generally to computer systems, and more particularly to a method and a system for using a working global history register.
2. Relevant Background
At the heart of the computer platform evolution is the processor. Early processors were limited by the technology available at the time. Advances in fabrication technology allow transistor designs to be reduced to 1/1000th or less of the size of those used in early processors. These smaller processor designs are faster, more efficient and use substantially less power while delivering processing power exceeding prior expectations.
As the physical design of the processor evolved, the ways of processing information and performing functions have also changed. For example, “pipelining” of instructions has been implemented in processor designs since the early 1960s. Pipelining breaks instruction execution into units or stages, through which instructions flow sequentially in a stream. The stages are arranged so that several stages can simultaneously process the appropriate parts of several instructions. One advantage of pipelining is that the execution of the instructions is overlapped because the instructions are evaluated in parallel.
A processor pipeline is composed of many stages where each stage performs a function associated with executing an instruction. Each stage is referred to as a pipe stage or pipe segment. The stages are connected together to form the pipeline. Instructions enter at one end of the pipeline and exit at the other end.
Most programs executed by the processor include conditional branch instructions, the actual branching behavior of which is not known until the instruction is evaluated deep in the pipeline. To avoid a stall that would result from waiting for actual evaluation of the branch instruction, modern processors may employ some form of branch prediction, whereby the branching behavior of a conditional branch instruction is predicted early in the pipeline. Based on the predicted branch evaluation, the processor speculatively fetches and executes instructions from a predicted address: either the branch target address (if the branch is predicted to be taken) or the next sequential address after the branch instruction (if the branch is predicted not to be taken). Establishing whether a conditional branch instruction is taken or not taken is referred to as determining the direction of the branch. The direction of the branch may be determined at prediction time and again at actual branch resolution time. When the actual branch behavior is determined, if the branch was mispredicted, the speculatively fetched instructions must be flushed from the pipeline, and new instructions fetched from the correct address. Speculatively fetching instructions in response to an erroneous branch prediction can adversely impact processor performance and power consumption. Consequently, improving the accuracy of branch predictions is an important processor design goal.
One known form of branch prediction includes partitioning branch prediction into two predictors: an initial branch target address cache (BTAC) and a branch history table (BHT). The BTAC is indexed by an instruction fetch group address and contains the next fetch address, also referred to as the branch target, corresponding to the instruction fetch group address. Entries are added to the BTAC after a branch instruction has passed through the processor pipeline and its branch has been taken. If the BTAC is full when a new entry is to be added, an existing entry is removed using a standard cache replacement algorithm (such as round robin or least-recently used).
The BTAC may be a highly-associative cache design and is accessed early in the instruction execution pipeline. If the fetch group address matches a BTAC entry (a BTAC hit), the corresponding next fetch address or target address is fetched in the next cycle. This match and subsequent fetching of the target address is referred to as an implicit taken branch prediction. If there is no match (a BTAC miss), the next sequentially incremented address is fetched in the next cycle. This no-match situation is also referred to as an implicit not-taken prediction.
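By way of illustration only, the BTAC hit and miss behavior described above may be sketched in software as follows. The class name `BTAC`, the 16-byte fetch group size, the default capacity, and the choice of least-recently-used replacement are assumptions made for this sketch; they are not taken from any particular processor.

```python
from collections import OrderedDict

FETCH_GROUP_BYTES = 16  # assumed fetch group size, for illustration only


class BTAC:
    """Hypothetical software model of a branch target address cache."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        # Maps fetch group address -> next fetch (target) address,
        # ordered from least to most recently used.
        self.entries = OrderedDict()

    def next_fetch_address(self, fetch_addr):
        """Return (next address, hit flag) for a fetch group address."""
        target = self.entries.get(fetch_addr)
        if target is not None:
            # BTAC hit: implicit taken prediction.
            self.entries.move_to_end(fetch_addr)
            return target, True
        # BTAC miss: implicit not-taken prediction, fetch sequentially.
        return fetch_addr + FETCH_GROUP_BYTES, False

    def record_taken_branch(self, fetch_addr, target):
        """Add an entry after a branch has resolved as taken."""
        if fetch_addr in self.entries:
            self.entries.move_to_end(fetch_addr)
        elif len(self.entries) >= self.capacity:
            # Cache full: evict the least recently used entry.
            self.entries.popitem(last=False)
        self.entries[fetch_addr] = target
```

In this sketch, a lookup that hits returns the stored target address, while a lookup that misses returns the fetch group address advanced by one fetch group.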
BTACs may be utilized in conjunction with a more accurate individual branch direction predictor such as a branch history table (BHT), also known as a pattern history table (PHT). A conventional BHT may contain a set of saturating predicted direction counters to produce a more accurate taken/not-taken decision for individual branch instructions. For example, each saturating predicted direction counter may comprise a 2-bit counter that assumes one of four states, each assigned a weighted prediction value, such as:
11—Strongly predicted taken
10—Weakly predicted taken
01—Weakly predicted not taken
00—Strongly predicted not taken
The output of a conventional BHT, also referred to as a prediction value, is a taken or not taken decision which results in either fetching the target address of the branch instruction or the next sequential address in the next cycle. The BHT is commonly updated with branch outcome information as it becomes known.
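For illustration, a 2-bit saturating predicted direction counter of the kind listed above may be modeled as shown below. The function names are hypothetical, and the update rule (increment on a taken outcome, decrement on a not-taken outcome, saturating at both ends) is the commonly used scheme, assumed here rather than stated in the text above.

```python
# State encodings follow the four states listed above.
STRONG_NOT_TAKEN = 0b00
WEAK_NOT_TAKEN = 0b01
WEAK_TAKEN = 0b10
STRONG_TAKEN = 0b11


def predict_taken(counter):
    """Prediction value: taken when the counter is in either taken state."""
    return counter >= WEAK_TAKEN


def update_counter(counter, taken):
    """Move one step toward the actual outcome, saturating at 00 and 11."""
    if taken:
        return min(counter + 1, STRONG_TAKEN)
    return max(counter - 1, STRONG_NOT_TAKEN)
```

Under this assumed update rule, a single outcome that contradicts a strong state moves the counter to the corresponding weak state without changing the predicted direction; a second contradicting outcome changes the prediction.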
In order to increase the accuracy of branch predictions, various other prediction techniques may be implemented which use recent branch history information from other branches as feedback. As those skilled in the art appreciate, current branch behavior may be correlated to the history of previously executed branch instructions. For example, the history of previously executed branch instructions may influence how a conditional branch instruction is predicted.
A Global History Register (GHR), also referred to in the art as a global branch history register or a global history shift register, may be used to keep track of the past history of previously executed branch instructions. As stored by the GHR, the branch history provides a view of the sequence of branch instructions encountered in the code path leading up to the presently executed branch instruction in order to achieve improved prediction results.
In some processors, identification of a branch instruction and its associated prediction information may occur only after an instruction decode stage. Commonly, the instruction decode stage may be a later stage in the instruction execution sequence. After an instruction is decoded and confirmed as a branch instruction, the GHR is loaded with appropriate branch history information. As the branch history information is identified, it is shifted into the GHR. The output of the GHR is used to identify the prediction value stored in the BHT, which is used to predict the next conditional branch instruction.
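The shifting of branch history into the GHR, and the use of the GHR output to select a BHT entry, may be sketched as follows. The 4-bit register width, the convention that a taken branch shifts in a 1, and the direct use of the GHR value as the BHT index are assumptions made for this sketch; practical designs may combine the history with address bits.

```python
GHR_BITS = 4  # assumed register width, for illustration only
GHR_MASK = (1 << GHR_BITS) - 1


def shift_into_ghr(ghr, taken):
    """Shift one branch outcome into the low bit; the oldest bit falls off."""
    return ((ghr << 1) | int(taken)) & GHR_MASK


def bht_index(ghr):
    """Assumed indexing scheme: the GHR value selects the BHT entry directly."""
    return ghr


# One 2-bit counter per possible history pattern, initialized to
# weakly not taken (0b01).
bht = [0b01] * (1 << GHR_BITS)

ghr = 0
for outcome in (True, False, True, True):  # taken, not taken, taken, taken
    ghr = shift_into_ghr(ghr, outcome)

prediction_value = bht[bht_index(ghr)]
```

After the four outcomes above, the register holds the pattern 1011, and the counter stored at that index supplies the prediction value for the next conditional branch instruction.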
In a conventional processor using a GHR, the GHR may not reflect the actual branch history information encountered when multiple branch instructions are executed in parallel during a relatively short period of time. In this instance, the GHR may not be updated with the branch history information from the first branch instruction before the second branch instruction is predicted. As a result, an inaccurate value of the GHR may be used to identify the entry in the BHT for the second conditional branch instruction. Using an inaccurate value to index the entry in the BHT may affect the accuracy of the branch prediction. If the processor had been able to keep pace with the branch history information from the first conditional branch instruction, a different value would have been stored in the GHR and a different entry in the BHT would have been identified for the second conditional branch instruction.
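The effect of a GHR that has not yet been updated may be illustrated with a small sketch. The 4-bit width, the function names, and the direct use of the GHR value as the BHT index are assumptions for illustration only.

```python
GHR_BITS = 4  # assumed register width, for illustration only
GHR_MASK = (1 << GHR_BITS) - 1


def shift_in(ghr, taken):
    """Shift one branch outcome into the low bit of the history."""
    return ((ghr << 1) | int(taken)) & GHR_MASK


def index_for_second_branch(ghr, first_taken, updated_in_time):
    """Return the BHT index seen by the second of two closely spaced branches.

    If the outcome of the first branch has not been shifted in before the
    second branch is predicted, the second branch sees the old history.
    """
    if updated_in_time:
        ghr = shift_in(ghr, first_taken)
    return ghr


history = 0b0101
up_to_date = index_for_second_branch(history, True, updated_in_time=True)
stale = index_for_second_branch(history, True, updated_in_time=False)
```

Here the up-to-date history selects entry 1011 while the stale history selects entry 0101, so the second conditional branch instruction is predicted from a different BHT entry than it would have been had the GHR kept pace.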