1. Field of the Invention
This invention relates in general to the field of pipelined microprocessor architecture, and more particularly to the prediction of conditional branch instruction outcomes.
2. Description of the Related Art
Computer instructions are typically stored in successive addressable locations within a memory. When processed by a Central Processing Unit (CPU), the instructions are fetched from consecutive memory locations and executed. Each time an instruction is fetched from memory, a program counter, or instruction pointer, within the CPU is incremented so that it contains the address of the next instruction in the sequence. This is the next sequential instruction pointer, or NSIP. Fetching of an instruction, incrementing of the program counter, and execution of the instruction continues linearly through memory until a program control instruction is encountered.
A program control instruction, when executed, changes the address in the program counter and causes the flow of control to be altered. In other words, program control instructions specify conditions for altering the contents of the program counter. The change in the value of the program counter as a result of the execution of a program control instruction causes a break in the sequence of instruction execution. This is an important feature in digital computers, as it provides control over the flow of program execution and a capability for branching to different portions of a program. Examples of program control instructions include Jump, Test and Jump conditionally, Call, and Return.
A Jump instruction causes the CPU to unconditionally change the contents of the program counter to a specific value, i.e., to the target address for the instruction where the program is to continue execution. A Test and Jump conditionally instruction causes the CPU to test the contents of a status register, or possibly compare two values, and either continue sequential execution or jump to a new address, called the target address, based on the outcome of the test or comparison. A Call instruction causes the CPU to unconditionally jump to a new target address, but also saves the value of the program counter to allow the CPU to return to the program location it is leaving. A Return instruction causes the CPU to retrieve the value of the program counter that was saved by the last Call instruction, and return program flow back to the retrieved instruction address.
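The effect of these program control instructions on the program counter can be sketched as follows. This is a minimal illustrative model, not from the source; the instruction tuples, opcode names, and flag register are all assumptions made for the example.

```python
def step(pc, instr, call_stack, flags):
    """Return the next program counter value for one instruction.

    pc         -- current program counter (instruction index)
    instr      -- a tuple such as ("JUMP", target); hypothetical encoding
    call_stack -- list used to save return addresses for Call/Return
    flags      -- dict of status flags tested by conditional jumps
    """
    op = instr[0]
    if op == "JUMP":                  # unconditional jump to the target address
        return instr[1]
    if op == "JCOND":                 # test and jump conditionally
        return instr[1] if flags[instr[2]] else pc + 1
    if op == "CALL":                  # save the return address, then jump
        call_stack.append(pc + 1)
        return instr[1]
    if op == "RET":                   # return to the address saved by the last Call
        return call_stack.pop()
    return pc + 1                     # any other instruction: next sequential address
```

For example, a Call at address 10 targeting address 50 jumps to 50 and pushes 11 (the return address) onto the stack; a later Return pops 11 and resumes sequential flow there.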
In early microprocessors, execution of program control instructions did not impose significant processing delays because such microprocessors were designed to execute only one instruction at a time. If the instruction being executed was a program control instruction, by the end of execution the microprocessor would know whether it should branch, and if it was supposed to branch, it would know the target address of the branch. Thus, whether the next instruction was sequential, or the result of a branch, it would be fetched and executed.
Modern microprocessors are not so simple. Rather, it is common for modern microprocessors to operate on several instructions at the same time, within different blocks or pipeline stages of the microprocessor. Hennessy and Patterson define pipelining as "an implementation technique whereby multiple instructions are overlapped in execution." Computer Architecture: A Quantitative Approach, 2nd edition, by John L. Hennessy and David A. Patterson, Morgan Kaufmann Publishers, San Francisco, Calif., 1996. The authors go on to provide the following excellent illustration of pipelining:
A pipeline is like an assembly line. In an automobile assembly line, there are many steps, each contributing something to the construction of the car. Each step operates in parallel with the other steps, though on a different car. In a computer pipeline, each step in the pipeline completes a part of an instruction. Like the assembly line, different steps are completing different parts of the different instructions in parallel. Each of these steps is called a pipe stage or a pipe segment. The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end, just as cars would in an assembly line.
Thus, as instructions are fetched, they are introduced into one end of the pipeline. They proceed through pipeline stages within a microprocessor until they complete execution. In such pipelined microprocessors it is often not known whether a branch instruction will alter program flow until it reaches a late stage in the pipeline. However, by this time, the microprocessor has already fetched other instructions and is executing them in earlier stages of the pipeline. If a branch causes a change in program flow, all of the instructions in the pipeline that followed the branch must be thrown out. In addition, the instruction specified by the target address of the branch instruction must be fetched. Throwing out the intermediate instructions, and fetching the instruction at the target address creates processing delays in such microprocessors.
To alleviate this delay problem, many pipelined microprocessors use branch prediction mechanisms in an early stage of the pipeline that predict the outcome of branch instructions, and then fetch subsequent instructions according to the branch prediction.
A popular branch prediction scheme uses a branch history table (BHT), or prediction history table (PHT), to make predictions about conditional branch instruction outcomes. One simple BHT is an array of single bits. Each bit stores the last outcome of a branch instruction. For example, the bit stores a 1 if the branch was taken the last time it was executed and a 0 if the branch was not taken the last time it was executed.
The array is indexed by the address of the branch instruction. To make a prediction for a branch instruction, a branch predictor takes the address of the branch instruction and outputs the bit from the array entry selected by the address. Thus, the prediction for a given execution of a branch instruction is the outcome of the previous execution of the branch instruction. After the branch instruction executes (i.e., once the microprocessor resolves whether the branch is taken or not) the bit indexed by the branch instruction address is updated with the actual branch instruction outcome. A branch prediction mechanism such as a branch history table is commonly referred to as a dynamic branch prediction mechanism because it keeps a history of the outcome of branch instructions as a program executes and makes predictions based upon the history.
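The predict/update behavior of such a one-bit branch history table can be sketched as follows. This is a minimal illustrative model; the table size and the class and method names are assumptions made for the example, not details from the source.

```python
BHT_ENTRIES = 1024  # hypothetical table size; real BHTs vary


class OneBitBHT:
    """One bit per entry: 1 = branch was taken last time, 0 = not taken."""

    def __init__(self):
        self.table = [0] * BHT_ENTRIES

    def index(self, branch_address):
        # Only the lower bits of the branch address select an entry.
        return branch_address % BHT_ENTRIES

    def predict(self, branch_address):
        # The prediction is simply the outcome of the previous execution.
        return self.table[self.index(branch_address)]

    def update(self, branch_address, taken):
        # After the branch resolves, record its actual outcome.
        self.table[self.index(branch_address)] = 1 if taken else 0
```

A branch that was taken on its last execution is predicted taken on its next execution, and vice versa, which is exactly the last-outcome scheme described above.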
Many computer systems today have memory address ranges on the order of gigabytes. It is not practical for the BHT to be as large as the memory space of the system in which the microprocessor operates. Common BHT sizes are 1 KB to 4 KB. Therefore, only a portion of the branch instruction address is used to index into the BHT. Typically, the lower address bits are used as the index. Consequently, two or more branch instructions will sometimes index into the same location in the BHT. This phenomenon is commonly referred to as aliasing. The phenomenon occurs similarly in caches; however, most BHTs do not have cache tags and sets. Therefore, the outcome of the newer branch will replace the outcome of the older branch. This may be detrimental if the older branch executes next, rather than the newer branch.
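Aliasing can be seen directly with two branch addresses that differ only in their upper bits. The specific addresses and table size below are hypothetical, chosen only to make the collision visible.

```python
ENTRIES = 1024                  # hypothetical BHT size
addr_a = 0x0040                 # first branch's address
addr_b = 0x0040 + ENTRIES       # second branch: same low bits, different high bits

table = [0] * ENTRIES

table[addr_a % ENTRIES] = 1     # branch A resolves taken; its outcome is stored
table[addr_b % ENTRIES] = 0     # branch B (not taken) writes the SAME entry,
                                # silently destroying branch A's history

prediction_for_a = table[addr_a % ENTRIES]  # A is now mispredicted as not taken
```

Because the table keeps no tags, there is no way to detect that branch B's update replaced branch A's history; branch A simply inherits the wrong prediction.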
The aliasing phenomenon is also referred to as PHT interference, since the outcome of one branch is interfering with the subsequent prediction of another completely unrelated branch. See Eric Sprangle, Robert S. Chappell, Mitch Alsup, Yale N. Patt, "The Agree Predictor: A Mechanism for Reducing Negative Branch History Interference", Proceedings of the 24th International Symposium on Computer Architecture, Denver, June 1997, which is hereby incorporated by reference.
Sprangle defines interference as "a branch accessing a PHT entry that was previously updated by a different branch." He notes that interference may be positive, negative or neutral. A positive interference is one that causes a correct prediction that would otherwise have been a misprediction. A negative interference is one that causes a misprediction that would otherwise have been a correct prediction. A neutral interference is one that does not affect the correctness of the prediction. Sprangle goes on to show that negative interference has a substantial impact on overall branch prediction accuracy.
Some solutions have attempted to reduce the number of interferences. One solution is to increase the size of the PHT. However, increasing the size of the PHT increases cost significantly because it requires a substantial additional amount of hardware.
Sprangle proposes a solution to the interference problem that he refers to as "agree prediction." Agree prediction, rather than attempting to reduce the number of interferences, converts negative interferences to positive or neutral interferences. This is accomplished by storing different information in the PHT than the outcome of the last branch instruction.
The agree prediction scheme relies on a biasing bit. The biasing bit indicates a prediction of the outcome of the branch. However, unlike the PHT entries, the value of the biasing bit is not updated with each execution of the branch instruction. The biasing bit remains the same over the course of program execution.
With agree prediction, the bit stored in the PHT predicts whether or not the branch outcome will be correctly predicted by the biasing bit, rather than predicting the branch outcome itself. Essentially, the agree predictor predicts whether the branch outcome will "agree" with the biasing bit's prediction. Thus, each time a branch is resolved, the PHT is updated with an indication of whether the biasing bit agreed with the actual outcome.
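The agree scheme's predict and update rules can be sketched as two small functions. This is an illustrative model of the scheme as described above, assuming a one-bit PHT entry; the function names are not from the source.

```python
def agree_predict(pht_bit, biasing_bit):
    """Final prediction: 1 = taken, 0 = not taken.

    If the PHT entry says "agree" (1), predict whatever the biasing bit
    predicts; if it says "disagree" (0), predict the opposite.
    """
    return biasing_bit if pht_bit else 1 - biasing_bit


def agree_update(actual_taken, biasing_bit):
    """New PHT entry value after the branch resolves.

    The PHT stores whether the biasing bit matched the actual outcome,
    not the outcome itself.
    """
    return 1 if actual_taken == biasing_bit else 0
```

For a branch whose biasing bit is usually correct, the PHT entry sits at "agree" almost all the time, so two aliased branches tend to write the same value into the shared entry, which is what converts negative interference into neutral or positive interference.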
How the agree prediction converts negative interferences to positive or neutral interferences can perhaps best be illustrated by first looking at the operation of the older scheme. For example, assume two branch instructions alias to the same entry in the PHT. Also assume one branch has an 80% taken percentage and the other branch has a 30% taken percentage. The probability that the two branches will have opposite outcomes is the probability the first branch will be taken times the probability the second branch will not be taken plus the probability the first branch will not be taken times the probability the second branch will be taken. In our example the probability is:
(80% * 70%) + (20% * 30%) = 62%.
However, in agree prediction the probability that the two branches will have opposite outcomes is a function of the prediction accuracy of the biasing bit. This probability is the probability the first branch agrees with the biasing bit times the probability the second branch disagrees with the biasing bit plus the probability the first branch disagrees with the biasing bit times the probability the second branch agrees with the biasing bit, or:
P1*(1-P2) + (1-P1)*P2,
where P1 is the prediction accuracy of the biasing bit for the first branch and P2 is the prediction accuracy of the biasing bit for the second branch.
To illustrate, if the prediction accuracy of the biasing bit is 70% for each branch, then the probability that the two branches will have opposite outcomes with agree prediction is:
(70% * 30%) + (30% * 70%) = 42%.
Thus, it may be observed that the probability that two different branches that alias to the same history table entry will have opposite values is lower with the Agree predictor than the conventional scheme, thereby reducing the detrimental effects of negative interference.
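The arithmetic in the two examples above can be checked with a single function, since both schemes use the same opposite-outcome formula and differ only in what probability each branch contributes to the shared entry.

```python
def opposite_probability(p1, p2):
    """Probability that two aliased branches write opposite values
    into the shared table entry.

    p1, p2 -- per-branch probability of writing a 1: the taken rate in
              the conventional scheme, or the biasing-bit agreement rate
              in the agree scheme.
    """
    return p1 * (1 - p2) + (1 - p1) * p2


# Conventional scheme: taken rates of 80% and 30%.
conventional = opposite_probability(0.80, 0.30)   # = 0.62

# Agree scheme: biasing-bit accuracy of 70% for each branch.
agree = opposite_probability(0.70, 0.70)          # = 0.42
```

The same function reproduces the 32% figure given later for 80% biasing-bit accuracy, confirming that higher biasing-bit accuracy directly lowers the chance of destructive interference.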
Sprangle proposed two biasing bit schemes. The first is referred to as the "first time" mechanism. The first time mechanism stores the outcome of the branch the first time it is executed and uses that outcome as the biasing bit value. For example, the biasing bit may be stored in an instruction cache or branch target buffer (BTB) of the microprocessor.
The second biasing bit scheme is referred to as the "most often" scheme. With the most often scheme, the program is executed and statistics of the branch outcomes are gathered. After the statistics are gathered, the biasing bit is given the value of the most frequent outcome for each branch. An example of a processor that uses the most often scheme is the Hewlett-Packard® PA-8500 microprocessor. See Linley Gwennap, "Gshare, 'Agrees' Aid Branch Prediction", Microprocessor Report, Nov. 17, 1997. The PA-8500 relies on dedicated static branch prediction bits in the branch instruction itself which are populated based upon previous executions of the program.
However, the first time and most often schemes have important limitations. The most often scheme requires a static branch prediction bit in the instruction format. This is not helpful for microprocessor architectures that do not have a static branch prediction bit, such as the x86 architecture instruction set.
The first time scheme requires significant additional hardware, which increases the cost of the microprocessor. The additional hardware required is directly proportional to the size of the PHT. Thus, if the size of the PHT is 4K entries, then 4K additional biasing bits must be added.
U.S. patent application Ser. No. 09/203,884, now U.S. Pat. No. 6,247,122, entitled Method and Apparatus for Performing Branch Prediction Combining Static and Dynamic Predictors, having the same assignee and inventors, and hereby incorporated by reference, describes a branch prediction method that employs a static prediction based on the test type in the opcode of a conditional branch instruction specifying a condition upon which the conditional branch instruction will be taken as a biasing bit for correlation with an Agree dynamic branch predictor.
The disclosed method has the advantage of not requiring the relatively large additional amount of hardware associated with "first time" or "most often" biasing bit schemes. However, microprocessor pipeline depths continue to increase, resulting in more severe performance degradation when branches are mispredicted. This generates a demand for even greater branch prediction accuracy.
Using the illustration above, if the prediction accuracy of the biasing bit is 80% for each branch rather than 70%, then the probability that the two branches will have opposite outcomes with agree prediction is:
(80% * 20%) + (20% * 80%) = 32%.
This example illustrates the importance of improved biasing bit accuracy.
Therefore, what is needed is a static branch prediction mechanism that provides a more accurate biasing bit without requiring large amounts of additional hardware or dedicated biasing bits in the instruction format such as is required with first time or most often biasing bit schemes.
To address the above-detailed deficiencies, it is an object of the present invention to provide a static branch predictor that provides a biasing bit with improved static prediction accuracy for use within an Agree predictor. Accordingly, in attainment of the aforementioned object, it is a feature of the present invention to provide an Agree branch predictor within a microprocessor including a history table for storing a plurality of Agree/Disagree predictions regarding outcomes of conditional branch instructions, the Agree predictor also including a correlator, coupled to the history table, that correlates a biasing bit with an Agree/Disagree prediction generated by the history table to generate a final prediction of conditional branch instruction outcomes. The Agree predictor includes a static predictor, coupled to the correlator, that generates the biasing bit to indicate a static prediction of an outcome of a conditional branch instruction. The static predictor generates the biasing bit based upon a test type specifying a condition upon which the conditional branch instruction will be taken and based upon an opcode of an instruction preceding the conditional branch instruction.
An advantage of the present invention is that it provides an accurate biasing bit without requiring the relatively large additional amount of hardware associated with other biasing bit schemes and without requiring a dedicated biasing bit within the instruction format. Another advantage of the present invention is that it provides the accurate biasing bit within the constraints of already existing, well-established instruction formats, such as the x86 instruction format.
In another aspect, it is a feature of the present invention to provide a method for statically predicting the outcome of conditional branch instructions within a microprocessor. The method includes receiving a test type of a conditional branch instruction for specifying a condition upon which the conditional branch instruction will be taken, receiving an opcode of an instruction preceding the conditional branch instruction, and making a static prediction of an outcome of the conditional branch instruction based upon the test type and the opcode.
In yet another aspect, it is a feature of the present invention to provide a branch prediction mechanism for predicting conditional branch instruction outcomes within a microprocessor. The branch prediction mechanism includes a first input that receives an indication of a test type of a conditional branch instruction specifying a condition upon which the conditional branch instruction will be taken and a second input that receives an opcode of an instruction preceding the conditional branch instruction. The branch prediction mechanism also includes an output that indicates a prediction of whether the conditional branch instruction will be taken and prediction logic, coupled to the first and second inputs and the output, that makes the prediction on the output based on the test type and the opcode.
In yet another aspect, it is a feature of the present invention to provide an Agree branch predictor within a microprocessor including a history table for storing a plurality of Agree/Disagree predictions regarding outcomes of conditional branch instructions, the Agree predictor also including a static predictor for generating a biasing bit to indicate static predictions of conditional branch instruction outcomes, the Agree predictor also including a correlator, coupled to the history table and the static predictor, for correlating the biasing bit with an Agree/Disagree prediction generated by the history table to generate a final prediction of conditional branch instruction outcomes. The Agree predictor includes an instruction register, coupled to the static predictor, that stores an indication of the instruction opcode of an instruction preceding a conditional branch instruction. The static predictor generates the biasing bit based upon a test type specifying a condition upon which the conditional branch instruction will be taken and based upon the opcode of the instruction preceding the conditional branch instruction.