This invention relates generally to computer systems, and in particular to a programmable branch prediction system and method for computer systems.
In high performance processors, it is common practice to decompose an instruction into several steps, such as a fetch step, a decode step, and an execute step, and to perform each step in a different instruction processing unit. These instruction processing units may operate independently of one another and do not have to be processing the same instruction. Because each unit can work on a different instruction at the same time, the speed of the processor increases. Thus, it is common practice to overlap successive instructions by one clock cycle so that as a fetch unit begins processing a second instruction, a decode unit may be processing the first instruction. On the next clock cycle, the fetch unit may be processing a third instruction while the decode unit is processing the second instruction and the execute unit is processing the first instruction. In a non-overlapped system, each instruction passes through all three steps before the next instruction begins, so that each instruction requires three clock cycles to execute. Thus, it takes twelve clock cycles to execute four instructions in a non-overlapped system. By contrast, with instruction overlap, those four instructions may be executed in only six clock cycles. This overlap increases the processing speed of the processor significantly. Similarly, the instructions being executed may be overlapped in such a way that the decode step for two or more instructions may be done at the same time. In this overlapping system, multiple instructions may be simultaneously processed.
This instruction overlap, however, may be unavailable or inefficient, principally because of the frequent occurrence of various types of branch instructions in most programs. A branch instruction may completely eliminate the benefit of the instruction overlap, especially if the branch is taken and new code must be loaded into the pipeline and then executed.
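The cycle counts quoted above can be checked with a short calculation. The following is a minimal sketch, assuming an idealized three-step pipeline with one clock cycle per step and no stalls; the function names are hypothetical and chosen only for illustration.

```python
def cycles_non_overlapped(num_instructions: int, num_stages: int = 3) -> int:
    """Each instruction completes every step before the next one starts."""
    return num_instructions * num_stages


def cycles_overlapped(num_instructions: int, num_stages: int = 3) -> int:
    """The first instruction takes num_stages cycles; each later instruction
    finishes one cycle after its predecessor (idealized, no stalls)."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


print(cycles_non_overlapped(4))  # 12
print(cycles_overlapped(4))      # 6
```

Under these assumptions the saving grows with the length of the instruction stream, which is why any event that empties the pipeline is costly.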
Branch instructions have a significant attribute that may reduce or eliminate the efficiency of the instruction overlap. The branch may or may not be taken, which introduces a temporary uncertainty as to which instruction is next and prevents any instruction overlap because the next instruction is not known.
The problems for instruction pipelines created by branch instructions may be reduced by providing a branch instruction prediction system which predicts, prior to actual execution of the branch, (1) whether or not the branch will be taken, (2) the next instructions, or the address or other reference to the destination of the branch, to be executed if the branch is taken, and (3) the next instructions, or a reference to the destination of the branch such as the address, to be executed if the branch is not taken. A successful branch prediction permits the processor to function without the delay in processing time caused by a branch. However, there may be a large time penalty if the prediction is incorrect, and the misprediction penalty may be greater than the delay due to the uncertainty of the branch. Therefore, high prediction accuracy in a branch instruction prediction system is desirable.
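The trade-off between waiting for a branch to resolve and risking a misprediction can be illustrated with a simple expected-cost calculation. This is a sketch only: the cost model (a correct prediction costs nothing, a wrong one costs a fixed penalty) and the cycle counts used below are assumptions made for illustration and are not figures from any particular processor.

```python
def expected_branch_cost(accuracy: float, mispredict_penalty: float) -> float:
    """Average cycles lost per branch when predicting: a correct prediction
    costs nothing, an incorrect one costs the full penalty (assumed model)."""
    return (1.0 - accuracy) * mispredict_penalty


def break_even_accuracy(stall_cycles: float, mispredict_penalty: float) -> float:
    """Accuracy above which predicting loses fewer cycles than always
    stalling until the branch resolves."""
    return 1.0 - stall_cycles / mispredict_penalty


# Hypothetical numbers: stalling costs 2 cycles, a misprediction costs 5.
print(round(break_even_accuracy(2, 5), 3))
print(round(expected_branch_cost(0.9, 5), 3))
```

With these hypothetical numbers, prediction only pays off above roughly 60 percent accuracy, and at 90 percent accuracy the average loss is about half a cycle per branch.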
Most conventional branch instruction prediction systems are automatic branch instruction predictors which predict the outcome of a branch by reacting either to the predicted instruction sequence, to the past branch instruction behavior, or to the storing of program operands or instructions.
Conventional automatic branch instruction prediction systems may be integrated with a conventional memory hierarchy or maintained separate from the memory hierarchy. For example, one conventional system integrates an automatic branch instruction prediction system into an instruction cache. This integration permits the system to serve as an instruction cache as well as a branch predictor, which is beneficial, since both systems must be operating on the same instructions and may share memory or other resources. Other conventional systems improve on this basic integration concept by automatically reorganizing the executing program into traces containing instructions from non-sequential address ranges and storing these traces together with prediction and recovery information in the cache.
A number of conventional branch prediction systems attempt to achieve higher accuracy in predicting branch behavior, which may include whether a branch is taken (the predicted outcome) and/or the address or other reference to the destination that the branch goes to after a successful prediction of the outcome (the predicted result). These conventional systems have varying degrees of accuracy, as described below. Some accurately predict 60 percent of the branch outcomes and results, while the most accurate systems may have an accuracy of 90 percent. These conventional branch prediction systems may be grouped into several different categories including static branch predictors, dynamic branch predictors, implicitly programmable branch predictors, and explicitly programmable branch predictors. Each of these categories has different advantages and disadvantages and different prediction accuracies.
In a static branch instruction prediction system, branches are predicted based on static, unchanging information. For example, one conventional static automatic branch instruction prediction system identifies branches in a predicted instruction sequence and then always predicts that each branch will be taken. This branch prediction system has an accuracy of about 60 percent (i.e., it guesses correctly about 60 percent of the time). A more accurate static branch prediction system predicts that a branch is taken if the destination address of the branch is at a numerically lower memory location than the branch instruction itself. Another conventional static branch prediction system uses prediction information that is encoded into the branch instruction itself at the time that the branch instruction is compiled. All of these systems predict that a given branch will behave according to static prediction information. However, if the branch behavior changes at any time after the static branch prediction information is generated, then the static branch prediction system may mispredict the branch behavior. For example, if the behavior of a branch instruction is dependent on another variable which is undefined at the time of compiling, then the behavior of the branch may change after the static prediction information is generated. The prediction decision for each given branch is fixed, so the accuracy of these static branch prediction systems is limited. To increase the accuracy, dynamic branch prediction systems may be used.
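The first two static schemes described above can be sketched in a few lines. This is an illustrative sketch, not an implementation from any particular processor; the function names and the example addresses are hypothetical. The lower-address rule is commonly explained by the fact that branches to lower addresses tend to close loops.

```python
def predict_always_taken(branch_addr: int, target_addr: int) -> bool:
    """Static scheme 1: every branch is predicted taken."""
    return True


def predict_backward_taken(branch_addr: int, target_addr: int) -> bool:
    """Static scheme 2: predict taken only when the destination lies at a
    numerically lower address than the branch instruction itself."""
    return target_addr < branch_addr


# Hypothetical addresses: a loop-closing branch and a forward skip.
loop_branch = (0x120, 0x100)   # (branch address, destination address)
forward_skip = (0x120, 0x140)

print(predict_backward_taken(*loop_branch))   # True
print(predict_backward_taken(*forward_skip))  # False
```

Neither function consults any run-time state, which is exactly why the prediction for a given branch can never change once the program is laid out in memory.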
In a dynamic branch prediction system, the outcome of a branch is predicted based on dynamic information, such as past branch behavior, that may change or be modified during the execution of the program. For example, one conventional dynamic branch prediction system uses a saturating counter, updated by prior branch taken/not-taken decisions, to predict that a branch will be taken if prior branch decisions indicate that the branch was recently taken more often than not. Otherwise, the dynamic system will predict that the branch will not be taken. After the branch is executed, the branch prediction information may be updated.
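A saturating counter of the kind described above might look like the following sketch. The two-bit width and the initial counter value are assumptions made for illustration (the text does not specify them), and the class and function names are hypothetical.

```python
class SaturatingCounterPredictor:
    """Predicts 'taken' when the branch was recently taken more often than not."""

    def __init__(self, bits: int = 2):
        self.max_value = (1 << bits) - 1   # 3 for a 2-bit counter
        self.threshold = 1 << (bits - 1)   # 2 for a 2-bit counter
        self.counter = self.threshold      # assumed start: weakly taken

    def predict(self) -> bool:
        return self.counter >= self.threshold

    def update(self, taken: bool) -> None:
        # Count up on taken, down on not taken, saturating at both ends.
        if taken:
            self.counter = min(self.counter + 1, self.max_value)
        else:
            self.counter = max(self.counter - 1, 0)


def count_correct(predictor, outcomes):
    """Predict each outcome, then update the counter with the actual result."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct


outcomes = [True, True, True, False, True, True]
print(count_correct(SaturatingCounterPredictor(), outcomes))  # 5
```

Because the counter saturates rather than wrapping, a single not-taken outcome in a run of taken branches costs one misprediction and does not flip the following prediction.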
A variety of dynamic branch prediction systems have been proposed that use past branch instruction behavior to predict future behavior of the branch instruction. These dynamic branch prediction systems, however, still have limited accuracy because these systems use past program data and branch behavior to predict future branch behavior. Past program behavior may not accurately predict future branch behavior.
Another conventional dynamic branch prediction system uses special branch instructions for certain branches with predictable behavior, such as procedure returns and loops. These special branch instructions have an agreed upon usage of operands and this usage may be transformed into accurate predictions of these certain branches. These systems, however, do not accurately predict other branches. Some of these dynamic systems described above predict the outcome of the branch (i.e., whether or not the branch is going to be taken), but do not predict the destination address of the branch if the branch is taken. The accuracy and effectiveness of a branch predictor may be increased by accurately predicting both the outcome of a branch and the result of a branch.
Once a prediction of whether a branch will be taken is made, the next step is to determine the reference to the destination of the branch, such as the address. If the branch is not taken, the destination is simply the next address after the branch, so it may be easily predicted. If the branch is predicted to be taken, however, it can be more difficult to determine the next address. One conventional destination address prediction system predicts the destination of procedure returns by using a stack mechanism. In this system, when the program branches to a subroutine, for example, the address of the last instruction executed before the subroutine may be stored in the stack. When the subroutine is complete, the address stored in the stack may be used to determine the next address. This stack, however, cannot predict computer-determined destinations, such as a calling address for C++ virtual functions, since the calling address of the virtual function is not statically fixed or produced by a previous branch. Thus, these systems do not operate effectively for all types of branches.
Another conventional destination address prediction system uses a branch history table that contains the past destination addresses for various branches. Once again, this system may predict procedure returns, but cannot predict computer-generated destinations, because the destination of a branch may change and past destinations may therefore not be accurate predictors of future destinations. A third branch address prediction system uses dedicated return address registers to predict branch destination addresses by observing the stored values in the dedicated registers. These dedicated register systems require special hardware and provisions for these registers in the instruction set that are not always available, so this system has limited utility. A fourth branch address prediction system monitors all stored operands to determine future branch addresses, but this system is limited both by the information it uses, which only includes past program behavior, and by the limited amount of analysis that can be performed at run-time. These branch address prediction systems have about the same accuracy as some of the branch outcome predictors. To further increase the accuracy of any of these above systems, a programmable branch prediction system may be used.
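The stack mechanism for procedure returns can be sketched as follows. The fixed depth, the policy of discarding the oldest entry on overflow, and the choice to store the address at which execution resumes are all assumptions made for this sketch; the class and method names are hypothetical.

```python
class ReturnAddressStack:
    """Predicts the destination of a procedure return from saved call sites."""

    def __init__(self, depth: int = 8):
        self.depth = depth
        self.stack = []

    def on_call(self, saved_addr: int) -> None:
        # Assumed overflow policy: discard the oldest entry when full.
        if len(self.stack) == self.depth:
            self.stack.pop(0)
        self.stack.append(saved_addr)

    def predict_return(self):
        # The most recent call is the first to return; None means no prediction.
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack()
ras.on_call(0x104)   # outer call
ras.on_call(0x208)   # nested call
print(hex(ras.predict_return()))  # 0x208
print(hex(ras.predict_return()))  # 0x104
```

The stack is filled only by calls and consulted only on returns, so it has nothing to offer for a branch whose destination is computed at run time, such as a virtual function call.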
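The weakness of a table of past destinations can be shown with a small sketch. The table below simply remembers the last destination seen for each branch address, which is one simple form such a table could take and is an assumption of this sketch; all names and addresses are hypothetical.

```python
class BranchTargetTable:
    """Remembers the most recent destination seen for each branch address."""

    def __init__(self):
        self.last_target = {}

    def predict(self, branch_addr: int):
        # None means the branch has not been seen yet.
        return self.last_target.get(branch_addr)

    def update(self, branch_addr: int, actual_target: int) -> None:
        self.last_target[branch_addr] = actual_target


def count_hits(table, branch_addr, targets):
    """Predict each destination in turn, then record the actual one."""
    hits = 0
    for actual in targets:
        if table.predict(branch_addr) == actual:
            hits += 1
        table.update(branch_addr, actual)
    return hits


# A branch with a stable destination versus one whose destination alternates.
print(count_hits(BranchTargetTable(), 0x300, [0x500, 0x500, 0x500, 0x500]))  # 3
print(count_hits(BranchTargetTable(), 0x300, [0x500, 0x600, 0x500, 0x600]))  # 0
```

In the second case the table is always one step behind, which mirrors the point that past destinations may not be accurate future destinations.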
A programmable branch prediction system provides some form of programmable control over the operation of any of the automatic branch prediction systems described above. In these programmable systems, the automatic branch prediction system does the actual branch predictions and the programmable system only interferes with the predictor to adjust prediction information that may be incorrect or to incorporate information derived from examining the program state. These programmable systems, as a result, are transparent to and compatible with all automatic branch prediction systems since these systems merely exert control over the prediction system. These programmable branch prediction systems may use implicit control or explicit control, as described below.
Implicitly programmable branch prediction systems do not directly affect the branch prediction information. For example, programmable systems that use implicit control may choose between alternate code fragments which implement the same function in different ways. The implicit programmable system may choose the code fragment that is most likely to be accurately predicted by the branch prediction system in a particular execution context. These implicit systems, however, require more storage space since multiple code fragments must be stored, and may not be useful for more complex branches. In addition, since some programs are executed from a read only memory (ROM), this system cannot be used for these programs because alternative code fragments cannot be written into a read only memory. There are also implicit programmable systems that use prediction information encoded within a branch instruction. These systems, however, suffer when it becomes necessary to change the information encoded within the instruction. All of these systems are implicit because the programmer does not actually directly control the branch prediction system, but does control the branches given to the branch prediction system. Explicit programmable branch prediction may be more accurate and require less memory space.
Explicitly programmable systems directly control the branch prediction systems by using prediction operations, instructions, or instruction sequences. These instructions are added into the program code. These explicit systems are more flexible because the added instructions may be added into any instruction set, and the prediction opcodes may be chosen such that older processor designs treat them as no-op instructions, so that these older systems may still use the program code even with the additional branch prediction operations. These explicit systems usually control whether the underlying branch prediction system is going to be used at a particular time, whether the prediction information will be updated, and bulk initialization of the prediction information. These explicit programmable systems, however, affect all predictions done by a branch predictor unless the branch predictor is activated and deactivated in turn.
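The backward-compatibility property described above can be illustrated with a toy instruction stream. Everything in this sketch is hypothetical: the opcode names are invented for illustration and do not correspond to any real instruction set, and the model tracks only whether the predictor is enabled when each ordinary instruction runs.

```python
# Hypothetical prediction-control opcodes, invented for this sketch.
HINT_OPS = {"PRED_ENABLE", "PRED_DISABLE"}


def run(program, understands_hints: bool):
    """Return (opcode, predictor_enabled) for every non-hint instruction."""
    predictor_enabled = True
    executed = []
    for op in program:
        if op in HINT_OPS:
            if understands_hints:
                predictor_enabled = (op == "PRED_ENABLE")
            # On an older design the hint simply decodes as a no-op.
            continue
        executed.append((op, predictor_enabled))
    return executed


program = ["ADD", "PRED_DISABLE", "BRANCH", "PRED_ENABLE", "BRANCH"]
print(run(program, understands_hints=True))
print(run(program, understands_hints=False))
```

Both runs execute the same ordinary instructions in the same order; only the newer design changes predictor state. The sketch also shows the coarse granularity noted in the text: the control applies to every branch between a disable and the next enable, not to one chosen branch.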
None of these systems described above provides a programmable branch prediction system that can accurately and reliably predict branch outcomes and destination addresses for any branch. In addition, none of these conventional systems provides explicit control over the prediction of each individual branch.
There is a need for a programmable branch prediction system and method which avoid these and other problems of known devices, and it is to this end that the present invention is directed.