The speed of microprocessors has increased dramatically over recent years. One reason for this improvement is that microprocessors have become more deeply "pipelined." Pipelining refers to a division of labor within a microprocessor that allows it to operate much like an assembly line. For example, the popular Pentium® processor from Intel® divides its workload into five stages. As shown in FIG. 1, the Pentium® first performs a prefetch of an instruction in memory. While that first instruction, A, is passed to the first stage of decoding, the microprocessor prefetches the next instruction, B. Then, while instruction A is in the second decoding stage and instruction B is in the first decoding stage, the processor prefetches instruction C. This continues until all five stages of the processor are loaded, at which time the processor is essentially executing five different instructions at once. Pipelining instructions in this manner thus provides a great benefit to overall system speed.
One factor that can severely hamper the performance of deeply pipelined processors, however, is the presence of branches in the computer code being executed by the processor. Generally, there are two types of branches: direct and indirect. A direct branch (such as an if-then-else statement) conditions the flow of execution control in a program on the value of a particular variable. Depending on the value of the variable, the execution flow of control will either fall through to the next instruction in the sequence stored in memory or "take the branch." If the branch is Taken, the execution flow of control will jump to an instruction at an out-of-sequence address.
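By way of illustration only, the following C fragment shows a direct branch of the kind described above; the function and variable names are hypothetical:

```c
#include <assert.h>

/* Illustrative only: a direct (conditional) branch in C. The if
   statement compiles to a conditional branch: depending on the value
   of flag, control either continues into the then-block or jumps
   past it to the statement that follows. */
int direct_branch_example(int flag)
{
    int result = 0;
    if (flag) {         /* compiles to a conditional (direct) branch */
        result = 1;     /* reached only when control enters the
                           then-block */
    }
    result += 10;       /* the out-of-sequence branch target when the
                           then-block is skipped */
    return result;
}
```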
As will be appreciated, direct branches cause problems for deeply pipelined processors because instructions are not always executed in the order in which they are stored in memory. For example, with reference to FIG. 1, assume that instruction A is a direct branch instruction and that, depending on the value of some variable, execution flow of control will either fall through to instruction B or, if the branch is taken, jump to instruction G. The processor cannot determine until late in the pipelining of instruction A (i.e., the execution and write-back of instruction A) whether the branch to instruction G will be taken. By that time, as shown, instructions B-F are already in the pipeline. Therefore, all of the instructions in the pipeline must be "flushed," and the pipeline restarted with the prefetch of instruction G. Such pipeline flushes significantly degrade processor performance. Alternatively, some processors are designed to "stall" during execution of direct branch instruction A until it can be determined whether the branch is taken. During a stall, no further instructions enter the pipeline, which can also have a significant negative effect on processor speed.
Programmers and hardware engineers have attempted to address the problems caused by direct branches by devising direct branch "prediction" schemes. These schemes are sometimes implemented by a compiler. A compiler is a computer program that reads a program written in one language (the source language) and translates it into an intermediate code, which it then optimizes and assembles into an object code. The object code is then linked by a linker to create an executable object code that is readable by a computer. Source code is generally written in human-readable languages, such as FORTRAN, C, and PERL. Object code generally comprises assembly language or machine language for a target machine, such as an Intel microprocessor-based computer.
Modern compilers are designed to optimize source code as it is translated into object code. One method of optimization is direct branch prediction, whereby the optimizing compiler attempts to predict whether each branch in the computer code is Taken or Not Taken. Branch prediction in the compiler is accomplished using one or more heuristics, which can be either profile-based or rule-based. FIG. 2A illustrates a prior art profile-based branch prediction method. Source code 10 is first compiled 20. During this compilation 20, in addition to translating the source code 10 into an intermediate code, the compiler "instruments" the intermediate code to collect profile data on all of the direct branches in the code. "Instrumenting" refers to the practice of adding code to trace the performance of direct branches during execution. The intermediate code is then assembled into an object code and linked 30 to create an instrumented executable object code 40. The instrumented executable object code 40 is then executed 50 using a representative workload 60. During execution, the performance of the direct branches in the code is traced and analyzed 70. That profile information is then fed back to the compiler, which predicts whether each direct branch in the code is Taken or Not Taken and inserts those predictions into the code. Once the code is again linked 30, the result is a direct-branch optimized executable object code 80.
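The instrumentation idea described above can be sketched in C as follows. This is a simplified illustration of profile collection and feedback, not the scheme of any particular compiler, and all names are hypothetical:

```c
#include <assert.h>

/* Sketch of profile-based prediction: the compiler inserts counters
   around each direct branch, the instrumented program is run on a
   representative workload, and the branch is then predicted Taken if
   it was Taken more often than not. */
static long taken_count, not_taken_count;

static int branch_condition(int x) { return x > 2; }  /* hypothetical */

void run_instrumented(const int *workload, int n)
{
    for (int i = 0; i < n; i++) {
        if (branch_condition(workload[i]))
            taken_count++;        /* inserted profiling code */
        else
            not_taken_count++;    /* inserted profiling code */
    }
}

/* Feedback step: the static prediction inserted into the final,
   optimized object code for this branch. */
int predict_taken(void)
{
    return taken_count > not_taken_count;
}
```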
Alternatively, as shown in FIG. 2B, the direct branches can be predicted using rule-based heuristic(s). Here, the source code 90 is first compiled 100 using rule-based direct branch heuristic(s). A rule-based heuristic is a static rule or assumption. For example, a simple rule-based heuristic in this context is that branches are always Taken. A variety of other rule-based heuristics can be employed alone or in combination, as explained in U.S. Pat. No. 5,655,122, issued to Youfeng Wu on Aug. 5, 1997, which is hereby incorporated by reference. After the source code is compiled 100, it is linked 120 to create a direct-branch optimized executable object code 130.
It will be appreciated that the correct prediction of whether a direct branch is Taken or Not Taken can greatly increase the speed of deeply pipelined processors. In the example above with respect to FIG. 1, if it is correctly predicted that the branch from instruction A to instruction G will take place, the processor will begin fetching instruction G directly after instruction A, thereby avoiding a pipeline flush or stall. Of course, if the prediction is incorrect, pipeline flushes are still likely.
In addition, even when a branch is correctly predicted Taken, fetching at the branch target address cannot begin immediately because the branch target address must first be calculated. Branch Taken/Not Taken predictions are typically inserted as part of the direct branch itself--not ahead of the direct branch. Because a branch target address is, by definition, not the next sequential address in memory, the processor must add to or subtract from the current program counter to calculate the target address when the branch is predicted Taken. This causes the processor pipeline to stall during the calculation even upon a correct prediction of a direct branch Taken.
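The target-address calculation described above can be illustrated by the following simplified C sketch, in which a PC-relative branch encodes a signed displacement; the encoding and values are hypothetical:

```c
#include <stdint.h>
#include <assert.h>

/* Simplified illustration: even on a correct Taken prediction, the
   fetch unit must wait for the adder that forms
   target = program counter + sign-extended displacement.
   A negative displacement yields a backward branch. */
uint32_t branch_target(uint32_t pc, int32_t displacement)
{
    return pc + (uint32_t)displacement;  /* forward or backward jump */
}
```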
Considerably less attention has been paid to processor stalls or flushes caused by indirect branches. Indirect branches differ from direct branches in that they are always "Taken." A typical indirect branch in a source language such as C reads as follows:
Source Code:

    switch (x) {
        case A: <code for target A>
        case B: <code for target B>
        case C: <code for target C>
    }
Through this indirect branch, execution flow of control is switched according to the value of x to one of the target addresses A, B, or C. The indirect branch is always "Taken" in the sense that execution flow of control will always switch to one of the target addresses A, B, or C--none of which is necessarily the next sequential address in memory. Therefore, the direct branch hinting mechanisms of the prior art, which predict only whether a branch is Taken, are inapplicable to indirect branches. Indirect branches, however, can still degrade processing speed by causing the processor pipeline to stall while the indirect branch variable (variable x above) is evaluated and the address of the correct target is calculated.
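One way to see why such a switch constitutes an indirect branch is the following C sketch, which mimics at the source level the jump-table lowering many compilers perform for dense switch statements; the table and target names are hypothetical:

```c
#include <assert.h>

/* Hypothetical targets corresponding to cases A, B, and C above. */
static int targetA(void) { return 'A'; }
static int targetB(void) { return 'B'; }
static int targetC(void) { return 'C'; }

/* The switch is lowered into a table of target addresses indexed by x,
   followed by an indirect jump through the selected entry. The jump is
   always "Taken," but its target is known only after x is evaluated
   and the table entry is fetched. */
int dispatch(int x)
{
    static int (*const table[3])(void) = { targetA, targetB, targetC };
    if (x < 0 || x > 2)
        return -1;        /* out-of-range values fall to a default */
    return table[x]();    /* the indirect branch */
}
```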
There have been some attempts to remedy the problems caused by indirect branches through both hardware and software. A typical hardware solution is to provide a cache memory that stores the last target address used for a particular indirect branch. When the indirect branch is encountered, the processor begins to fetch from the predicted target address stored in the cache memory while the indirect branch variable is being evaluated. If the indirect branch switches to the same target address as the last time it was executed, processing speed is increased in that calculation of the target address is unnecessary and pipeline stalls are avoided. This method of indirect branch target address hinting, however, is often extremely inaccurate, especially where the indirect branch does not tend to switch to the same target consecutively (which has been found to be the case for many indirect branches). In addition, if the cache memory is not large enough, often the last address used for a particular indirect branch is forced out of the cache memory before that indirect branch is encountered again. Indeed, this method of indirect branch target address prediction is often more detrimental than helpful and can result in processing speeds that are lower than if no prediction mechanism were used at all.
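The hardware scheme described above can be sketched in C as a small direct-mapped table that remembers the last target used by each indirect branch, keyed by the branch's own address; the table size and indexing are illustrative only and do not reflect any particular processor's design:

```c
#include <stdint.h>
#include <assert.h>

#define BTB_SIZE 16  /* illustrative cache size */

/* Each entry remembers one branch and the last target it switched to. */
static struct {
    uint32_t branch_pc;
    uint32_t last_target;
} btb[BTB_SIZE];

/* Predict: return the last-seen target for this branch, or 0 when the
   entry was never filled or has been evicted by a colliding branch. */
uint32_t btb_predict(uint32_t branch_pc)
{
    unsigned i = branch_pc % BTB_SIZE;
    return (btb[i].branch_pc == branch_pc) ? btb[i].last_target : 0;
}

/* Update after the branch resolves: record the actual target. A
   different branch mapping to the same slot evicts the old entry,
   which models the capacity problem described in the text. */
void btb_update(uint32_t branch_pc, uint32_t target)
{
    unsigned i = branch_pc % BTB_SIZE;
    btb[i].branch_pc = branch_pc;
    btb[i].last_target = target;
}
```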
Software schemes to minimize the effect of indirect branches are often employed in optimizing compilers but typically involve restructuring the code to avoid executing indirect branches. For example, cascaded if-then-else constructs are often substituted for indirect branches. In other words, indirect branches are transformed into a series of direct branches. This approach can be effective where an indirect branch is heavily biased in terms of flow of control, because the if-then-else statements can be cascaded in such a way that the direct branches are never Taken. However, where flow of control in an indirect branch is more evenly balanced, or whenever the most-likely target is not the one actually reached, this software solution can adversely impact execution time, and it will almost always increase compilation time.
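The transformation described above can be illustrated as follows. Assuming, purely for the sake of example, that profiling has shown case 2 to be the most frequent target, the cascaded form tests it first so that the hot path falls through:

```c
#include <assert.h>

/* Original indirect form: the switch is lowered to an indirect jump.
   Case values and return values are hypothetical. */
int dispatch_switch(int x)
{
    switch (x) {
    case 0:  return 100;
    case 1:  return 200;
    case 2:  return 300;
    default: return -1;
    }
}

/* Transformed form: a cascade of direct branches, ordered so the
   most-frequent target (assumed to be case 2) is tested first. */
int dispatch_cascaded(int x)
{
    if (x == 2) return 300;  /* hottest case: tested first */
    if (x == 0) return 100;
    if (x == 1) return 200;
    return -1;
}
```

Both forms compute the same result; the cascaded version merely trades the single indirect jump for a series of direct branches whose outcomes the hardware can predict.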
What is needed is a method and apparatus for accurately hinting the target address for indirect branches.
What is needed is a method and apparatus for hinting the target addresses of indirect branches that avoids microprocessor stalls and flushes.
What is needed is a method and apparatus for hinting the target addresses of indirect branches that is not limited by hardware constraints.
What is needed is a method and apparatus for hinting the target addresses of indirect branches that can be accomplished without necessarily converting the indirect branches into direct branch constructs.