Modern computer systems have incorporated many advanced techniques in order to improve the speed of program execution. In a pipelined processor, portions of the execution of a single machine instruction are separated and performed in separate pipeline stages. For example, the instruction fetch, instruction decode, data fetch, computation, and result storage portions of an instruction could be performed in five separate pipeline stages, in assembly-line fashion. In the idealized case, each pipeline stage can perform its portion in one cycle of the system clock, and a new instruction can be initiated at each clock cycle, so while each instruction may take, for example, five clock cycles to execute, an instruction is completed every clock cycle. This technique can result in a significant performance improvement over a computer that must finish executing one instruction before starting execution of the next, and a modern pipelined computer system may approach the ideal of executing one instruction per clock cycle.
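The idealized timing described above can be checked with a little arithmetic. The following is a minimal sketch, assuming one clock cycle per stage and no stalls; the function names and the figure of 1000 instructions are illustrative only.

```python
def unpipelined_cycles(instructions, stages):
    """Cycles needed when each instruction must finish before the next starts."""
    return instructions * stages


def pipelined_cycles(instructions, stages):
    """Cycles needed in an ideal pipeline: the first instruction takes
    `stages` cycles to fill the pipeline, then one completes every cycle."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)


# Five-stage pipeline, as in the example in the text.
speedup = unpipelined_cycles(1000, 5) / pipelined_cycles(1000, 5)
print(unpipelined_cycles(1000, 5), pipelined_cycles(1000, 5), round(speedup, 2))
```

Under these assumptions, the speedup approaches the number of pipeline stages as the instruction count grows.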
Pipelining is a simple form of parallelism, as several instructions can be in the process of execution concurrently, albeit at different stages. The efficiency of a pipelined processor is hindered if one instruction depends on the result of an immediately previous instruction. For example, if the result of one instruction is an operand for the next instruction, execution of the next instruction may need to be delayed or suspended until the previous result becomes available. This condition is called a pipeline stall. Some pipeline stalls can be avoided by careful programming that separates instructions whose operands may be dependent. Modern compilers often reorder instructions, when possible, for this purpose.
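The effect of instruction reordering on stalls can be illustrated with a toy model. This sketch assumes that a stall occurs only when an instruction reads a register written by the immediately preceding instruction, and that each such stall costs one cycle; real pipelines differ. The register names and the tuple layout are hypothetical.

```python
def count_stalls(program, penalty=1):
    """Count stall cycles in a list of (destination, sources) instructions.

    Assumption: only a dependency on the immediately previous instruction
    causes a stall, and each stall costs `penalty` cycles.
    """
    stalls = 0
    for previous, current in zip(program, program[1:]):
        previous_destination = previous[0]
        current_sources = current[1]
        if previous_destination in current_sources:
            stalls += penalty
    return stalls


# r1 = r2 op r3; r4 = r1 op r5 (depends on r1); r6 = r7 op r8 (independent)
original = [("r1", ("r2", "r3")), ("r4", ("r1", "r5")), ("r6", ("r7", "r8"))]

# Moving the independent instruction between the dependent pair
# separates the producer of r1 from its consumer.
reordered = [original[0], original[2], original[1]]

print(count_stalls(original), count_stalls(reordered))
```

In this model the original order stalls once and the reordered sequence does not, which is the kind of separation a compiler attempts when the instructions are independent.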
Conditional branch instructions are a particularly troublesome cause of pipeline stalls. Because the program control flow depends on the outcome of the condition test in a conditional branch instruction, the location of the next instruction is not known until the branch instruction is complete or nearly so. Program execution must wait until the condition test result is known and the next instruction can be located.
One way processors try to mitigate the inefficiencies of pipeline stalls resulting from conditional branch instructions is to perform speculative execution. For example, the processor may temporarily disregard the fact that the branch instruction may direct the program flow elsewhere, simply fetch the next instruction following the branch, and begin executing that next instruction while the condition is being evaluated. If the branch is taken, sending the program flow elsewhere, then any work done in executing the fetched next instruction is discarded and the correct instruction, from the program location at the destination of the branch, is fetched and issued. This is called the “predict not taken” strategy. If the branch is not taken and the fetched instruction is the correct one, the processor has avoided a pipeline stall. If the branch is taken and the speculative work on the fetched instruction must be discarded, little or no time is lost because without the speculative execution, the pipeline would have been stalled anyway. Other branch prediction strategies are possible as well. For example, a processor could use a “predict taken” strategy and predict all branches as taken rather than not taken, as in the example above.
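The two static strategies can be compared on a recorded sequence of branch outcomes. This is a minimal sketch; the outcome sequence, a loop-closing branch taken nine times and then not taken once, is a hypothetical example.

```python
def mispredictions(outcomes, predict_taken):
    """Count how often a fixed prediction disagrees with the actual outcome.

    `outcomes` is a sequence of booleans, True meaning the branch was taken.
    """
    return sum(1 for taken in outcomes if taken != predict_taken)


# Hypothetical loop-closing branch: taken nine times, then falls through.
loop_branch = [True] * 9 + [False]

not_taken_misses = mispredictions(loop_branch, predict_taken=False)
taken_misses = mispredictions(loop_branch, predict_taken=True)
print(not_taken_misses, taken_misses)
```

For this particular branch, "predict taken" discards speculative work once, while "predict not taken" discards it nine times; a branch that is rarely taken would show the opposite result.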
Clearly, the more accurately the processor is able to predict the outcome of a conditional branch, the more often it will be able to fetch the correct instruction, and the more often its speculative work on that next instruction will pay off and not have to be discarded. Accordingly, several strategies exist for improving the accuracy of branch prediction. In some processors, the branch instruction itself contains a flag that indicates whether to predict the branch as taken or not taken. The flag is set by the compiler, which chooses the instruction form based on a software algorithm. For example, the compiler may assume that a conditional branch at the end of a short “do loop” will usually be taken, and select the instruction form accordingly. Because the compiler is software, it can use an algorithm of considerable complexity for branch prediction. The compiler may even generate a confidence estimate for its predictions. For example, a branch that the compiler estimates will be almost always taken might be predicted as “taken”, while a branch that the compiler estimates will be only usually taken might be classified as “weakly taken”. Other branches might be classified as “not taken”, “weakly not taken”, or by other designations.
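As an illustration of a compiler-side rule, the sketch below marks a branch as "taken" when its target lies at a lower address than the branch itself, on the assumption that such backward branches usually close loops. This heuristic, the function name, and the addresses are assumptions for illustration; the text does not specify the compiler's algorithm.

```python
def static_hint(branch_address, target_address):
    """Return the prediction flag a compiler might encode in the branch.

    Assumed heuristic: a backward branch (target below the branch) is
    likely the end of a loop and is predicted taken; a forward branch
    is predicted not taken.
    """
    return "taken" if target_address < branch_address else "not taken"


print(static_hint(0x120, 0x100))  # backward branch at the end of a loop
print(static_hint(0x120, 0x140))  # forward branch
```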
The processor may perform branch prediction in hardware. For example, the processor may maintain a table indicating whether each branch was taken or not taken the most recent time it was encountered, if ever, and predict the next occurrence of the branch to behave the same way. Past behavior of branches in programs is often a good predictor of their future behavior. More sophisticated schemes exist as well. For example, the processor may predict that a branch will go in the same direction as it did previously, but only if it went in that direction on two consecutive occasions. Hardware prediction schemes such as these require that the processor maintain a table including the location of each branch, its prior behavior, and a predicted branch destination. This table may be quite large, adding complexity to the processor and consuming significant power.
Hardware branch prediction may also assign confidence to branch predictions. For example, the processor may maintain a counter for each branch, incrementing the counter each time the branch is taken and decrementing the counter each time the branch is not taken. The counter value may then be used as a branch predictor. The higher the counter value, the more often the branch has been taken, and therefore the more confidently the branch can be predicted to be taken; conversely, the lower the value of the counter, the more confidently the branch can be predicted as not taken. Depending on the size of the counter, many levels of confidence may be possible for each branch. Of course, this additional record-keeping circuitry may contribute to a further increase in power consumption of the processor.
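A per-branch counter of this kind can be modeled as follows. This is a minimal sketch under several assumptions that the text does not state: the counter saturates at its minimum and maximum values, it is two bits wide by default, a new branch starts just below the midpoint, and confidence is expressed as the distance from the midpoint scaled to the range 0 to 1.

```python
class CounterPredictor:
    """Per-branch up/down counter used for prediction and confidence."""

    def __init__(self, bits=2):
        self.max_value = (1 << bits) - 1  # e.g. 3 for a 2-bit counter
        self.counters = {}                # branch address -> counter value

    def _value(self, address):
        # Assumed starting value for unseen branches: just below midpoint.
        return self.counters.get(address, self.max_value // 2)

    def update(self, address, taken):
        value = self._value(address)
        if taken:
            value = min(value + 1, self.max_value)  # saturate at the top
        else:
            value = max(value - 1, 0)               # saturate at the bottom
        self.counters[address] = value

    def predict(self, address):
        """Predict taken when the counter is in the upper half of its range."""
        return self._value(address) > self.max_value // 2

    def confidence(self, address):
        """Distance from the midpoint, scaled so the extremes give 1.0."""
        midpoint = self.max_value / 2
        return abs(self._value(address) - midpoint) / midpoint


predictor = CounterPredictor(bits=2)
for _ in range(3):
    predictor.update(0x40, taken=True)
strong = (predictor.predict(0x40), predictor.confidence(0x40))

predictor.update(0x40, taken=False)
weak = (predictor.predict(0x40), predictor.confidence(0x40))
print(strong, weak)
```

After three taken outcomes the branch is predicted taken with full confidence; a single not-taken outcome leaves the prediction unchanged but lowers the confidence. A wider counter (a larger `bits` value) gives more intermediate confidence levels at the cost of more state per branch.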
There is an incentive to construct ever more accurate, and presumably ever more complex, branch prediction schemes. There is also an incentive to perform as much speculative execution as possible, so that more pipeline stalls can be avoided and computer performance is improved. However, any execution of instructions by the processor consumes power. Speculative execution that turns out to be wasted may not waste significant computing time, but it does waste the energy expended in the computation, because the results are discarded.
While the performance enhancement offered by speculative execution is desirable in nearly any computer, the additional power consumption it entails is often a drawback, particularly in a portable computer. A portable computer is typically one designed to operate on a limited source of power, such as one or more batteries. It is highly desirable for a portable computer to operate as long as possible under battery power before the batteries are replaced or recharged.