1. Field
The present disclosure relates to computer processors.
2. Related Art
Modern computer processors (also known as central processing units or CPUs) employ branch prediction and a pipelined instruction fetch process so as to be able to feed a new decoded instruction (or several, depending on the architecture) into issue every cycle. The instruction fetch pipeline can be lengthy. The penalty for misprediction of a branch operation can take many cycles, as the instruction fetch pipeline must be flushed and refilled with instructions starting from the point of the mispredicted branch operation. This penalty would be prohibitive if it were imposed on every conditional branch operation, hence the need for efficient branch prediction.
It is relatively easy to know the next instruction in advance of execution when instructions are issued in consecutive sequence. In architectures with fixed-length instructions, the location of the Nth next instruction is N instruction widths ahead of the location of the current instruction. The relation is not so direct with variable-length instructions. In practice, instruction fetch pipelines have little trouble with consecutive instructions.
Unfortunately, actual program instructions are not always consecutive. Except for certain highly specialized kinds of programs, most programs are replete with branch operations (or subroutine call operations and corresponding return operations) that transfer control away from the linear sequence of instructions that can be efficiently pipelined. After fetching instructions from one run of consecutive addresses, the transfer starts fetching from a different run of addresses. However, to do so without a hiccup, the instruction fetch pipeline must know where the new run will be and at which cycle in the future it starts.
That time-and-place information is in the encoding of the operations that redirect control flow and in the data operands that they take as arguments. However, the ordinary flow of execution does not examine these transfer operations until they issue, and by then it is too late (by the amount of the pipeline delay) to redirect the instruction fetch pipeline to the new run. Hence, each operation that actually transfers control adds a whole pipeline delay to execution.
There are several established ways to avoid this problem, but the most important is branch prediction. This approach builds on the observed past execution behavior of a given conditional branch operation. There are innumerable branch prediction schemes, many of them subtle and complex. However, they all seek to predict what instruction will execute next after a conditional branch operation. All branch prediction schemes are vulnerable to making incorrect predictions. If the prediction is wrong (a mispredict), then the instruction fetch pipeline is full of instructions from the incorrect execution path. In this case, such instructions are discarded and the instructions from the correct execution path are fetched and decoded after a full pipeline delay. If, for example, the branch prediction scheme is correct 90% of the time and conditional branch operations are 20% of total instructions (a common figure for general code), then there will be one mispredict on average every 50 instructions. If, for example, the pipeline delay is 30 cycles, then nearly 40% of the CPU cycles are wasted recovering from mispredictions.
The various branch prediction strategies commonly used by or proposed for CPUs employ hardware-based tables that retain the history of the CPU's prior experience with branch operations. Exactly what this history information comprises, and how it is used to decide on a prediction, varies by strategy. A conventional predictor keeps a table entry for each branch operation that the CPU may execute. Commonly the information is organized as a hardware-based hash table indexed by the memory address of the branch instruction. The instruction pipeline, on recognizing a branch operation, hashes the address of the branch operation to generate a table index and then looks for a table entry that predicts whether the branch operation will be taken or not. For dynamic branch operations (ones where the target is computed at run time) the tables may also contain the target address of the branch operation, or at least what its target was the last time it was executed.
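A minimal sketch of such a hardware-style predictor table follows: a fixed-size table of entries indexed by a hash of the branch instruction's address. The two-bit saturating counter used as the entry format, the table size, and the low-order-bits hash are common illustrative choices, not details specified in the description above:

```python
TABLE_SIZE = 1024  # entries; a power of two so indexing is cheap in hardware

class BranchPredictor:
    def __init__(self):
        # Two-bit saturating counters: 0, 1 predict not-taken;
        # 2, 3 predict taken ("weakly" and "strongly").
        self.counters = [1] * TABLE_SIZE

    def _index(self, branch_addr):
        # Hash the branch address down to a table index; here simply
        # the low-order address bits.
        return branch_addr % TABLE_SIZE

    def predict(self, branch_addr):
        # True means the branch is predicted taken.
        return self.counters[self._index(branch_addr)] >= 2

    def update(self, branch_addr, taken):
        # After the branch resolves, nudge its counter toward the
        # actual outcome, saturating at 0 and 3.
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

The saturating counter illustrates why such tables retain "history": one anomalous outcome does not immediately flip a strongly-established prediction. Note that two branch addresses hashing to the same index share an entry, which is one reason larger tables (at the die-area cost noted below) improve accuracy.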
All of the branch prediction strategies benefit from keeping more information, either to have more history about a particular branch instruction, or to be able to keep history about more branch instructions. Unfortunately, increasing the size of the hardware-based hash table costs die area and adds execution delay.
Branch prediction schemes also typically predict control flow through conditional call operations and return operations. Programs are commonly divided into separate units of program code referred to as subroutines, functions, or procedures. Such a unit of program code can be activated and executed to perform its behavior by a programmatic device known as a call operation. The call operation identifies the unit of program code that is to be activated, and then pauses the currently running unit of program code until the execution of the called unit of program code is completed by the execution of a return operation, which returns the control flow of the program to the point of the call operation. Then the portion of the program code that made the call operation resumes its execution at the point of the call operation. The execution of the called unit of program code can itself include call operations (nested calls). A unit of code that does not call any other unit of code is said to be a leaf function. The unit of code that is the beginning of the whole program, implicitly called by the operating system, is the root function or main. Branch operations, call operations and return operations can be unconditional in nature, where the transfer of control is not dependent on any condition; the transfer by the operation always happens. Branch operations, call operations and return operations can also be conditional in nature, where the transfer of control is dependent on a condition (predicate). If the condition evaluates to true, then the transfer by the operation happens. If the condition evaluates to false, then the transfer by the operation does not happen.
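The call/return pairing described above is what makes return targets predictable: each return operation transfers control back to the instruction following its matching call operation, and nested calls unwind in last-in, first-out order. One common structure that exploits this (offered here purely as an illustration; it is not described in the text above) is a return-address stack, sketched below with illustrative addresses and a fixed instruction size:

```python
class ReturnAddressStack:
    """A small hardware-style stack of predicted return targets."""

    def __init__(self, depth=16):
        self.depth = depth  # real designs use a small fixed depth
        self.stack = []

    def on_call(self, call_addr, call_size):
        # Push the fall-through address: the instruction after the call,
        # which is where the matching return will resume execution.
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # overflow: discard the oldest entry
        self.stack.append(call_addr + call_size)

    def on_return(self):
        # Predicted target of the return; None if the stack underflowed
        # (e.g. deeper nesting than the stack could hold).
        return self.stack.pop() if self.stack else None

ras = ReturnAddressStack()
ras.on_call(0x1000, 4)  # outer call
ras.on_call(0x2000, 4)  # nested call inside the called unit
assert ras.on_return() == 0x2004  # returns unwind in reverse call order
assert ras.on_return() == 0x1004
```

Because the stack mirrors the nesting of calls, it predicts return targets exactly as long as calls and returns stay balanced and the nesting depth stays within the stack's capacity.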