The present invention relates generally to prediction, and, more particularly, to predictive decoding.
The use of prediction techniques is advantageous in the implementation of microprocessors, as they improve system performance.
A state-of-the-art microprocessor can comprise, for example, an instruction cache for storing instructions, one or more execution units for executing sequential instructions, a branch unit for executing branch instructions, instruction sequencing logic for routing instructions to the various execution units, and registers for storing operands and result data.
An application program for execution on a microprocessor includes a structured series of macro instructions that are stored in sequential locations in memory. A current instruction pointer within the microprocessor points to the address of the instruction currently being executed, and a next instruction pointer within the microprocessor points to the address of the next instruction for execution. During each clock cycle, the length of the current instruction is added to the contents of the current instruction pointer to form a pointer to a next sequential instruction in memory. The pointer to the next sequential instruction is provided to logic that updates the next instruction pointer. If the logic determines that the next sequential instruction is indeed required for execution, then the next instruction pointer is updated with the pointer to the next sequential instruction in memory. Thus, macro instructions are fetched from memory in sequence for execution by the microprocessor.
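The sequential fetch described above is simple pointer arithmetic. The following Python sketch is illustrative only. It assumes each instruction is characterized solely by its length in bytes and ignores the logic that decides whether the next sequential instruction is actually required. The function names are hypothetical.

```python
def next_sequential_pointer(current_ip: int, instr_length: int) -> int:
    """Add the current instruction's length to the current instruction pointer."""
    return current_ip + instr_length


def fetch_sequence(start_ip: int, lengths: list[int]) -> list[int]:
    """Return the address of each instruction fetched in strict sequence.

    `lengths` holds hypothetical byte lengths of consecutive instructions.
    No branches are modeled, so every next pointer is the sequential one.
    """
    addresses = []
    ip = start_ip
    for length in lengths:
        addresses.append(ip)                      # current instruction pointer
        ip = next_sequential_pointer(ip, length)  # becomes the next instruction pointer
    return addresses


if __name__ == "__main__":
    # Four variable-length instructions starting at address 0x1000.
    print([hex(a) for a in fetch_sequence(0x1000, [2, 3, 1, 5])])
```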
Because a microprocessor is designed to execute instructions in the sequence in which they are stored in memory, a program whose macro instructions execute sequentially from memory runs efficiently on it. For this reason, most application programs are designed to minimize the number of instances in which macro instructions are executed out of sequence. These out-of-sequence instances are known as jumps or branches.
A program branch presents a problem because most conventional microprocessors do not simply execute one instruction at a time. Modern microprocessors typically implement a number of pipeline stages, each stage performing a specific function. Instructions, inputs, and results are passed from one stage to the next in synchronization with a pipeline clock. Hence, several instructions may be executing in different stages of the microprocessor pipeline within the same clock cycle. As a result, when logic within a given stage determines that a program branch is to occur, the contents of the previous stages of the pipeline, that is, the stages holding instructions that follow the branch in sequence, must be cast out so that sequential execution of macro instructions can begin at the instruction to which the branch directs, known as the branch target instruction. This casting out of previous pipeline stages is known as flushing and refilling the pipeline.
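The flush step can be sketched as follows, assuming a hypothetical in-order pipeline represented as a list in which index 0 is the earliest (fetch) stage. The function name and the string labels for instructions are illustrative only; a discarded slot is shown as `None`, a bubble that is later refilled from the branch target.

```python
from typing import Optional


def flush_on_branch(
    pipeline: list[Optional[str]], resolve_stage: int
) -> tuple[list[Optional[str]], int]:
    """Cast out instructions held in the stages before `resolve_stage`.

    pipeline[0] is the earliest stage, so lower indices hold instructions
    that follow the branch in program order. Returns the new pipeline
    contents and the number of instructions discarded.
    """
    flushed = list(pipeline)
    discarded = 0
    for stage in range(resolve_stage):
        if flushed[stage] is not None:
            discarded += 1
        flushed[stage] = None  # bubble; refilled by fetching from the branch target
    return flushed, discarded


if __name__ == "__main__":
    # Hypothetical 5-stage pipeline; the branch "br" resolves in stage 3.
    stages = ["i5", "i4", "i3", "br", "i1"]
    print(flush_on_branch(stages, 3))
```

The later the stage in which the branch is resolved, the more younger instructions are discarded and must be refetched, which is why late branch resolution is costly.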
Branch instructions executed by the branch unit of the processor can be classified as either conditional or unconditional branch instructions. Unconditional branch instructions are branch instructions that change the flow of program execution from a sequential execution path to a specified target execution path and which do not depend upon a condition supplied by the occurrence of an event. Thus, the branch in program flow specified by an unconditional branch instruction is always taken. In contrast, conditional branch instructions are branch instructions for which the indicated branch in program flow may or may not be taken, depending upon a condition within the processor, for example, the state of a specified condition register bit or the value of a counter.
Whether a conditional branch occurs depends upon the evaluation of its specified condition, which is typically performed in later stages of the microprocessor pipeline. To avoid wasting the many clock cycles associated with flushing and refilling the pipeline, present day microprocessors also provide logic in an early pipeline stage that predicts whether a conditional branch will occur. If a conditional branch is predicted to occur, then only the instructions in the stages prior to that early pipeline stage, including those in the instruction buffer, must be flushed. This is a drastic improvement, as correctly predicted branches execute in roughly two clock cycles. However, an incorrectly predicted branch takes many more cycles to execute than it would if no branch prediction mechanism had been provided in the first place. The accuracy of branch predictions in a pipelined processor therefore significantly impacts processor performance.
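The dependence of performance on prediction accuracy can be made concrete with an expected-cost calculation. In this sketch, the roughly two-cycle cost of a correctly predicted branch comes from the text above, while the misprediction cost is a hypothetical parameter chosen only for illustration and is not taken from any particular processor.

```python
def expected_branch_cycles(accuracy: float, hit_cycles: float, miss_cycles: float) -> float:
    """Average cycles per conditional branch.

    accuracy * hit_cycles + (1 - accuracy) * miss_cycles
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    return accuracy * hit_cycles + (1.0 - accuracy) * miss_cycles


if __name__ == "__main__":
    HIT_CYCLES = 2    # "roughly two clock cycles" for a correct prediction
    MISS_CYCLES = 20  # hypothetical misprediction cost, for illustration only
    for acc in (0.90, 0.95, 0.99):
        cycles = expected_branch_cycles(acc, HIT_CYCLES, MISS_CYCLES)
        print(f"accuracy {acc:.2f}: {cycles:.2f} cycles per branch")
```

Under these assumed costs, raising accuracy from 90% to 95% lowers the average from 3.8 to 2.9 cycles per branch, showing how strongly the misprediction term dominates.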
Present day branch prediction techniques chiefly predict the outcome of a given conditional branch instruction in an application program based upon the outcomes obtained when that conditional branch instruction was previously executed within the same instance of the application program. Such historical, or dynamic, branch prediction is somewhat effective because conditional branch instructions tend to exhibit repetitive outcome patterns when executed within an application program. In addition, the branch prediction unit (BPU) permits execution to continue while a branch instruction is pending.
The historical outcome data is stored in a branch history table that is accessed using the address of a conditional branch instruction (a unique identifier for the instruction). A corresponding entry in the branch history table contains the historical outcome data associated with the conditional branch instruction. A dynamic prediction of the outcome of the conditional branch instruction is made based upon the contents of the corresponding entry in the branch history table.
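A branch history table lookup can be sketched as follows. For simplicity, this hypothetical Python class keys entries by the full branch address, so no two branches ever share an entry, and each entry records only the most recent outcome. Practical tables are smaller and may store richer history.

```python
from typing import Optional


class BranchHistoryTable:
    """Idealized table with one entry per full branch address (hypothetical)."""

    def __init__(self) -> None:
        self._last_outcome: dict[int, bool] = {}

    def predict(self, branch_addr: int) -> Optional[bool]:
        """Return the dynamic prediction, or None if no history exists."""
        return self._last_outcome.get(branch_addr)

    def update(self, branch_addr: int, taken: bool) -> None:
        """Record the resolved outcome of the branch at `branch_addr`."""
        self._last_outcome[branch_addr] = taken


if __name__ == "__main__":
    bht = BranchHistoryTable()
    print(bht.predict(0x4000))  # no history yet
    bht.update(0x4000, True)
    print(bht.predict(0x4000))  # repeats the last recorded outcome
```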
However, because most present day microprocessors have address ranges on the order of gigabytes, it is not practical for a branch history table to be as large as the microprocessor's address range. Instead, smaller branch history tables, on the order of kilobytes, are provided, and only the low-order bits of a conditional branch instruction's address are used as an index into the table. This presents another problem: because only low-order address bits are used to index the branch history table, two or more conditional branch instructions can index the same entry. This is known as aliasing, or referencing a synonym address. As a result, the outcome of a more recently executed conditional branch instruction replaces the outcome of a previously executed conditional branch instruction that is aliased to the same table entry. If the earlier conditional branch instruction is encountered again, its historical outcome information is no longer available for a dynamic prediction.
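Indexing by low-order address bits, and the resulting aliasing, can be sketched in the same style. The class name and the 10-bit index width are hypothetical. Each entry here also stores the remaining high-order address bits as a tag; this is an assumption of the sketch, made so that an entry overwritten by an aliased branch is reported as unavailable rather than silently reused.

```python
from typing import Optional


class TaggedBranchHistoryTable:
    """Hypothetical branch history table indexed by low-order address bits."""

    def __init__(self, index_bits: int) -> None:
        self.index_bits = index_bits
        self.size = 1 << index_bits
        # Each entry is (tag, last_outcome), or None if never written.
        self.entries: list[Optional[tuple[int, bool]]] = [None] * self.size

    def index_of(self, addr: int) -> int:
        return addr & (self.size - 1)   # low-order bits select the entry

    def tag_of(self, addr: int) -> int:
        return addr >> self.index_bits  # remaining high-order bits (assumed tag)

    def predict(self, addr: int) -> Optional[bool]:
        entry = self.entries[self.index_of(addr)]
        if entry is None or entry[0] != self.tag_of(addr):
            return None                 # no history, or lost to an aliased branch
        return entry[1]

    def update(self, addr: int, taken: bool) -> None:
        # A later branch with the same index overwrites the earlier one.
        self.entries[self.index_of(addr)] = (self.tag_of(addr), taken)


if __name__ == "__main__":
    bht = TaggedBranchHistoryTable(index_bits=10)
    a, b = 0x1234, 0x5634               # both map to index 0x234
    bht.update(a, True)
    bht.update(b, False)                # b aliases a and replaces its entry
    print(bht.predict(a), bht.predict(b))
```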
Because dynamic predictions are sometimes unavailable, an alternative prediction is made for the outcome of a conditional branch instruction, usually based solely upon some static attribute of the instruction, such as the direction of the branch target instruction relative to the address of the conditional branch instruction. This alternative prediction is called a static prediction because it does not depend upon the changing execution environment within an application program. Static branch prediction is therefore most often used as a fallback: when a dynamic prediction is unavailable, the static prediction is used instead.
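The fallback from a dynamic to a static prediction can be sketched as below. The static rule shown, predicting backward branches taken and forward branches not taken, is one common direction-based heuristic and is used here as an assumption. The function names are hypothetical, and `dynamic` stands for whatever a history-table lookup returned, with `None` meaning no dynamic prediction is available.

```python
from typing import Optional


def static_prediction(branch_addr: int, target_addr: int) -> bool:
    """Direction-based rule: a backward target (e.g. a loop) is predicted taken."""
    return target_addr < branch_addr


def predict_branch(dynamic: Optional[bool], branch_addr: int, target_addr: int) -> bool:
    """Use the dynamic prediction when available, else fall back to the static one."""
    if dynamic is not None:
        return dynamic
    return static_prediction(branch_addr, target_addr)


if __name__ == "__main__":
    print(predict_branch(None, 0x2000, 0x1F00))   # backward target, no history
    print(predict_branch(None, 0x2000, 0x2100))   # forward target, no history
    print(predict_branch(False, 0x2000, 0x1F00))  # history overrides the static rule
```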
As described above, prediction techniques cover a wide range. At one end of the spectrum are simple static predictions, such as assuming that “overflow is usually not present” or that “the usual case does not raise an exception”. More advanced predictions exploit some basic program properties, such as “backward branches and function returns are usually taken”.
To improve predictive accuracy, advanced dynamic predictors have been developed, including but not limited to one-bit predictors, bimodal predictors, gshare predictors, gskew predictors, and tournament predictors. Such advanced predictors are usually employed in conjunction with branch prediction.
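Of the predictors listed, gshare is compact enough to sketch. The version below follows the commonly described structure: a table of 2-bit saturating counters indexed by the branch address XORed with a global history register of recent outcomes. The class name, the 12-bit table index, and the weakly-not-taken initial counter value are choices made for this illustration only.

```python
class GsharePredictor:
    """Sketch of a gshare predictor with 2-bit saturating counters."""

    def __init__(self, index_bits: int = 12) -> None:
        self.mask = (1 << index_bits) - 1
        # Counter values 0-1 predict not taken, 2-3 predict taken.
        self.counters = [1] * (1 << index_bits)  # start weakly not taken
        self.history = 0  # global history of recent outcomes, 1 = taken

    def _index(self, branch_addr: int) -> int:
        # Combine address and global history so one branch can use several counters.
        return (branch_addr ^ self.history) & self.mask

    def predict(self, branch_addr: int) -> bool:
        return self.counters[self._index(branch_addr)] >= 2

    def update(self, branch_addr: int, taken: bool) -> None:
        idx = self._index(branch_addr)
        if taken:
            self.counters[idx] = min(3, self.counters[idx] + 1)
        else:
            self.counters[idx] = max(0, self.counters[idx] - 1)
        # Shift the resolved outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


if __name__ == "__main__":
    p = GsharePredictor()
    hits = 0
    for i in range(100):
        taken = i % 2 == 0  # alternating taken / not-taken branch
        hits += p.predict(0x40) == taken
        p.update(0x40, taken)
    print(f"{hits} of 100 predicted correctly")
```

Because the global history is folded into the index, the two phases of an alternating branch map to different counters, so after a short warm-up the predictor follows the pattern. A one-bit predictor that simply repeats the last outcome would mispredict such a branch on every execution after the first.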
While predictive techniques have been successfully applied to branch prediction, other instruction types have thus far not benefited from the use of such advanced predictors. There is thus a need for efficiently and accurately predicting the execution behavior of different types of instructions and exploiting such predictions to improve instruction execution performance.
Unfortunately, the cost of implementing such predictors is high, so few facilities, other than branch prediction, can recoup the costs in terms of area, performance, and power. There is therefore a need for sharing predictors when predictors can be profitably used but their cost exceeds the benefits of a single application.