I. Field of the Disclosure
The technology of the disclosure relates generally to instruction pipelining in processors and more particularly to handling of hazards (e.g., branch mispredictions) in instruction pipelines when the next instruction cannot be executed.
II. Background
Instruction pipelining is a processing technique whereby the throughput of computer instructions being executed by a processor may be increased. In this regard, the handling of each instruction is split into a series of steps as opposed to each instruction being processed sequentially and fully executed before processing a next instruction. These steps are executed in an instruction pipeline composed of multiple stages. There are several cycles between the time an instruction is fetched from memory until the time the instruction is actually executed as the instruction flows through various pipeline stages of an instruction pipeline.
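By way of a non-limiting illustration, the throughput benefit described above can be sketched with simple cycle arithmetic. The sketch below assumes an idealized pipeline in which every stage takes exactly one cycle and no stalls or hazards occur; the function names are hypothetical and are used only for this example.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles when each instruction finishes every step before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles in an idealized pipeline (one cycle per stage, no stalls).

    The first instruction needs num_stages cycles to flow through the pipeline;
    each later instruction completes one cycle after its predecessor.
    """
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)
```

Under these assumptions, 100 instructions in a 5-stage pipeline would take 104 cycles, as opposed to 500 cycles if each instruction were fully executed before the next one began.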
Optimal processor performance may be achieved if all stages in an execution pipeline are able to process instructions concurrently and sequentially as the instructions are ordered in the instruction pipeline(s). However, hazards can occur in an instruction pipeline where the next instruction cannot be executed without leading to incorrect computation results. For example, a control hazard may occur as a result of execution of a control flow instruction that causes a precise interrupt in the processor. One example of a control flow instruction that can cause a control hazard is a conditional branch instruction. In this regard, a branch prediction circuit can be provided in a processor to speculatively predict the target address of a fetched conditional branch instruction. The processor can then speculatively fetch subsequent instructions in the fetch stages of an instruction pipeline following the fetch of a conditional branch instruction based on the predicted target address.
When the control flow instruction finally reaches the execution stage of the instruction pipeline and is executed, the actual target address of the control flow instruction is compared with the target address that was predicted when the control flow instruction was fetched. If the predicted and actual target addresses match, meaning the prediction was correct, no delay is incurred in instruction execution, because the subsequent instructions at the target address will have been correctly fetched and will already be present in the instruction pipeline when the conditional branch instruction reaches an execution stage of the instruction pipeline. However, if the predicted and actual target addresses do not match, a mispredicted branch hazard occurs in the instruction pipeline that causes a precise interrupt. As a result, the instruction pipeline is flushed and the instruction pipeline fetch unit is redirected to fetch new instructions starting from the actual target address, resulting in delay. Also, stages in the execution pipeline may remain dormant until the newly fetched instructions make their way through the instruction pipeline to the execution stage, thereby further reducing performance.
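By way of a non-limiting illustration, the verification of a predicted target address at the execution stage can be sketched as follows. This is a simplified software model rather than a description of any particular processor; the names `BranchResolution` and `resolve_branch`, and the parameter `younger_in_flight` (the number of speculatively fetched instructions behind the branch), are hypothetical and are introduced only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BranchResolution:
    mispredicted: bool
    redirect_address: Optional[int]  # where fetch restarts; None if no redirect is needed
    flushed_instructions: int        # speculative instructions discarded


def resolve_branch(predicted_target: int, actual_target: int,
                   younger_in_flight: int) -> BranchResolution:
    """Compare the predicted target with the actual target at the execution stage."""
    if predicted_target == actual_target:
        # Correct prediction: the fetched instructions are valid and no delay is incurred.
        return BranchResolution(False, None, 0)
    # Misprediction: discard the speculative instructions and refetch
    # starting from the actual target address.
    return BranchResolution(True, actual_target, younger_in_flight)
```

For example, `resolve_branch(0x400, 0x480, 6)` would report a misprediction, redirect the fetch to address 0x480, and discard the six speculatively fetched instructions.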
While it may be desired to provide larger instruction pipelines in processors to allow for increased frequency scaling and performance as a result, the performance penalties incurred from hazards occurring in an instruction pipeline generally increase with the size of the pipeline. Generally, the deeper the instruction pipeline, the longer it takes for an instruction to reach an execution stage where the hazard is discovered. Also, a larger number of new instructions may need to be fetched after incurring the hazard because of the larger instruction pipeline size. Several solutions have been proposed to this problem. One such solution involves multi-path execute, where multiple paths following a control flow instruction are fetched. However, multi-path execute is complicated due to the large number of possible execution paths that can occur during the time that a branch is outstanding. As subsequent branch instructions are encountered, each will incur another possibility of alternative execution paths, resulting in a tree of possible execution paths stemming from the original branch. Fetching and buffering all of these paths in parallel is expensive.
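By way of a non-limiting illustration, the growth of the tree of possible execution paths in multi-path execute can be sketched as follows. The sketch assumes that every outstanding conditional branch has exactly two outcomes (taken and not taken) and that every path encounters a further branch before any branch is resolved; the function name `paths_to_track` is hypothetical.

```python
def paths_to_track(outstanding_branches: int) -> int:
    """Number of execution paths that must be fetched and buffered in parallel.

    Assumes each unresolved conditional branch doubles the number of paths
    (one for taken, one for not taken).
    """
    if outstanding_branches < 0:
        raise ValueError("outstanding_branches must be non-negative")
    return 2 ** outstanding_branches
```

Under these assumptions, one outstanding branch requires two paths, while four outstanding branches already require sixteen paths, which indicates why the fetch and buffering resources needed for multi-path execute grow quickly.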
Thus, it is desired to reduce the redirection penalty incurred with precise interrupts in a processor to minimize the effect on performance.