1. Field of the Invention
The present invention relates in general to computer architecture, and in particular to a method and system to reduce the branch misprediction penalty, a problem in pipelined computer architectures.
2. Description of the Related Art
As microprocessor throughput is the overriding factor dictating overall system performance, designers use numerous techniques to increase it. Microprocessor performance can be described as the number of computations per second or as the number of instructions executed per microprocessor clock cycle. A common approach to improving the instructions per clock (IPC), as well as increasing the clock speed, is microprocessor pipelining. Pipelining breaks execution of an instruction into several stages, all of which can run in parallel on different instructions. Simply put, pipelining is a technique whereby execution of the next instruction is started before execution of the current instruction is completed.
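The benefit of overlapping stages can be illustrated with a small cycle-count model. The following is a minimal sketch only, assuming an idealized pipeline in which every stage takes exactly one cycle and no stalls or flushes occur; the function names are hypothetical and are not part of the described system.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed when each instruction finishes all stages before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed when a new instruction enters the first stage every cycle.

    Assumption: ideal pipeline, one cycle per stage, no stalls.
    The first instruction takes n_stages cycles; each later one completes
    one cycle after its predecessor.
    """
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def ideal_speedup(n_instructions: int, n_stages: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts under the same assumptions."""
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(
        n_instructions, n_stages
    )


if __name__ == "__main__":
    # 100 instructions on a 5-stage pipeline: 500 cycles unpipelined, 104 pipelined.
    print(unpipelined_cycles(100, 5), pipelined_cycles(100, 5))
```

Under these assumptions the speedup approaches the number of stages as the instruction count grows, which is why deeper pipelines are attractive as long as they can be kept full.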
Using pipelined processing, instruction processing is broken up into several stages, sometimes called "pipestages." A processor can start the execution of parts of a complex instruction in an early pipeline stage before the preceding instruction has been completed in the last pipeline stage. To facilitate calculations in parallel, pipelined processors execute instructions along predicted paths, and then validate the data resulting from a predicted path.
An out-of-order pipelined processor may perform several relevant pipestages out of order. Examples of such pipestages include: fetching the instruction from the memory hierarchy to the processor; decoding opcodes, architectural sources and destinations; allocating physical destination registers; identifying physical sources; determining which instructions are to be executed within the next instruction cycle; executing instructions; and retiring registers and memory values.
Instructions are executed out of order and their results are temporarily stored. An instruction is retired when all previous instructions have retired and the instruction itself has been executed and, if needed, verified. An instruction may be executed but not retired if, for example, it is found to be on a wrong path and has to be discarded.
Instructions are executed along a speculated path based on branch prediction. This prediction is checked when the branch direction is verified. In case of a misprediction, the wrong path is discarded and instructions are executed from the correct path.
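The retirement rule described above can be sketched as follows. This is an illustrative model only, assuming the instruction window is held as a list ordered from oldest to youngest; the `Entry` class, its fields, and `retire_ready` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    name: str
    executed: bool = False    # result has been computed (possibly out of order)
    wrong_path: bool = False  # instruction was fetched along a mispredicted path


def retire_ready(window: List[Entry]) -> List[str]:
    """Retire instructions in program order from the head of the window.

    An entry retires only if every older entry has already left the window
    and the entry itself has executed. Wrong-path entries are discarded
    without retiring. Retirement stops at the first unexecuted entry.
    """
    retired = []
    while window:
        head = window[0]
        if head.wrong_path:
            window.pop(0)      # discarded, never retired
            continue
        if not head.executed:
            break              # younger executed entries must wait
        window.pop(0)
        retired.append(head.name)
    return retired


if __name__ == "__main__":
    w = [Entry("i0", True), Entry("i1", True, True), Entry("i2"), Entry("i3", True)]
    print(retire_ready(w))     # i3 has executed but cannot retire ahead of i2
```

In this sketch an instruction that executed early simply waits in the window, which mirrors the statement that results are stored temporarily until retirement.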
In pipelined processors, data can be loaded into the register file when the execution stages have successfully completed the execution stage generating the data. This pipestage, where the branch outcome is resolved, is called the branch resolution stage. Data is typically retired when it has undergone "de-speculation," where the data is validated as accurate by the pipeline control logic. In a pipelined processor, each execution pipestage is implemented in an execution unit, with the first of such execution units receiving instructions and data from the register file. To return data results after execution in a traditional pipelined processor, each execution unit has a separate return bus coupled to the register file or, in some cases, execution units share an arbitrated return bus. In the traditional pipelined processor, each execution unit accomplishes de-speculation so that data can be directly retired to the register file.
Conditional branches present problems for pipelined processors. Since the conditional branch status is computed at the execute pipestage, there would be a bubble in the pipelined processor if the conditional branch were stalled until it is resolved. To reduce this penalty, speculative processors attempt to predict the direction of the conditional branch, and fetch the subsequent instructions according to the prediction. In case of a misprediction, all the "bogus" instructions are flushed. When a microprocessor speculatively executes instructions along a mispredicted path, the processor must 1) recover from the mispredicted path; and 2) restart by executing the instructions from the correct branch target. The time lost, measured in cycles, is called the "misdirection penalty." Two major factors increase the misdirection penalty. First, super-pipelining techniques increase the distance (in terms of pipestages and cycles) to the branch resolution stage. When the distance to the branch resolution stage is increased, more instructions are incorrectly fetched before the misprediction is detected. These mispredicted instructions are flushed and new instructions are fetched, increasing the misprediction penalty. Second, increasing the speculation degree, accomplished by expanding the instruction window fetched by the processor, results in a higher misdirection penalty, since fewer valid instructions are available as candidates for execution. These mispredicted instructions are also flushed, increasing the misprediction penalty.
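The effect of pipeline depth on the penalty can be illustrated with a simple first-order model. The sketch below is an assumption-laden approximation, not a measurement: it assumes a fixed fetch width, a fixed number of cycles between fetching a branch and resolving it, and the commonly used estimate that effective cycles per instruction equals the base value plus branch frequency times misprediction rate times penalty. All names and numeric values are hypothetical.

```python
def wrong_path_instructions(fetch_width: int, resolve_depth: int) -> int:
    """Upper bound on instructions fetched along the wrong path.

    Assumption: the front end fetches `fetch_width` instructions per cycle
    for each of the `resolve_depth` cycles that pass before the branch
    reaches the branch resolution stage.
    """
    return fetch_width * resolve_depth


def effective_cpi(base_cpi: float, branch_freq: float,
                  mispredict_rate: float, penalty_cycles: int) -> float:
    """First-order cycles per instruction including misprediction cost."""
    return base_cpi + branch_freq * mispredict_rate * penalty_cycles


if __name__ == "__main__":
    # Hypothetical comparison: doubling the distance to the resolution stage.
    for depth in (10, 20):
        print(depth,
              wrong_path_instructions(4, depth),
              effective_cpi(1.0, 0.2, 0.1, depth))
```

In this model both the number of flushed instructions and the per-instruction cost grow linearly with the distance to the branch resolution stage, which is the first of the two factors described above.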
Thus, what is needed is a fast method and system to recover from branch misdirection that can be used in microprocessors utilizing super-pipelining and increased speculation degree techniques.
The fast branch misprediction recovery process and system detects a branch target within an instruction window. Once the branch target is detected, an instruction dependency chain of the branch target can be recovered within the instruction window. Bogus instructions are flushed from the instruction window. The remaining instructions in the instruction window are executed.
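The steps summarized above can be sketched in software form. The following is only one possible reading of the summary, offered as an illustration and not as the claimed implementation: it assumes the instruction window is a list ordered oldest to youngest, that the correct branch target is located by address, and that "recovering the dependency chain" means identifying surviving instructions whose source registers were written by flushed instructions. The `Instr` class, the `recover` function, and the register-name model are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    addr: int
    dest: Optional[str] = None       # register written, if any
    srcs: Tuple[str, ...] = ()       # registers read


def recover(window: List[Instr], branch_index: int, target_addr: int):
    """Return (kept, flushed, redo) after window[branch_index] is mispredicted.

    kept    : instructions remaining in the window
    flushed : bogus instructions between the branch and its correct target
    redo    : surviving instructions that consumed values from flushed ones
    """
    older = window[:branch_index + 1]
    younger = window[branch_index + 1:]

    # Step 1: detect the correct branch target within the window.
    pos = next((k for k, ins in enumerate(younger) if ins.addr == target_addr), None)
    if pos is None:
        # Target not present: everything younger is bogus (conventional flush).
        return older, younger, []

    # Step 2: flush bogus instructions lying between the branch and the target.
    flushed, survivors = younger[:pos], younger[pos:]

    # Step 3 (assumed interpretation): walk the dependency chain from the target.
    stale = {ins.dest for ins in flushed if ins.dest is not None}
    redo = []
    for ins in survivors:
        if any(src in stale for src in ins.srcs):
            redo.append(ins)                 # read a wrong-path value
            if ins.dest is not None:
                stale.add(ins.dest)          # its result is stale too
        elif ins.dest is not None:
            stale.discard(ins.dest)          # clean overwrite ends the chain

    # Step 4: the remaining instructions stay in the window for execution.
    return older + survivors, flushed, redo
```

For example, with a branch at a hypothetical address `0x100` whose correct target `0x10C` is already in the window, the two fall-through instructions at `0x104` and `0x108` would be flushed, and only survivors that read their destination registers would be marked for re-execution.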