The present invention relates to data processing systems. Specifically, the present application describes a method for improving pipeline processing to avoid execution delays due to changes in execution sequence.
Pipeline processing has been successfully employed in microprocessor design. Pipeline architecture divides the execution of instructions among a number of pipelines for executing different types of instructions. Each stage of a pipeline corresponds to one step in the execution of an instruction, making it possible to increase the speed of execution. Utilizing pipeline processing, multiple instructions can be broken down into individual stages which are executed in parallel during the same clock cycle. As opposed to serial processing, where all stages complete the processing of one instruction before beginning the processing of the next instruction, pipeline processor architecture overlaps the stages by processing different instructions at the same time. The time required to process each individual instruction remains unchanged, but the throughput for instruction processing is increased because several instructions are being processed by different individual pipeline stages at the same time.
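The throughput advantage described above can be illustrated with a simplified cycle count. The following is a minimal sketch, assuming an idealized single pipeline in which every stage takes one clock cycle and no stalls occur; the function names and the example figures are hypothetical and are not taken from the present application.

```python
def serial_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction completes every stage
    before the next instruction begins (serial processing)."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when stages overlap: the first instruction takes
    num_stages cycles to complete, and each later instruction
    completes one cycle after its predecessor (idealized, no stalls)."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


if __name__ == "__main__":
    n, k = 10, 5  # hypothetical: 10 instructions, 5-stage pipeline
    print("serial:   ", serial_cycles(n, k))
    print("pipelined:", pipelined_cycles(n, k))
```

Under these assumptions each instruction still spends the same number of cycles in the pipeline, while the total cycle count for a long instruction sequence approaches one cycle per instruction.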
The beginning stages for the pipeline processing include retrieving instructions from an instruction cache and decoding the instruction in a stage where branch prediction is performed. If a branch is predicted to be taken in the execution of an instruction, all instructions following the branch are invalidated and a new execution sequence begins with the instructions of the predicted branch.
Increasing the number of stages in a pipeline increases the latency between the first access of an instruction and its execution. During sequential execution of instructions, this additional latency is not a problem, as eventually most pipeline stages become occupied. However, the execution sequence may be interrupted by an instruction which branches execution to another set of instructions, or by context switching, which requires switching the program completely. During the processing of instructions, attempts are made to predict the branches which execution will take. However, prediction errors occur, and when a misprediction is determined, the pipeline may have to be cleared of its contents and the instructions identified by the branch executed in their place.
A branch misprediction produces a latency between the first access of the correct instruction and its execution. This latency can be reduced by improving the branch prediction. However, there is always uncertainty in a prediction, and predictions are never perfect. When a misprediction occurs, the pipeline encounters a bubble, and its contents must be flushed before the new execution sequence may begin.
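The cost of such a flush can be estimated with a simplified model. The following is a minimal sketch, assuming an idealized single pipeline with one cycle per stage, in which a branch is finally determined at a given stage and every instruction fetched behind a mispredicted branch is discarded; the function name, the parameter `resolve_stage`, and the example figures are hypothetical and are not taken from the present application.

```python
def cycles_with_mispredictions(num_instructions: int,
                               num_stages: int,
                               resolve_stage: int,
                               mispredictions: int) -> int:
    """Estimate total cycles when some branches are mispredicted.

    Assumption: a branch is determined in stage `resolve_stage`
    (1-indexed). By then, resolve_stage - 1 younger instructions have
    entered the pipeline behind it; on a misprediction they are flushed,
    leaving a bubble of that many cycles before the correct
    instructions arrive.
    """
    if not 1 <= resolve_stage <= num_stages:
        raise ValueError("resolve_stage must lie within the pipeline")
    if mispredictions < 0:
        raise ValueError("mispredictions must be non-negative")
    if num_instructions == 0:
        return 0
    base = num_stages + num_instructions - 1  # ideal pipelined cycle count
    penalty = resolve_stage - 1               # bubble cycles per flush
    return base + mispredictions * penalty


if __name__ == "__main__":
    # Hypothetical figures: 100 instructions, 10 stages,
    # branches determined in stage 8, 5 mispredicted branches.
    print(cycles_with_mispredictions(100, 10, 8, 5))
```

In this model the penalty per misprediction grows with the depth at which the branch is determined, which is why deeper pipelines make mispredictions more costly.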
As one technique for dealing with a mispredicted branch, the system may execute two possible paths of execution, with the correct path being selected once the final branch determination has taken place. This technique is hardware intensive and impractical at the pipeline depths approaching the state of the art. A related solution saves the instructions fetched behind a predicted branch in a buffer for quick access should a branch misprediction be detected. In a machine that uses three pipelines, this has limited value, since any buffered instructions would be located after, and on the same cache lines as, the branch itself.
Another branch-related technique is to shift the determination of branches as far toward the top of the pipeline as possible, reducing the time between branch prediction and branch determination. This shortens the period during which instructions that may ultimately be discarded are speculatively executed. Unfortunately, this approach is difficult to implement in state of the art processors, where clock frequencies are increased, the cycle time for each stage is therefore decreased, and the number of pipeline stages is increased.
The present invention provides a latency reduction between stages of a pipeline processor in the face of a mispredicted branch, where the execution sequence is changed to a new set of instructions and the pipeline must be refilled. The same concept can be applied to context switching, where a latency reduction can be obtained when the pipeline is refilled with a new set of instructions.