Pipeline processor systems are well known in the art. See, e.g., Parallel and Distributed Computing Handbook, A. Y. H. Zomaya, Ed. (McGraw-Hill 1996), which is hereby incorporated by reference. One of the most important factors affecting processor performance is the delay between the pipeline stage at which a transfer condition becomes known (E stage) and the stage whose operation depends on occurrence of the condition (F stage). Even with hardware facilities that perform almost perfect dynamic branch direction prediction, the mispredictions that remain have a profound impact on CPU performance. Therefore, reduction of this delay, i.e., reduction of the number of clock cycles in the pipeline between generation of the transfer condition and the transfer itself, has an important bearing on CPU performance.
The present invention achieves a decrease in the delay between transfer condition generation and branch execution by commencing execution of two or more branches that follow a transfer instruction before determination of the transfer condition. One of the branches may be moved forward along the system's main pipeline while other branches begin to execute in parallel along additional pipelines initialized by the system.
In accordance with a first preferred embodiment of the present invention, each transfer instruction is split into two instructions: a control transfer preparation instruction and a control transfer instruction. The control transfer preparation instruction contains the transfer address and is placed by the compiler several instructions ahead of the control transfer instruction. Execution of the control transfer preparation instruction initializes an additional parallel pipeline which duplicates a certain initial part of the main pipeline and executes instructions from the branch. Once the additional pipeline is filled, it is frozen pending determination of the transfer condition. The control transfer instruction is executed when the control transfer condition becomes known. If control is to be transferred to the branch, the number of the additional pipeline whose execution should be continued on the main pipeline is indicated in the control transfer instruction.
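By way of illustration only, the two-instruction scheme of the first embodiment may be modeled in software roughly as follows. This is a minimal sketch and not the hardware implementation: the names Pipeline, Processor, prepare_transfer and transfer, the pipeline depths, and the representation of a program as a list of instruction labels are all assumptions introduced for the example. The sketch also assumes that an additional pipeline is simply discarded when the transfer is not taken, which the description above does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """Toy pipeline: holds the instructions currently in its stages."""
    depth: int
    stages: list = field(default_factory=list)
    frozen: bool = False

    def fill_from(self, program, address):
        # Fetch up to `depth` instructions starting at the branch target,
        # then freeze until the transfer condition is known.
        self.stages = program[address:address + self.depth]
        self.frozen = True


class Processor:
    """Hypothetical model of a main pipeline plus numbered additional pipelines."""

    def __init__(self, program, main_depth=4, extra_depth=2):
        self.program = program
        self.main = Pipeline(main_depth)
        self.extra_depth = extra_depth  # only an initial part is duplicated
        self.additional = {}            # pipeline number -> Pipeline

    def prepare_transfer(self, number, target_address):
        # Control transfer preparation instruction: carries the transfer
        # address and initializes an additional pipeline for the branch.
        pipe = Pipeline(self.extra_depth)
        pipe.fill_from(self.program, target_address)
        self.additional[number] = pipe

    def transfer(self, number, condition):
        # Control transfer instruction: executed once the condition is known.
        if condition:
            # Continue the indicated additional pipeline on the main pipeline.
            pipe = self.additional.pop(number)
            pipe.frozen = False
            self.main.stages = list(pipe.stages)
            return True
        # Assumption: on a not-taken transfer the prepared branch is dropped.
        self.additional.pop(number, None)
        return False
```

In this model the compiler's role corresponds to calling prepare_transfer several instructions before transfer, so that the branch instructions are already in flight when the condition resolves.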
In a second preferred embodiment, a first portion of the additional pipeline, which does not use the contents of any register that may be modified by the main pipeline, is filled and frozen. A second portion of the additional pipeline, which may be affected by the contents of a register that may be modified by the main pipeline, is reexecuted every clock cycle.
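The division of the additional pipeline in the second embodiment may likewise be sketched in software. The example below is illustrative only and rests on stated assumptions: the first portion is modeled as a register-independent decode step performed once, the second portion as an address calculation that reads a register and is recomputed on every clock. The names AdditionalPipeline, decode and address_calc, and the instruction format (operation, base register, offset), are hypothetical.

```python
class AdditionalPipeline:
    """Toy model: a frozen first portion and a reexecuted second portion."""

    def __init__(self, first_portion, second_portion):
        self.first_portion = first_portion    # uses no main-pipeline registers
        self.second_portion = second_portion  # may read modified registers
        self.frozen_state = None
        self.result = None

    def fill(self, instruction):
        # First portion: executed once, then frozen.
        self.frozen_state = self.first_portion(instruction)

    def clock(self, registers):
        # Second portion: reexecuted every clock cycle so that it always
        # reflects the latest register contents written by the main pipeline.
        self.result = self.second_portion(self.frozen_state, registers)
        return self.result


def decode(instruction):
    # Hypothetical register-independent stage: extract base register and offset.
    _operation, base, offset = instruction
    return base, offset


def address_calc(decoded, registers):
    # Hypothetical register-dependent stage: base register value plus offset.
    base, offset = decoded
    return registers[base] + offset
```

If the main pipeline changes the base register between two clocks, the next call to clock yields the updated address without repeating the decode step.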