1. Field of the Invention
The present invention generally relates to result forwarding, and more particularly to result forwarding in high-speed processors.
2. Description of the Related Art
In a conventional processor, the result of an execution unit (e.g., an arithmetic logic unit or ALU) or a load instruction is often written into a common register file. If a next instruction needs the result as an operand, the result is read from the common register file.
In a conventional high-performance processor, it is often beneficial to use the result of an execution unit or a load instruction immediately as a source operand for the next instruction, when needed, without waiting for the result to be written into the common register file first. This is called result forwarding or result bypassing. Result forwarding can substantially increase the performance of a processor.
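By way of illustration only, the difference between reading an operand from the common register file and receiving it by result forwarding may be sketched as follows. The function name read_operand, the in_flight list, and the register numbers are hypothetical and are chosen solely for this example; they do not describe any particular processor.

```python
def read_operand(src_reg, register_file, in_flight):
    """Return the value of src_reg, preferring a forwarded result.

    in_flight is an assumed list of (dest_reg, value) pairs for results
    that have been produced but not yet written to the register file,
    ordered newest first.
    """
    for dest_reg, value in in_flight:
        if dest_reg == src_reg:
            return value              # result forwarding (bypass)
    return register_file[src_reg]     # ordinary register file read


# Register 3 still holds an old value in the register file.
register_file = {1: 10, 2: 20, 3: 0}

# An earlier instruction (r3 = r1 + r2) has produced 30, but the
# result has not yet been written back to the register file.
in_flight = [(3, register_file[1] + register_file[2])]

forwarded = read_operand(3, register_file, in_flight)  # taken from the bypass
stale = register_file[3]                               # what a plain read would see
```

In this sketch, the next instruction obtains the new value of register 3 without waiting for the write to the register file, whereas a plain register file read would return the old value.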
However, to implement result forwarding, the output of each execution unit in a processor must be connected to the execution unit's own inputs and to the inputs of every other execution unit of the processor. Moreover, some instructions require up to three source operands, namely, RA, RB, and RC. As a result, each execution unit of the processor must have three operand inputs for receiving up to three operands RA, RB, and RC. Each operand input is fed by a selector, which receives the output of each execution unit; each execution unit therefore has three selectors, one for each operand. Accordingly, if the processor has six execution units, there must be (3 operand sources)×(6 units)=18 different forward data paths coming to each execution unit (via the respective selectors). Each forward data path requires a set of physical wires. If a second forward cycle is needed, the number of forward data paths doubles, that is, (18 forward data paths)×2=36 different forward data paths coming to each execution unit.

Further, the load and store pipes may be deeper than the ALU pipe. In other words, while only two ALU instructions (i.e., instructions that require the use of an ALU) may be executed simultaneously, three or more load/store instructions may be executed simultaneously, and each of the load/store instructions requires a separate forward data path to the inputs of the execution units. This further increases the number of forward data paths coming to each execution unit. It is very difficult to build 40 or more 64-bit forward data paths. Moreover, huge buses are required, and not all of them can be kept local enough to be fast.

Also, the forwarding control logic must be driven by control signals generated from register address comparators. These register address comparators must compare all result register numbers from the two or three previous cycles to all possible source register numbers for the current cycle. The comparisons increase design complexity and greatly slow down the forwarding control logic.
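The path counts and the comparator burden described above may be illustrated by the following minimal sketch. The function names, the tuple layout of previous_results, and the register numbers are assumptions made only for this example, and the comparison count shown is for a single instruction with three source operands.

```python
def forward_paths_per_unit(operands, units, forward_cycles=1):
    """Forward data paths arriving at one execution unit."""
    return operands * units * forward_cycles


def forward_selects(source_regs, previous_results):
    """Compare every source register number with every pending result.

    source_regs maps an operand name (RA, RB, RC) to a register number.
    previous_results is an assumed list of (cycle, unit, dest_reg)
    tuples, newest first. Returns the selector setting per operand
    (None means read the register file) and the number of comparisons.
    """
    selects = {}
    comparisons = 0
    for operand, src in source_regs.items():
        selects[operand] = None
        for cycle, unit, dest in previous_results:
            comparisons += 1          # one register address comparator
            if selects[operand] is None and dest == src:
                selects[operand] = (cycle, unit)
    return selects, comparisons


one_cycle_paths = forward_paths_per_unit(3, 6)       # 3 x 6
two_cycle_paths = forward_paths_per_unit(3, 6, 2)    # (3 x 6) x 2

# Hypothetical results of six units over two previous cycles:
# cycle 1 wrote registers 10..15, cycle 2 wrote registers 20..25.
previous_results = [(cycle, unit, 10 * cycle + unit)
                    for cycle in (1, 2) for unit in range(6)]

selects, comparisons = forward_selects(
    {"RA": 12, "RB": 23, "RC": 5}, previous_results)
```

Under these assumptions, a single instruction already requires 36 register number comparisons (3 operands against 12 pending results), and that figure is multiplied again when several instructions issue in the same cycle.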
Accordingly, there is a need for a method and apparatus in which result forwarding is implemented with fewer forward data paths coming to the execution units of a processor.