1. Field of the Invention
The present invention relates to processor architectures, and more specifically to a method and apparatus for quickly forwarding the results of operations to corresponding dependent instructions.
2. Related Art
Results of executing instructions are often provided to other instructions. For example, assume instruction (1) performs the operation A=B+C and instruction (2) performs the operation E=A*F, wherein + and * respectively represent addition and multiplication operations. The result (A) of instruction (1) is to be provided as an operand to instruction (2). Accordingly, instructions (1) and (2) may be termed a providing instruction and a dependent instruction, respectively.
Instruction dependencies, such as the one noted above, impose a sequential order on the execution of instructions. That is, instruction (2) can execute only after the result of instruction (1) is available. At least to increase the instruction throughput performance (i.e., the number of instructions executed in a specific duration of time), it is generally necessary to provide the results of providing instructions to the corresponding dependent instructions quickly.
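The dependency between instructions (1) and (2) can be illustrated with a minimal sketch. The `Instruction` type and the `find_dependencies` helper below are hypothetical names introduced only for illustration and do not appear in the specification; the sketch assumes each instruction writes one destination register and reads a tuple of source registers.

```python
from typing import NamedTuple


class Instruction(NamedTuple):
    """Hypothetical record: one destination register, several source registers."""
    name: str
    dest: str
    sources: tuple


def find_dependencies(program):
    """Return (providing, dependent) name pairs for results that are read later."""
    last_writer = {}  # register name -> name of the most recent instruction writing it
    pairs = []
    for instr in program:
        for src in instr.sources:
            if src in last_writer:
                pairs.append((last_writer[src], instr.name))
        last_writer[instr.dest] = instr.name
    return pairs


program = [
    Instruction("(1)", "A", ("B", "C")),  # A = B + C
    Instruction("(2)", "E", ("A", "F")),  # E = A * F
]

dependencies = find_dependencies(program)
print(dependencies)  # [('(1)', '(2)')]
```

Here instruction (1) is reported as the providing instruction and instruction (2) as the dependent instruction, because (2) reads register A after (1) writes it.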
In one prior environment, the variables A, B, C, E, and F of the above example represent programmer-accessible registers, and the result of instruction (1) is provided to instruction (2) only after the result of the addition is stored in register A. As a consequence, the beginning of execution of instruction (2) is delayed beyond the time at which the result of instruction (1) first becomes available. Such delays may reduce the overall instruction throughput performance, and may thus be undesirable.
Another prior environment partially overcomes the throughput problem by providing the result to instruction (2) before the result is stored in register A. In such an environment, a multiplexor is associated with each of the registers, with each multiplexor selecting one result from the results generated by many operation units (e.g., the adder and multiplier in the above example). Thus, the output of each multiplexor is stored in the corresponding register.
Before the storing of the results in the architecture registers is complete, a result at the output of a multiplexor may be provided to an operation unit executing the corresponding dependent instruction. This technique is often referred to as data forwarding. Another multiplexor may be used to select from among the outputs of the multiplexors storing to the respective registers, and the selected output is forwarded as an operand for the dependent instruction. Due to such forwarding, a dependent instruction can be executed without having to wait for a prior result to be stored.
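The two levels of selection described above can be sketched in software as follows. This is only a behavioral illustration under stated assumptions: the `mux` and `forward_operand` functions, the dictionary-based signals, and the sample operand values are hypothetical and are not taken from the specification.

```python
def mux(inputs, select):
    """Model a multiplexor: pass through the input named by the select signal."""
    return inputs[select]


def forward_operand(unit_results, register_selects, forward_select):
    """Model the prior forwarding path with two multiplexor levels.

    Level 1: one multiplexor per register picks a result from the operation units.
    Level 2: a forwarding multiplexor picks one of the level-1 outputs.
    """
    register_inputs = {
        reg: mux(unit_results, unit) for reg, unit in register_selects.items()
    }
    forwarded = mux(register_inputs, forward_select)
    return register_inputs, forwarded


# Assumed sample operand values for A = B + C followed by E = A * F.
B, C, F = 2, 3, 4
unit_results = {"adder": B + C, "multiplier": 0}  # multiplier output is a placeholder
register_selects = {"A": "adder", "E": "multiplier"}

register_inputs, operand_a = forward_operand(unit_results, register_selects, "A")
E = operand_a * F  # the dependent instruction uses the forwarded value of A
print(operand_a, E)  # 5 20
```

The forwarded operand reaches the multiplication without first being read back from register A, but it passes through both multiplexor levels on the way.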
One problem with such an approach is that the presence of the two multiplexors (one used to select the operand to be stored in the architecture registers and another used for data forwarding) may lead to unacceptably long propagation delays. The delays may in turn impede the instruction throughput performance. Accordingly, what is needed is a method and apparatus which enables the results of operations to be quickly forwarded to the corresponding dependent instructions.
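The effect of the two multiplexors in series can be illustrated with simple arithmetic. All delay values and the cycle time below are arbitrary, assumed numbers chosen only for illustration; the specification gives no figures, and the function names are hypothetical.

```python
def forwarding_path_delay(store_mux_delay_ps, forward_mux_delay_ps):
    """Delay of the forwarding path when both multiplexors are in series."""
    return store_mux_delay_ps + forward_mux_delay_ps


def fits_in_cycle(path_delay_ps, operation_delay_ps, cycle_time_ps):
    """True if forwarding plus the dependent operation fits in one clock cycle."""
    return path_delay_ps + operation_delay_ps <= cycle_time_ps


# Assumed, illustrative values in picoseconds (not from the specification).
STORE_MUX_PS = 300
FORWARD_MUX_PS = 300
OPERATION_PS = 500
CYCLE_PS = 1000

two_level_delay = forwarding_path_delay(STORE_MUX_PS, FORWARD_MUX_PS)
two_level_ok = fits_in_cycle(two_level_delay, OPERATION_PS, CYCLE_PS)
print(two_level_delay, two_level_ok)  # 600 False
```

Under these assumed numbers, the serial multiplexor delay of 600 ps leaves too little of the 1000 ps cycle for the 500 ps operation, which is the kind of propagation-delay problem described above.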