Some data processors use a technique known as pipelining to achieve higher performance. Pipelining breaks the instruction processing task down into smaller, modular sub-tasks, each of which can be performed during an atomic period of time known as a pipeline cycle. By breaking the task down into these smaller sub-tasks, a data processor can, for example, fetch one instruction while executing another and writing back the results of a third into the register file. Thus, even though an individual instruction may take several cycles to complete, the overall throughput can approach one instruction per pipeline cycle.
Modern microprocessors have more sophisticated pipelines than this three-stage example. For example, a five-stage pipeline may include fetch, decode, operand access, execute, and writeback stages. The longer the pipeline, the more complicated the data processing instruction can be while still maintaining the throughput close to one instruction per cycle.
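The throughput claim can be illustrated with a small calculation. The sketch below assumes an idealized pipeline with no stalls, in which the first instruction needs one cycle per stage and every later instruction completes one cycle after its predecessor. The function names are hypothetical and are used only for illustration.

```python
def pipeline_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed to complete a stall-free instruction stream.

    Assumption: the pipeline fills once (num_stages cycles for the first
    instruction) and then completes one instruction per cycle.
    """
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def throughput(num_instructions: int, num_stages: int) -> float:
    """Average instructions completed per pipeline cycle."""
    return num_instructions / pipeline_cycles(num_instructions, num_stages)


# Compare the three-stage and five-stage examples over a long stream.
three_stage = throughput(1000, 3)
five_stage = throughput(1000, 5)
```

In this model a single instruction takes as many cycles as there are stages, while a long stream approaches, but never reaches, one instruction per cycle for either pipeline depth.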
A problem arises, however, when certain sequences of instructions occur. A particular sequence of instructions may cause what is known as a pipeline dependency. One type of pipeline dependency, known as an operand dependency, occurs when one instruction cannot be executed until the result of execution of a previous instruction is available. For example, assume the instruction sequence:

        ADD R2, R0, R1
        ADD R3, R1, R2

in which the first register is the destination of the result, and the second and third registers store the input operands. Since R2 is the destination register of the first ADD instruction, the execution of the second ADD instruction depends on the outcome of the first ADD instruction and cannot occur until the results of the first ADD instruction are known.
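The operand dependency in this sequence can be detected mechanically by comparing the destination of the earlier instruction with the sources of the later one. The following is a minimal sketch that assumes the three-operand syntax used above (destination first, then two sources); the `Instr` type and the helper names are hypothetical.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def parse(text: str) -> Instr:
    """Parse 'OP Rd, Rs1, Rs2' with the destination register listed first."""
    op, operands = text.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return Instr(op, regs[0], tuple(regs[1:]))


def has_operand_dependency(earlier: Instr, later: Instr) -> bool:
    """True if the later instruction reads the register the earlier one writes."""
    return earlier.dest in later.srcs


first = parse("ADD R2, R0, R1")
second = parse("ADD R3, R1, R2")
dependent = has_operand_dependency(first, second)
```

Here `dependent` is true because R2, the destination of the first ADD, appears among the sources of the second ADD.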
Another type of dependency is known as a load dependency. For example, assume the instruction sequence:

        MOV R0, (R1)
        ADD R3, R0, R2

In this sequence, the first instruction loads register R0 with the contents of the memory location whose address is stored in register R1. Obviously, correct execution of the ADD instruction depends on the new value of register R0 being available.
Mike Johnson et al. in U.S. Pat. No. 4,734,852 disclose a method in which a bypass path can be used to forward the results of an earlier memory load operation to a subsequent instruction without having to first write it to the destination register in the register file and then read it from the register file. Thus the new register value is available much earlier and the pipeline stall time after a load dependency can be minimized.
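The benefit of a bypass path can be illustrated with a simplified timing model. The sketch below is not the circuit disclosed by Johnson et al.; it assumes the five-stage pipeline named earlier, that a producing instruction enters stage i in cycle i, that its result is usable from the cycle after it leaves the producing stage, and that the dependent instruction needs the value on entering its consuming stage. The function name, its parameters, and the choice of the execute stage as the bypass source are all assumptions made for illustration.

```python
# Stage order taken from the five-stage example in the text.
STAGES = ["fetch", "decode", "operand access", "execute", "writeback"]


def stall_cycles(produced_in: str, consumed_in: str, distance: int = 1) -> int:
    """Stall cycles seen by a dependent instruction in this simplified model.

    produced_in: stage after which the producer's result becomes usable.
    consumed_in: stage in which the consumer needs the value.
    distance:    how many cycles after the producer the consumer issues.
    """
    ready = STAGES.index(produced_in) + 1          # first cycle the value is usable
    needed = distance + STAGES.index(consumed_in)  # cycle the consumer wants it
    return max(0, ready - needed)


# Without a bypass, the value travels through the register file.
no_bypass = stall_cycles("writeback", "operand access")
# With a bypass (assumed here to run from execute output to execute input),
# the value is forwarded around the register file.
with_bypass = stall_cycles("execute", "execute")
```

Under these assumptions a back-to-back dependent pair stalls for two cycles without the bypass and for none with it. A real load may deliver its data later than this model assumes, in which case some stall remains even with forwarding.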
However, in some data processors that have deep execution pipelines, significant additional circuit area would be required to add bypass paths from every result-producing pipeline stage to the input of the pipeline so that intermediate results can be forwarded and new instructions can issue earlier. This is especially true of floating point execution units, in which an operand may be, for example, 64 bits long. Furthermore, some results are simply not available until the instruction has reached the end of the pipeline.
Accordingly, it would be desirable to take advantage of additional opportunities for reducing the negative impact of dependencies without adding excessive circuit area. These and other desirable features and characteristics of the present invention will become apparent from the subsequent detailed description and the appended claims, taken in conjunction with the accompanying drawings and the foregoing technical field and background.