Modern processors are often pipelined, meaning that execution of each instruction is divided into several stages. FIG. 1 shows a functional block diagram of a conventional pipelined processor 10. This exemplary pipelined processor includes four stages: a fetch (F) stage 12, a decode (D) stage 14, an execute (E) stage 16, and a writeback (W) stage 18. Pipelined processors such as processor 10 may be register-based, i.e., other than for load or store instructions, the source(s) and destination(s) of each instruction are registers. The fetch unit 12 retrieves a given instruction from an instruction memory. The decode stage 14 reads the source register(s) of the instruction, and the writeback stage 18 writes to the destination register(s) of the instruction. In the execute stage 16, the instruction is executed by one of four specialized execution units: a 1-cycle integer (I) unit 20, an 8-cycle integer/floating point multiplier (M) 22, a 4-cycle floating point adder (Fadd) 24, or a 15-cycle integer/floating point divider (Div) 26. The execution units in this example are fully pipelined, i.e., can accept a new instruction on every clock cycle. These specialized units are used to execute particular types of instructions, and each of the units may have a different latency. An instruction is said to be "dispatched" when it has completed register read in the decode stage 14 and begun execution in the execution stage 16. In other words, a dispatch takes place when an instruction passes from the decode stage 14 to one of the execution units in execution stage 16.
A significant problem with conventional pipelined processors such as processor 10 of FIG. 1 is that the use of a pipeline introduces data hazards which are not present in the absence of a pipeline, because results of previous instructions may not be available to a subsequent instruction. In addition, even for in-order instruction dispatch, if instructions are allowed to be active in different execution units at the same time, the different latencies of the execution units can result in control hazards and out-of-order instruction completion. Data hazards and control hazards generally must be avoided in order to ensure proper operation of a pipelined processor.
A very common type of data hazard which can arise in a pipelined processor is known as a Read After Write (RAW) data hazard. FIG. 2A illustrates an exemplary RAW data hazard, showing how the pipelined processor 10 of FIG. 1 executes add instructions i.sub.1 and i.sub.2 for processor clock cycles 1 through 5. Instruction i.sub.1 adds the contents of its source registers r.sub.2 and r.sub.3 and writes the result to its destination register r.sub.1. Instruction i.sub.2 adds the contents of its source registers r.sub.5 and r.sub.1 and writes the result to its destination register r.sub.4. It can be seen that, unless otherwise prevented, the instruction i.sub.2 in the conventional processor 10 will read register r.sub.1 in clock cycle 3, before the new value of r.sub.1 is written by instruction i.sub.1. In a non-pipelined processor, the instructions as shown in FIG. 2A would not create a hazard, since instruction i.sub.1 would be completed before the start of instruction i.sub.2.
FIG. 2B illustrates a less common data hazard, referred to as a Write After Write (WAW) data hazard, that can arise in a conventional pipelined processor. In this example, the processor executes instructions i.sub.1 and i.sub.2 for processor clock cycles 1 through 11. Instruction i.sub.1 multiplies the contents of its source registers r.sub.2 and r.sub.3 and writes the result to its destination register r.sub.1. Instruction i.sub.2 adds the contents of its source registers r.sub.4 and r.sub.5 and writes the result to its destination register r.sub.1. It can be seen that, unless otherwise prevented, instruction i.sub.2 in the conventional processor will write to register r.sub.1 in clock cycle 5, before instruction i.sub.1, and then i.sub.1 will incorrectly overwrite the result of i.sub.2 in register r.sub.1 in clock cycle 11. This type of hazard could arise if, for example, instruction i.sub.1 were issued specula and i.sub.2. In the case of in-order instruction completion, instruction i.sub.1 will not affect the outcome, since in-order completion will discard the result of i.sub.1. However, as described above, the hazard is significant in the presence of out-of-order instruction completion.
FIG. 2C shows an example of a control hazard which can arise in a conventional pipelined processor. Control hazards generally result from jumps in the instruction stream. For example, when a branch is taken, an instruction address register, which serves as a program counter, is changed to a new value. As a result, there may be instructions that have been already fetched into the pipeline but should not be executed. In the example of FIG. 2C, a control hazard arises when instructions i.sub.1 through i.sub.4 are executed for clock cycles 1 through 11. Instruction i.sub.2 is a branch instruction brz that will branch to label, i.e., to instruction i.sub.4, if the contents of its source register r.sub.4 have a particular value. In the pipelined processor 10 of FIG. 1, it is assumed that the results of the branch instruction i.sub.2 are not effective until i.sub.2 reaches writeback (W) in clock cycle 5. If the branch is taken, control should jump to instruction i.sub.4 without ever reaching instruction i.sub.3, but by the time this is known, instruction i.sub.3 is already executing.
A number of techniques have been developed in an attempt to address the problems associated with data and control hazards. One such technique, known as "scoreboarding," provides dynamic scheduling of instructions, using a central controller known as a scoreboard, so as to allow out-of-order instruction issue. This approach is often associated with the Control Data 6600 computer, and is described in greater detail in D. A. Patterson and J. L. Hennessy, "Computer Architecture: A Quantitative Approach," Second Edition, Morgan Kaufmann, San Francisco, Calif., pp. 240-251, 1996. A related technique which also utilizes dynamic scheduling to accommodate out-of-order instruction issue is known as Tomasulo's Algorithm, and is described in the above-cited D. A. Patterson and J. L. Hennessy reference at pp. 251-261. Another known technique involves utilizing a reorder buffer, also referred to as a retire buffer. In accordance with this technique, rather than allowing results to be written back to registers immediately after execution, the results are stored in the retire buffer until they can be written back in sequential program order.
Although these and other conventional techniques can resolve pipeline hazard problems, such techniques generally require the addition of substantial complexity to the processor. For example, scoreboarding requires a separate central control unit, Tomasulo's Algorithm requires additional structures such as a broadcast result bus, a register renaming mechanism, and reservation stations, and a retire buffer requires result storage and ordering logic. A need therefore exists for a different and simpler mechanism for avoiding pipeline hazards.