Modern processors are often pipelined, meaning that execution of each instruction is divided into several stages. FIG. 1 shows a functional block diagram of a conventional pipelined processor 10. This exemplary pipelined processor includes four stages: a fetch (F) stage 12, a decode (D) stage 14, an execute (E) stage 16, and a writeback (W) stage 18. Pipelined processors such as processor 10 may be register-based, i.e., other than for load or store instructions, the source(s) and destination(s) of each instruction are registers. The fetch stage 12 retrieves a given instruction from an instruction memory. The decode stage 14 reads the source register(s) of the instruction, and the writeback stage 18 writes to the destination register(s) of the instruction. In the execute stage 16, the instruction is executed by one of four specialized execution units, for each of which the number of cycles is denoted by the number of boxes in FIG. 1: a 1-cycle integer (I) unit 20, an 8-cycle integer/floating point multiplier (M) 22, a 4-cycle floating point adder (Fadd) 24, or a 15-cycle integer/floating point divider (Div) 26. The execution units in this example are fully pipelined, i.e., can accept a new instruction on every clock cycle. These specialized units are used to execute particular types of instructions, and each of the units may have a different latency. An instruction is said to be "dispatched" when it has completed register read in the decode stage 14 and begun execution in the execute stage 16. In other words, a dispatch takes place when an instruction passes from the decode stage 14 to one of the execution units in the execute stage 16.
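By way of illustration only, the stage timing of the four-stage pipeline described above may be sketched in Python as follows. The latencies are those stated for the four execution units; the helper name stage_cycles and the assumption that no stalls occur are hypothetical and are not part of the processor description.

```python
# Execute-stage latencies in cycles, as stated for the units of FIG. 1.
LATENCY = {"I": 1, "M": 8, "Fadd": 4, "Div": 15}


def stage_cycles(fetch_cycle, unit):
    """Return the clock cycle(s) at which an instruction occupies each stage.

    Hypothetical helper: assumes the instruction is never stalled.
    """
    decode = fetch_cycle + 1
    first_execute = decode + 1  # dispatch: decode stage -> execution unit
    last_execute = first_execute + LATENCY[unit] - 1
    return {
        "F": fetch_cycle,
        "D": decode,
        "E": (first_execute, last_execute),
        "W": last_execute + 1,
    }


if __name__ == "__main__":
    print(stage_cycles(1, "I"))  # integer instruction fetched in cycle 1
    print(stage_cycles(1, "M"))  # multiply fetched in cycle 1
```

Under these assumptions, an integer instruction fetched in cycle 1 writes back in cycle 4, whereas a multiply fetched in the same cycle does not write back until cycle 11.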
A significant problem with conventional pipelined processors such as processor 10 of FIG. 1 is that the use of a pipeline introduces data hazards which are not present in the absence of a pipeline, because results of previous instructions may not be available to a subsequent instruction. This is often attributable to the different latencies of the various execution units in the processor. Types of data hazards which can arise in conventional pipelined processors include, for example, Read After Write (RAW) data hazards, Write After Write (WAW) data hazards, and Write After Read (WAR) data hazards.
FIG. 2 illustrates an exemplary RAW data hazard, showing how the pipelined processor 10 of FIG. 1 executes subtract (sub) instructions I1 and I2 for processor clock cycles 1 through 5. Instruction I1 subtracts the contents of its source registers r2 and r3 and writes the result to its destination register r1. Instruction I2 subtracts the contents of its source registers r5 and r1, and writes the result to its destination register r4. It can be seen that, unless otherwise prevented, the instruction I2 in the conventional processor 10 will read register r1 in clock cycle 3, before the new value of r1 is written by instruction I1 in clock cycle 4, resulting in a RAW data hazard. In a non-pipelined processor, the instructions as shown in FIG. 2 would not create a hazard, since instruction I1 would be completed before the start of instruction I2.
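The RAW condition of FIG. 2 may be checked with a short Python sketch such as the following. The dictionary layout and the function name raw_hazard are hypothetical, and the sketch assumes that a register read sees the new value only if it occurs strictly after the cycle in which the register is written.

```python
def raw_hazard(producer, consumer):
    """Return True if 'consumer' reads a register before 'producer' updates it.

    Assumed timing (four-stage pipeline, no stalls): sources are read in the
    decode stage, one cycle after fetch; the destination is written in the
    writeback stage, after 'latency' execute cycles.
    """
    write_cycle = producer["fetch"] + 2 + producer["latency"]
    read_cycle = consumer["fetch"] + 1
    depends = producer["dst"] in consumer["srcs"]
    # Assumption: a read in the same cycle as the write still sees stale data.
    return depends and read_cycle <= write_cycle


# The two subtract instructions of FIG. 2, each using the 1-cycle integer unit.
I1 = {"fetch": 1, "latency": 1, "srcs": ("r2", "r3"), "dst": "r1"}
I2 = {"fetch": 2, "latency": 1, "srcs": ("r5", "r1"), "dst": "r4"}

if __name__ == "__main__":
    print(raw_hazard(I1, I2))  # True: r1 is read in cycle 3, written in cycle 4
```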
FIG. 3 illustrates an exemplary WAW data hazard, arising when the processor executes instructions I1 and I2 for processor clock cycles 1 through 11. Instruction I1 multiplies the contents of its source registers r2 and r3 and writes the result to its destination register r4. Instruction I2 subtracts the contents of its source registers r6 and r8 and writes the result to destination register r4. It can be seen that, unless otherwise prevented, instruction I2 in the conventional pipelined processor will write to register r4 in clock cycle 5, before instruction I1, and then I1 will incorrectly overwrite the result of I2 in register r4 in clock cycle 11. This type of hazard could arise if, for example, instruction I1 were issued speculatively by a compiler for a branch which was statically mispredicted between I1 and I2. In the case of in-order instruction completion, instruction I1 will not affect the outcome, since in-order completion will discard the result of I1. However, as described above, the hazard is significant in the presence of out-of-order instruction completion.
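The writeback cycles cited for FIG. 3 follow from the unit latencies, as the following Python sketch suggests. The names writeback_cycle and waw_hazard are hypothetical, and the sketch assumes that neither instruction is stalled.

```python
def writeback_cycle(instr):
    """Cycle of the writeback stage: fetch, decode, 'latency' execute cycles, W."""
    return instr["fetch"] + 2 + instr["latency"]


def waw_hazard(first, second):
    """Return True if the later instruction writes the shared destination first,
    so that the earlier instruction subsequently overwrites the newer result."""
    same_dst = first["dst"] == second["dst"]
    return same_dst and writeback_cycle(second) < writeback_cycle(first)


# FIG. 3: an 8-cycle multiply followed by a 1-cycle subtract, both writing r4.
I1 = {"fetch": 1, "latency": 8, "dst": "r4"}
I2 = {"fetch": 2, "latency": 1, "dst": "r4"}

if __name__ == "__main__":
    print(writeback_cycle(I1), writeback_cycle(I2))  # 11 5
    print(waw_hazard(I1, I2))                        # True
```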
A WAR hazard occurs, e.g., when register reads are allowed to be performed during later stages and register writes are allowed to be performed in the earlier stages in the pipeline. The exemplary four-stage pipelined processor 10 of FIG. 1 is thus incapable of producing a WAR hazard, but such hazards can arise in other pipelined processors. FIG. 4 illustrates an exemplary WAR data hazard arising in a five-stage pipelined processor including stages A, W1, B, R1 and C. In this processor, stages A, B and C are generic pipeline stages, stage W1 writes an intermediate result to a destination register, and stage R1 reads the source registers for processing in stage C. The processor executes instructions I1 and I2 for processor clock cycles 1 through 6. Instruction I1 applies an operation op1 to the contents of its source registers r2 and r3 and writes the result to its destination register r1. Instruction I2 applies an operation op2 to the contents of its source registers r4 and r5 and writes the result to destination register r3. Note that an intermediate result is written to destination register r3 in the W1 stage of I2 before the intended value of r3 can be read in the R1 stage of I1, thereby introducing a WAR hazard.
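The ordering problem in the five-stage example may be sketched in Python as follows. The sketch assumes one stage per clock cycle with no stalls; the names cycle_of and war_hazard and the dictionary layout are hypothetical.

```python
# Stage order of the five-stage pipeline of FIG. 4, one stage per cycle.
STAGES = ["A", "W1", "B", "R1", "C"]


def cycle_of(start_cycle, stage):
    """Clock cycle in which an instruction entering stage A at 'start_cycle'
    occupies the given stage (assumes no stalls)."""
    return start_cycle + STAGES.index(stage)


def war_hazard(earlier, later):
    """Return True if 'later' writes a register in its early write stage at or
    before the cycle in which 'earlier' reads that register in its read stage."""
    read_cycle = cycle_of(earlier["start"], "R1")
    write_cycle = cycle_of(later["start"], "W1")
    # Assumption: a write in the same cycle as the read also corrupts the read.
    return later["dst"] in earlier["srcs"] and write_cycle <= read_cycle


I1 = {"start": 1, "srcs": ("r2", "r3"), "dst": "r1"}
I2 = {"start": 2, "srcs": ("r4", "r5"), "dst": "r3"}

if __name__ == "__main__":
    print(cycle_of(I2["start"], "W1"), cycle_of(I1["start"], "R1"))  # 3 4
    print(war_hazard(I1, I2))                                        # True
```

Under these assumptions the second instruction writes r3 in cycle 3, one cycle before the first instruction reads r3 in cycle 4.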
Predicated instructions can also present a problem for pipelined processors. For example, the processor hardware generally must check the validity of the predicate used for each instruction before it can determine whether or not the instruction should be executed. FIG. 5 shows an example of a predication hazard which can arise in the conventional four-stage pipelined processor 10 of FIG. 1. The processor executes instructions I1 and I2 for processor clock cycles 1 through 5. The instruction I1 is a setpred operation which sets the predicate p1 to a value of 0. It will be assumed that the predicate p1 is true, i.e., has a value of 1, before execution of this instruction. The instruction I2 is a predicated instruction which, if the predicate p1 is true, performs an add operation using source registers r2 and r3 and destination register r1. Note that I2 will be executed in this example even though p1 should be false at the point that I2 dispatches, thereby introducing a predication hazard. Wp and Wd in FIG. 5 represent writeback stages to predication and data registers, respectively. It should be noted that predication hazards, like data hazards, can also be grouped into RAW, WAW or WAR hazards.
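The stale predicate read in the example of FIG. 5 may be sketched in Python as follows. The sketch assumes that the predicate is read in the decode stage of the predicated instruction and that the setpred operation has a 1-cycle execute stage; both assumptions, and the function name predicate_seen_at_decode, are hypothetical.

```python
def predicate_seen_at_decode(initial, setpred_fetch, new_value, consumer_fetch):
    """Return the predicate value observed by a predicated instruction.

    Assumed timing: the setpred instruction writes the predicate register in
    its writeback stage (fetch + 3, given a 1-cycle execute stage), and the
    predicated instruction reads the predicate in its decode stage (fetch + 1).
    """
    write_cycle = setpred_fetch + 3
    read_cycle = consumer_fetch + 1
    # The new value is visible only to reads strictly after the write cycle.
    return new_value if write_cycle < read_cycle else initial


if __name__ == "__main__":
    # FIG. 5: p1 is initially 1; setpred (fetched in cycle 1) sets it to 0;
    # the predicated add is fetched in cycle 2.
    print(predicate_seen_at_decode(1, 1, 0, 2))  # 1: stale value, add executes
```

In this sketch the predicated add observes the old value 1 and therefore executes, whereas the same instruction fetched in cycle 4 or later would observe the updated value 0.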
When using pipelined processors having multiple execution units with different latencies, it is generally necessary to control the dispatch of instructions so as to ensure proper program execution, i.e., so as to avoid the above-described data and predication hazards. A conventional method, known as pipeline interlock, determines the latency of each instruction and stalls the dispatch of subsequent instructions until the latencies are resolved. However, this method often leads to performance degradation, since consecutive instructions are not necessarily interdependent and thus need not always be stalled. In addition, this method and other conventional approaches can require unduly complex bypass checking hardware or register renaming hardware.
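The cost of such stalling may be illustrated with the following Python sketch, which is a deliberately simplified model and not a description of any particular interlock implementation. It assumes that each instruction's register read in the decode stage is held until the cycle after the preceding instruction's writeback, regardless of whether the two instructions are interdependent; the function names are hypothetical.

```python
def interlocked_decode_cycles(latencies, first_decode=2):
    """Decode cycles when every instruction waits for the previous writeback.

    'latencies' lists the execute latency of each instruction in program order.
    """
    cycles = [first_decode]
    for prev_latency in latencies[:-1]:
        # Previous instruction: decode, 'prev_latency' execute cycles, writeback.
        prev_writeback = cycles[-1] + prev_latency + 1
        cycles.append(prev_writeback + 1)
    return cycles


def unstalled_decode_cycles(count, first_decode=2):
    """Decode cycles when one instruction enters decode on every clock cycle."""
    return [first_decode + i for i in range(count)]


if __name__ == "__main__":
    latencies = [8, 1, 1]  # e.g., a multiply followed by two integer operations
    print(interlocked_decode_cycles(latencies))     # [2, 12, 15]
    print(unstalled_decode_cycles(len(latencies)))  # [2, 3, 4]
```

If the three instructions are in fact independent, every stall cycle in the first list is unnecessary, which is the performance degradation noted above.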