1. Field of the Invention
The present invention relates generally to computer hardware, and more specifically, to a method and apparatus for dispatching instructions to execution units in waves.
2. Description of the Related Art
Modern processors frequently use techniques that execute instructions in parallel, because parallel execution techniques increase the effective operating speed of the processor. Parallel execution can be implemented in processors that dispatch instructions for execution in an order that does not necessarily reflect the original instruction order, i.e., in out-of-order processors. An out-of-order processor implements a parallel execution scheme by dispatching two or more instructions to the execution units in a wave. The instructions of a wave are ordinarily dispatched in the same clock cycle.
FIG. 1 illustrates a portion of an out-of-order processor 10. The processor 10 includes a fetcher 12 for retrieving instructions from an instruction cache or memory 14. The fetcher 12 sends the instructions down a pipeline segment 16 that may include hardware devices such as decoders and/or renamers (neither shown). The pipeline segment 16 sends executable instructions to a scheduler 18. The scheduler 18 accepts an instruction from the pipeline segment 16 when one of the positions 22, 24, 26, 28 becomes available in an internal instruction buffer 20. The scheduler 18 dispatches instructions from the instruction buffer 20 to the execution units 30, 32 in an order that may be different from the original instruction order.
The scheduler 18 dispatches an instruction for execution when the resources necessary to execute the instruction become available. The resources for execution may include the data for source operands of the instruction and one of the execution units 30, 32 that is capable of executing the instruction. Each position 22, 24, 26, 28 of the instruction buffer 20 has two spaces 34, 36. The first space 34 stores the machine word for the instruction, i.e., the binary word encoding the instruction and operands. The second space 36 stores the status of the instruction, i.e., "ready" or "not ready." The status has the "ready" value when all of the needed resources for execution are available. The scheduler 18 only dispatches "ready" instructions to the execution units 30, 32.
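The two-space layout of each buffer position can be illustrated with a minimal Python sketch. The names `BufferPosition`, `machine_word`, `ready`, and `ready_positions_of`, as well as the sample machine words, are hypothetical and chosen only for illustration; an empty position is modeled here as `None`, which is an assumption and not something the description specifies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BufferPosition:
    machine_word: int    # models first space 34: binary word for the instruction
    ready: bool = False  # models second space 36: "ready" / "not ready" status


def ready_positions_of(buffer: List[Optional[BufferPosition]]) -> List[int]:
    """Return the indices of positions whose status is "ready"."""
    return [i for i, pos in enumerate(buffer) if pos is not None and pos.ready]


# Four positions, loosely mirroring positions 22, 24, 26, 28 (values are made up).
buffer = [
    BufferPosition(0x1A, ready=True),
    BufferPosition(0x2B, ready=False),
    None,  # assumed representation of an empty position
    BufferPosition(0x3C, ready=True),
]
ready_positions = ready_positions_of(buffer)
```

Only the positions returned by `ready_positions_of` would be candidates for dispatch in this model.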
Referring to FIG. 1, the instruction set of the processor 10 contains instructions of types A, B, and X. The execution units 30, 32 are of types A and B. The "A" execution unit 30 is able to execute the instructions of types A and X. The "B" execution unit 32 is able to execute the instructions of types B and X. Thus, the type X instructions can be executed by either of the execution units 30, 32, but the instructions of types A and B can only be executed by one of the two execution units 30, 32.
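The relationship between instruction types and execution units described above can be written as a small lookup table. The following Python sketch is illustrative only; the names `CAPABILITIES` and `units_for` are hypothetical.

```python
# Which instruction types each execution unit can execute, per the description:
# the "A" unit 30 runs types A and X, the "B" unit 32 runs types B and X.
CAPABILITIES = {
    "A": {"A", "X"},
    "B": {"B", "X"},
}


def units_for(instr_type: str) -> list:
    """Return the names of the execution units able to execute instr_type."""
    return sorted(unit for unit, types in CAPABILITIES.items() if instr_type in types)
```

In this model, `units_for("X")` names both units, while `units_for("A")` and `units_for("B")` each name exactly one.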
FIG. 2 is a flowchart illustrating a pseudo-first-in-first-out ("pseudo-FIFO") method 38 for assigning instructions to be dispatched in waves in the processor 10 of FIG. 1. At block 40, the scheduler 18 determines which entries of the instruction buffer 20 correspond to instructions "ready" for execution. The scheduler 18 determines which instructions are "ready" by reading the status spaces 36 of the positions 22, 24, 26, 28. At blocks 42 and 44, the scheduler 18 employs an algorithm for assigning "ready" instructions for execution. At block 42, the scheduler 18 assigns the oldest "ready" instruction to one of the execution units 30, 32 if that execution unit is available and appropriate. At block 44, the scheduler 18 attempts to assign the second oldest "ready" instruction to one of the execution units 30, 32 that is available and appropriate if no execution unit is appropriate for the oldest "ready" instruction. If the remaining execution unit 30, 32 is not appropriate for executing the next oldest "ready" instruction, the scheduler 18 attempts to assign the third oldest "ready" instruction to the remaining execution unit 30, 32, and so on (not shown). At block 46, the scheduler 18 dispatches the assigned instructions to the "A" and "B" execution units 30, 32 in a wave. Wave dispatching ordinarily dispatches the assigned instructions for execution in the same clock cycle. At block 48, the scheduler 18 waits a predetermined time before re-executing the sequence of steps of the method 38.
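The assignment steps of blocks 42 through 46 can be sketched in Python under one reading of the description: the "ready" instructions are visited from oldest to youngest, and each is given the first free execution unit that can execute it. The function name `pseudo_fifo_wave`, the `unit_order` parameter, and the representation of instructions by their type letters are assumptions made for illustration, not details taken from FIG. 2.

```python
CAPABILITIES = {"A": {"A", "X"}, "B": {"B", "X"}}  # unit name -> executable types


def pseudo_fifo_wave(ready_oldest_first, unit_order=("A", "B")):
    """Assign "ready" instructions to units, oldest first.

    ready_oldest_first: instruction types ("A", "B", or "X"), oldest first.
    Returns a dict mapping unit name -> index of the assigned instruction.
    """
    assignment = {}
    for index, instr_type in enumerate(ready_oldest_first):
        for unit in unit_order:
            # Take the first unit that is still free and can run this type.
            if unit not in assignment and instr_type in CAPABILITIES[unit]:
                assignment[unit] = index
                break
        if len(assignment) == len(unit_order):
            break  # every unit has work; the wave is full
    return assignment


# Oldest-first types A, A, B: the second A is skipped, the B fills unit "B".
wave = pseudo_fifo_wave(["A", "A", "B"])
```

The returned mapping corresponds to the set of instructions that would be dispatched together at block 46. The wait of block 48 is not modeled.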
Referring to FIGS. 1 and 2, though the pseudo-FIFO assignment algorithm may reduce the probability that individual instructions will stagnate in the instruction buffer 20, the algorithm does not make full use of the execution units 30, 32. To see this, consider a specific form of the method 38 that assigns type A and type B instructions to the "A" and "B" execution units 30, 32, respectively. The specific form of the method 38 assigns a type X instruction to the "A" execution unit 30 if available and, if not, to the "B" execution unit 32 if available. Then, if a type X instruction is the oldest ready instruction, the type X instruction has priority to be assigned to the "A" execution unit 30. If the next oldest ready instruction is of type A, the type A instruction cannot be assigned to the remaining execution unit 32, because the "B" execution unit 32 cannot execute type A instructions. Thus, the "B" execution unit 32 remains idle, and an opportunity to dispatch both the type X and type A instructions in the same dispatch wave is lost. If the algorithm is changed so that a type X instruction is assigned to the "B" execution unit 32 if available and, if not, to the "A" execution unit 30, similar inefficiencies arise. The pseudo-FIFO assignment algorithm thus results in lost opportunities to execute ready instructions and slows the parallel processing of instructions that are dispatched in waves.
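The lost dispatch opportunity can be reproduced with a short Python sketch. The helper `greedy_wave` and its `preference` parameter are hypothetical; `preference` gives the order in which units are tried, which only matters for type X instructions because types A and B each fit a single unit.

```python
CAN_RUN = {"A": {"A", "X"}, "B": {"B", "X"}}  # unit name -> executable types


def greedy_wave(ready, preference):
    """Oldest-first assignment; returns a dict of unit name -> instruction type."""
    free = list(preference)
    wave = {}
    for instr in ready:
        # First free unit, in preference order, that can execute this type.
        unit = next((u for u in free if instr in CAN_RUN[u]), None)
        if unit is not None:
            wave[unit] = instr
            free.remove(unit)
    return wave


# Type X is oldest and prefers unit "A"; the following type A has nowhere to go.
a_first = greedy_wave(["X", "A"], preference=("A", "B"))

# Reversing the preference only moves the problem: X takes "B", type B is stuck.
b_first = greedy_wave(["X", "B"], preference=("B", "A"))
```

In both cases only one instruction is assigned and one execution unit stays idle, even though placing the type X instruction on the other unit would have allowed both ready instructions to be dispatched in the same wave.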
The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.