The present invention relates to a method and apparatus for executing instructions in a computer. More specifically, the present invention relates to a method and apparatus for decoupling instruction steering operations from instruction dispatch operations in a pipelined microarchitecture.
Pipelining is an implementation technique whereby multiple instructions are overlapped in execution. A pipeline (also known as a functional unit) completes each instruction in a series of steps called pipeline stages. Instructions "enter" at one end of the pipeline, are processed through the stages, and "exit" at the other end (i.e., their intended effects are carried out).
The throughput of the pipeline is determined by how often instructions are completed. The time required to move an instruction one step down the pipeline is known as a machine cycle. The length of a machine cycle is determined by the time required by the slowest pipeline stage because all the stages must proceed at the same time. In this type of architecture, as in most, the chief means of increasing throughput is reducing the duration of the clock cycle. However, an alternative to increasing the clock frequency is to employ more than one pipeline. In systems employing multiple pipelines, instructions are dispatched by a scheduler, instruction steering and dispatch logic, or similar hardware construct. Instructions may be dispatched to the pipelines based on numerous factors, such as pipeline availability, op-code type, operand availability, data dependencies, and other considerations.
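The relationship between stage latency, machine cycle, and throughput described above can be illustrated with a minimal sketch. The stage latencies and the function names `machine_cycle` and `peak_throughput` are hypothetical and chosen only for illustration; the sketch assumes an ideal pipeline with no stalls.

```python
def machine_cycle(stage_latencies_ns):
    """Length of one machine cycle: set by the slowest pipeline stage."""
    return max(stage_latencies_ns)


def peak_throughput(stage_latencies_ns, num_pipelines=1):
    """Peak instructions completed per nanosecond.

    Assumes (hypothetically) that every pipeline completes one
    instruction per machine cycle and never stalls.
    """
    return num_pipelines / machine_cycle(stage_latencies_ns)


# Hypothetical per-stage latencies in nanoseconds.
latencies = [1.0, 2.0, 1.5]

single = peak_throughput(latencies)                  # one pipeline
dual = peak_throughput(latencies, num_pipelines=2)   # two pipelines, same clock
```

Under these assumptions, adding a second pipeline doubles peak throughput without shortening the machine cycle, which is the alternative the passage describes.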
FIG. 1 is a flow chart which illustrates an exemplary set of pipeline stages according to the prior art. Not all instructions perform operations during each of these stages, but it is expected that each instruction will go through each of these stages, for reasons of coherency. The execution of an instruction begins at step 5 with the generation of an address (also known as the A stage). Next, at step 10, this address is presented to a memory unit such as an instruction cache during the instruction fetch (or F) stage. Once fetched, the instruction might then proceed to an instruction issue queue. At step 20, the instruction issues from the instruction issue queue during the instruction issue (or I) stage. In a microarchitecture using multiple pipelines, the proper functional unit may also be selected by an instruction steering unit at this time.
At step 30, operands are read from the register files or other locations during the read operand stage (or R stage). The instruction may be dispatched into the functional unit selected by the instruction steering logic, although the instruction may also be held for a number of machine cycles. At this point, register and functional unit dependencies are determined for the instructions in the present group of instructions, both with respect to one another and with respect to instructions already in the functional unit(s). This determination is made concurrently with the register file access. If a dependency is found, or another reason exists to stall the instruction, the instruction in question is held until the dependency or other restriction is resolved. Thus, among its other functions, the R stage acts to hold one or more instructions until such time as the instruction(s) can safely be executed.
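The hold decision made in the R stage can be sketched as follows. This is a simplified illustration, not the logic of any particular prior-art design: it checks only whether a source register is still awaiting a write by an earlier instruction, and the names `Instr` and `must_hold` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this sketch."""
    name: str
    dest: Optional[str]          # register written, if any
    srcs: Tuple[str, ...] = ()   # registers read


def must_hold(instr, in_flight):
    """Return True if instr must be held in the R stage.

    Simplifying assumption: the only dependency checked is a source
    register that an in-flight instruction has not yet written.
    """
    pending_writes = {i.dest for i in in_flight if i.dest is not None}
    return any(src in pending_writes for src in instr.srcs)


load = Instr("load", dest="r1", srcs=("r4",))
add = Instr("add", dest="r3", srcs=("r1", "r2"))

held = must_hold(add, in_flight=[load])   # add reads r1, which load has yet to write
released = must_hold(add, in_flight=[])   # nothing outstanding
```

A real R stage would also account for functional unit dependencies and the other stall conditions mentioned above.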
Once the necessary operands have been read, the instruction is executed during the execution (or E) stage (step 40). Actions taken during the E stage include the computation of results, including most arithmetic, shift, and logical operations. Virtual addresses are also often computed during this stage, allowing data accesses to begin. Next is the cache read stage (C stage), during which certain integer results are written to temporary registers and certain load instructions have their results delivered (step 50), among other operations.
At step 60, cache misses are detected during the cache miss stage (M stage). This may be done, for example, by comparing the results from a translation lookaside buffer (TLB) to the physical address contained in cache tag random-access memory (RAM), in addition to other operations. During the write data stage (W stage), data may be written to temporary storage areas (step 70). Exceptional conditions requiring the cancellation and re-issue (i.e., recirculation) of one or more instructions may also be signaled during this stage. An instruction is recirculated by re-introducing the instruction (and, possibly, other instructions) into the instruction queue. Conditions leading to recirculations include data cache misses, TLB misses, incorrect predictions, and other such conditions. Integer traps, as well as other exceptions, are signaled at step 80 during the trap determination stage (T stage). At the data write stage (D stage), data is written to a register file (step 90).
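For reference, the stages of FIG. 1 can be summarized as an ordered table, together with a small helper showing which stage an instruction occupies in a given machine cycle. The helper `stage_at` is hypothetical and assumes the instruction advances one stage per machine cycle with no time spent in the instruction issue queue, no stalls in the R stage, and no recirculation.

```python
# (stage letter, step number in FIG. 1, stage name), in pipeline order.
STAGES = [
    ("A", 5, "address generation"),
    ("F", 10, "instruction fetch"),
    ("I", 20, "instruction issue"),
    ("R", 30, "read operand"),
    ("E", 40, "execution"),
    ("C", 50, "cache read"),
    ("M", 60, "cache miss"),
    ("W", 70, "write data"),
    ("T", 80, "trap determination"),
    ("D", 90, "data write"),
]


def stage_at(entry_cycle, cycle):
    """Return the letter of the stage occupied at `cycle`, or None.

    Assumes one stage per machine cycle, starting in the A stage at
    `entry_cycle`, with no stalls or recirculation.
    """
    index = cycle - entry_cycle
    if 0 <= index < len(STAGES):
        return STAGES[index][0]
    return None


# Two instructions entering on consecutive cycles overlap in execution:
first = stage_at(entry_cycle=0, cycle=3)    # first instruction at cycle 3
second = stage_at(entry_cycle=1, cycle=3)   # second instruction at cycle 3
```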
In a pipeline of the prior art, such as that described, the instruction steering logic must examine the state of the functional units during every machine cycle, including examining instructions held in the R stage and instructions released into the E stage for execution. The instruction steering and dispatch logic must decide which of the waiting instructions can safely proceed into their assigned functional unit at the end of the machine cycle (i.e., which instructions may vacate the R stage). This is a complex decision, and the logic dedicated to making this decision often contains some of a microarchitecture's most critical paths. At the same time, the instruction steering and dispatch logic must examine the instructions about to issue, allocate functional units for their execution, and then transfer as many of those instructions as possible from the instruction queue to their target functional units.
Because instructions are removed from the instruction queue in order (for an in-order pipelined architecture), an instruction cannot be removed from the instruction queue unless all of the preceding instructions will also be removed by that time. An instruction cannot be transferred from the instruction queue to the R stage of its assigned functional unit unless that functional unit is either unoccupied in the present machine cycle or contains an instruction that will vacate the R stage at this time.
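The two constraints just described can be sketched as a scan over the head of the instruction queue. The function `issuable_count` and its parameters are hypothetical names. The sketch further assumes that each R stage holds at most one instruction, so two instructions issued in the same cycle cannot target the same functional unit.

```python
def issuable_count(queue_targets, r_stage_occupied, will_vacate):
    """Count how many instructions can leave the queue this cycle.

    queue_targets:    target functional unit of each queued
                      instruction, oldest first
    r_stage_occupied: unit -> True if its R stage holds an instruction
    will_vacate:      unit -> True if that instruction leaves the
                      R stage at the end of this cycle
    """
    claimed = set()
    count = 0
    for unit in queue_targets:
        free = (not r_stage_occupied[unit]) or will_vacate[unit]
        if not free or unit in claimed:
            break  # in-order: every younger instruction is blocked too
        claimed.add(unit)
        count += 1
    return count


occupied = {"alu0": True, "alu1": True, "mem": False}
vacating = {"alu0": True, "alu1": False, "mem": False}

# alu0 is vacating, so the first instruction may issue. alu1 stays
# occupied, so the second cannot, and the third is blocked behind it
# even though its target unit (mem) is empty.
n = issuable_count(["alu0", "alu1", "mem"], occupied, vacating)
```

Note that the scan depends on `will_vacate` for every occupied unit, which is why the timing of the vacate decision matters to the instruction queue.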
Thus, a tension exists. The instruction queue needs to know as early in the machine cycle as possible whether or not an instruction in a particular R stage will vacate because of the time consumed in grouping instructions and determining where the instruction should be sent. Unfortunately, the decision to vacate a functional unit's R stage by the end of the present machine cycle may not be made until late in that cycle for several reasons. Certain data dependencies may only be resolved late in the machine cycle. The instruction queue and instruction steering logic are thus not able to depend on receiving vacate information early enough to permit these units to safely issue and steer the instructions waiting to be issued.
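The timing tension can be expressed as a simple budget check. All figures below are hypothetical, as is the function name `steering_fits`; the sketch assumes that steering and grouping cannot begin until the vacate decision is known and must finish within the same machine cycle.

```python
def steering_fits(cycle_ns, vacate_known_ns, steering_ns):
    """True if grouping and steering can finish within the cycle.

    cycle_ns:        length of the machine cycle
    vacate_known_ns: time into the cycle at which the vacate
                     decision becomes available
    steering_ns:     time needed to group and steer instructions
                     once the vacate decision is known
    """
    return vacate_known_ns + steering_ns <= cycle_ns


# Hypothetical 2.0 ns machine cycle with 1.0 ns of steering work.
early = steering_fits(cycle_ns=2.0, vacate_known_ns=0.5, steering_ns=1.0)
late = steering_fits(cycle_ns=2.0, vacate_known_ns=1.5, steering_ns=1.0)
```

With the late-arriving vacate decision, the steering work no longer fits in the cycle, which is the situation the passage describes.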
If a pipelined microarchitecture is to be capable of dispatching one instruction per cycle into each functional unit, this limitation must be addressed. One way to address this limitation is to increase the duration of the machine cycle (i.e., slow the pipeline) sufficiently to allow the determination to be made. However, this has the untenable side-effect of reducing throughput. What is therefore required is a method and apparatus which permits the determination of whether a particular instruction can safely be issued from an instruction queue to the next stage of the pipeline at a point early in the machine cycle, while maintaining the pipeline's throughput at an acceptable level.