Data processors having superscalar architectures have the capability of dispatching multiple instructions simultaneously. In such data processors, operations having segregated functionality may be dispatched simultaneously if the candidate instructions are destined for distinct execution units. That is, instructions operating on floating point operands are segregated from fixed-point operations and performed by a floating point unit. The fixed-point operations are executed by a fixed-point unit, and load and store operations are further segregated and performed by a load/store unit. Superscalar processor 100 according to the prior art is illustrated, in block diagram form, in FIG. 1. Instructions are retrieved from memory (not shown) and loaded into I-cache 101. The instructions are retained in I-cache 101 until they are required, or flushed if not needed. Instructions are retrieved from I-cache 101 by fetch unit 102 and loaded into instruction queue 103.
The parallelism of processor 100 includes instruction pipelining, whereby instructions are processed in stages. In such an architecture, multiple instructions may be contained in a pipeline, each such instruction being in a different processing stage at a given processor cycle. When a pipeline, such as fixed-point pipeline 105 or floating point pipeline 106, has an available slot, dispatch unit 104 dispatches the next instruction from instruction queue 103 to the appropriate execution pipeline. In processor 100, dispatch unit 104 may dispatch two instructions simultaneously, provided that one of the instructions is bound for fixed-point pipeline 105 and the other is bound for floating point pipeline 106. Alternatively, a load/store instruction may be dispatched simultaneously with an instruction bound for either fixed-point pipeline 105 or floating point pipeline 106.
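The dual-dispatch pairing rule stated above can be expressed as a small predicate. The following is a minimal sketch, not the dispatch logic of processor 100: the `Unit` enumeration, `can_dual_dispatch`, and `dispatch_cycle` are hypothetical names, and the sketch assumes that the target pipelines always have a free slot so that only the pairing rule limits dispatch.

```python
from enum import Enum


class Unit(Enum):
    """Execution destination of an instruction (hypothetical encoding)."""
    FIXED = "fixed-point"
    FLOAT = "floating point"
    LOAD_STORE = "load/store"


def can_dual_dispatch(first: Unit, second: Unit) -> bool:
    """True if two instructions may be dispatched in the same cycle.

    Pairs allowed: one fixed-point plus one floating point instruction, or one
    load/store instruction plus one fixed-point or floating point instruction.
    """
    pair = {first, second}
    if pair == {Unit.FIXED, Unit.FLOAT}:
        return True
    # A load/store paired with a different kind of instruction (the set has two members).
    return Unit.LOAD_STORE in pair and len(pair) == 2


def dispatch_cycle(queue: list) -> list:
    """Dispatch one or two instructions from the head of the queue, in program order.

    Assumption: pipeline slots are always available, so only the pairing rule applies.
    """
    if not queue:
        return []
    if len(queue) >= 2 and can_dual_dispatch(queue[0], queue[1]):
        return queue[:2]
    return queue[:1]


print(can_dual_dispatch(Unit.FIXED, Unit.FLOAT))       # True
print(can_dual_dispatch(Unit.FIXED, Unit.FIXED))       # False
print(len(dispatch_cycle([Unit.LOAD_STORE, Unit.FLOAT, Unit.FIXED])))  # 2
```

Because dispatch is in program order, two fixed-point instructions at the head of the queue force a single-instruction dispatch cycle even if a floating point instruction follows them.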
In addition to instruction execution parallelism, processor 100 implements out-of-order instruction execution to further improve performance. Although instructions are dispatched by dispatch unit 104 in program order, they may be issued out of program order from an issue queue, such as fixed-point issue queue 107 or floating point issue queue 108, as appropriate. An instruction may be issued ahead of a prior instruction, in program order, as soon as all of its operand dependencies have been resolved, that is, as soon as all of its source operands have become available because the instructions generating them have finished execution.
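The issue condition above (issue the oldest instruction whose source operands are all available, regardless of program order) can be modeled as follows. This is a minimal sketch under stated assumptions: `QueuedInstr`, `select_ready`, and `issue_order` are hypothetical names, and a result is assumed to become available as soon as its producer issues (single-cycle execution), which the original text does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class QueuedInstr:
    seq: int                  # position in program order (hypothetical field)
    dest: str                 # destination register name
    sources: tuple[str, ...]  # source register names


def select_ready(queue: list[QueuedInstr], available: set[str]) -> QueuedInstr | None:
    """Return the oldest queued instruction whose source operands are all available."""
    ready = [i for i in queue if all(s in available for s in i.sources)]
    return min(ready, key=lambda i: i.seq, default=None)


def issue_order(queue: list[QueuedInstr], available: set[str]) -> tuple[list[int], list[int]]:
    """Issue instructions one at a time; return (issued seqs, seqs still waiting).

    Simplifying assumption: a result becomes available as soon as its producer
    issues, so a dependent instruction may issue immediately afterwards.
    """
    pending = list(queue)
    avail = set(available)
    issued: list[int] = []
    while (instr := select_ready(pending, avail)) is not None:
        pending.remove(instr)
        issued.append(instr.seq)
        avail.add(instr.dest)
    return issued, [i.seq for i in pending]


# Program order: i0 waits on r9 (for example, an outstanding load); i1 and i2 do not.
program = [
    QueuedInstr(0, "r5", ("r9",)),
    QueuedInstr(1, "r6", ("r2",)),
    QueuedInstr(2, "r7", ("r6",)),
]
print(issue_order(program, {"r2"}))  # ([1, 2], [0])
```

Instructions 1 and 2 issue ahead of instruction 0, which remains in the issue queue until r9 becomes available.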
Out-of-order execution may therefore begin before a source operand has been written back to its destination architected register in general purpose register (GPR) file 109 or floating point register (FPR) file 110. The result of a fixed-point calculation by fixed-point unit 111 and the result of a floating point calculation by floating point unit 112 are written back to the corresponding architected register file, GPR file 109 and FPR file 110, respectively, when the instruction generating the result completes.
Completion is effected by completion unit 113, which re-orders instructions executed out-of-order. When a particular instruction completes, the architected machine state is as if that instruction, and all prior instructions, were executed in program order. In-order completion ensures that processor 100 has the correct architectural state if it must recover from an exception or from a branch that has executed speculatively. Because the completion of a particular instruction may occur several cycles after its execution, a rename mechanism is provided to temporarily store operand results prior to their being written back to the architected register at completion.
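In-order completion can be illustrated with a queue in which entries are allocated at dispatch, marked finished in any order, and retired only from the head. The following is a minimal sketch in the style of a reorder buffer; the text does not describe the internal structure of the completion unit, so `CompletionEntry`, `CompletionUnit`, and its methods are hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class CompletionEntry:
    seq: int                   # program-order sequence number
    dest: str                  # architected destination register
    value: int | None = None   # None until the instruction finishes execution


class CompletionUnit:
    """Hypothetical in-order completion model (a reorder-buffer-style queue)."""

    def __init__(self) -> None:
        self.entries: deque[CompletionEntry] = deque()  # oldest instruction at the left
        self.architected: dict[str, int] = {}           # committed register state

    def dispatch(self, seq: int, dest: str) -> CompletionEntry:
        entry = CompletionEntry(seq, dest)
        self.entries.append(entry)   # entries are allocated in program order
        return entry

    def finish(self, entry: CompletionEntry, value: int) -> None:
        entry.value = value          # execution may finish in any order

    def complete(self) -> list[int]:
        """Retire finished instructions from the head only, preserving program order."""
        done: list[int] = []
        while self.entries and self.entries[0].value is not None:
            entry = self.entries.popleft()
            self.architected[entry.dest] = entry.value
            done.append(entry.seq)
        return done

    def flush(self) -> None:
        """Discard all uncompleted instructions; committed state is unchanged."""
        self.entries.clear()


cu = CompletionUnit()
a = cu.dispatch(0, "r3")
b = cu.dispatch(1, "r4")
cu.finish(b, 7)          # the younger instruction finishes first
print(cu.complete())     # [] : r4 must wait for r3
cu.finish(a, 5)
print(cu.complete())     # [0, 1]
print(cu.architected)    # {'r3': 5, 'r4': 7}
```

The `flush` method illustrates why in-order completion simplifies recovery: discarding speculative work never touches the committed architected state.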
Rename buffers are used to provide temporary storage for operands generated by instructions that have finished execution but not yet completed. Rename buffer 113 is associated with GPR file 109, for fixed-point instructions in fixed-point pipeline 105. Similarly, rename buffer 114 is associated with FPR file 110 for floating point instructions in floating point pipeline 106.
When an instruction is dispatched from instruction queue 103 by dispatch unit 104 to fixed-point issue queue 107 or floating point issue queue 108, for fixed-point and floating point instructions, respectively, a renaming mechanism associates a register in the rename buffer with the target architected register. For fixed-point operations, the renaming is provided by GPR renaming logic 115. Similarly, renaming for floating point instructions is generated by FPR renaming logic 116. When an instruction sourcing the renamed architected register is dispatched by dispatch unit 104, GPR renaming logic 115 and FPR renaming logic 116, for fixed-point and floating point instructions, respectively, provide the rename data to the instruction. The rename data is tagged along with the instruction when the instruction enters fixed-point issue queue 107, for fixed-point instructions, or floating point issue queue 108, for floating point instructions. When the instruction issues, it uses the rename data to retrieve its source operands from rename buffer 113, for fixed-point instructions, or from rename buffer 114, for floating point instructions.
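The renaming sequence described above (allocate a rename buffer entry for the target at dispatch, hand rename data to later instructions that source that register, read operands from the rename buffer at issue, and write back at completion) can be sketched for a single register file as follows. This is an illustrative model only: `RenameLogic`, its methods, and the ("rename", tag) / ("arch", register) form of the rename data are hypothetical, and each instruction is assumed to write exactly one register.

```python
from __future__ import annotations


class RenameLogic:
    """Hypothetical model of one renaming unit (for example GPR renaming logic 115)
    together with the rename buffer it manages (for example rename buffer 113)."""

    def __init__(self, num_entries: int) -> None:
        self.free_tags: list[int] = list(range(num_entries))  # unallocated buffer entries
        self.latest: dict[str, int] = {}  # architected register -> newest rename tag
        self.buffer: dict[int, int] = {}  # rename tag -> result, once execution finishes

    def dispatch(self, dest: str, sources: list[str]) -> tuple[int, list[tuple[str, object]]]:
        """Rename one instruction; return its target tag and the rename data for its sources."""
        # Sources are looked up before the target is renamed, so r1 <- r1 + r2
        # reads the previous producer of r1, not itself.
        refs = [("rename", self.latest[s]) if s in self.latest else ("arch", s)
                for s in sources]
        tag = self.free_tags.pop(0)  # IndexError here models a full rename buffer
        self.latest[dest] = tag
        return tag, refs

    def finish(self, tag: int, value: int) -> None:
        self.buffer[tag] = value     # result held in the rename buffer, not yet architected

    def read(self, ref: tuple[str, object], arch_file: dict[str, int]) -> int:
        kind, key = ref
        return self.buffer[key] if kind == "rename" else arch_file[key]

    def complete(self, dest: str, tag: int, arch_file: dict[str, int]) -> None:
        arch_file[dest] = self.buffer.pop(tag)  # write back at completion
        if self.latest.get(dest) == tag:        # no younger instruction renamed dest
            del self.latest[dest]
        self.free_tags.append(tag)              # entry may be reused


gpr = {"r1": 10, "r2": 5}                      # architected GPR file contents
rl = RenameLogic(num_entries=4)
t0, src0 = rl.dispatch("r3", ["r1", "r2"])     # r3 <- r1 + r2
t1, src1 = rl.dispatch("r4", ["r3"])           # r4 <- r3 * 2, source r3 is renamed
print(src1)                                    # [('rename', 0)]
rl.finish(t0, rl.read(src0[0], gpr) + rl.read(src0[1], gpr))
rl.finish(t1, rl.read(src1[0], gpr) * 2)       # operand comes from the rename buffer
rl.complete("r3", t0, gpr)
rl.complete("r4", t1, gpr)
print(gpr)                                     # {'r1': 10, 'r2': 5, 'r3': 15, 'r4': 30}
```

The second instruction obtains its operand from the rename buffer before the first instruction has written r3 back to the architected file.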
Processor 100, according to the prior art, maintains separate fixed-point and floating point register files, GPR file 109 and FPR file 110, respectively. Each register file has its own associated rename buffer: rename buffer 113 for GPR file 109 and rename buffer 114 for FPR file 110. In processor 100, only a small number of instructions may be executed out-of-order, because that number is limited by the number of registers available in rename buffers 113 and 114. In order that more instructions may be executed out-of-order, there is a need in the art for a mechanism by which temporary storage may be increased without expending chip resources on unnecessarily duplicative storage. Moreover, increased parallelism, and the resulting improvements in performance, militate in favor of facilities for dispatching multiple fixed-point or multiple floating point instructions in the same cycle. This may exacerbate the temporary storage problem if multiple execution pipelines, such as fixed-point pipeline 105 and floating point pipeline 106, are duplicated to provide increased parallelism. Thus, there is a need in the art for a rename mechanism that accommodates increased parallelism without unnecessarily duplicating temporary storage.
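The limit imposed by separate rename buffers can be made concrete with a simple count. The following is a minimal sketch under hypothetical simplifications: `dispatch_until_stall` is an illustrative name, every instruction targets exactly one register, "fx" instructions use the GPR rename buffer and "fp" instructions the FPR rename buffer, and no instruction completes (so no entry is freed) during the window being measured.

```python
def dispatch_until_stall(program: list, gpr_entries: int, fpr_entries: int) -> int:
    """Return how many instructions dispatch before one finds its rename buffer full.

    Hypothetical simplifications: one target register per instruction, 'fx' uses
    the GPR rename buffer, 'fp' uses the FPR rename buffer, and nothing completes
    during the measured window.
    """
    free = {"fx": gpr_entries, "fp": fpr_entries}
    for dispatched, kind in enumerate(program):
        if free[kind] == 0:
            return dispatched  # dispatch stalls, even if the other buffer has room
        free[kind] -= 1
    return len(program)


# Eight rename registers in total, split evenly between the two buffers.
print(dispatch_until_stall(["fx"] * 10, 4, 4))       # 4: the FPR entries sit unused
print(dispatch_until_stall(["fx", "fp"] * 5, 4, 4))  # 8
```

With the same total storage, a fixed-point-only sequence stalls after four instructions while four floating point rename registers remain idle, which is the kind of underused, duplicated temporary storage the paragraph above refers to.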