Computers have generally been designed in accordance with the von Neumann architecture, an approach to computer design characteristic of most commonly used computers, including microcomputers, and attributed to the work of Hungarian-born mathematician John von Neumann. The von Neumann architecture is synonymous with the concept of a stored program, that is, one that can be permanently stored in a computer and, because of the way it is coded, can be manipulated or made self-modifying through machine-based instructions. The familiar concept of sequential processing, a one-instruction-at-a-time approach to operations, is characteristic of von Neumann architecture.
The problem with such a sequential-instruction computer architecture is that its speed of operation is limited by how fast the logic circuitry can execute. One solution to this problem has been the design and use of superscalar microprocessor architectures, which enable the microprocessor to execute multiple instructions per clock cycle. One such superscalar processor is the PowerPC processor made by IBM.
In such a superscalar processor, the instruction unit dispatches several instructions at one time to the various execution units within the processor. However, another problem arises with such an architecture, since some instructions are dependent upon the completion of other instructions within a program. In other words, an operation to add two operands may have to wait until another instruction is completed and produces one of the operands to be added.
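The dependency described above can be illustrated with a minimal sketch. The names `Instr`, `depends_on`, and `independent_group` and the register labels are hypothetical, chosen only for illustration; the sketch checks just the read-after-write case and is not a description of any particular processor's dispatch logic.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    name: str
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction


def depends_on(later: Instr, earlier: Instr) -> bool:
    """True if `later` reads the register that `earlier` writes."""
    return earlier.dest in later.srcs


def independent_group(window):
    """Return the leading instructions that could be issued together.

    The scan stops at the first instruction that needs a result
    produced by an instruction already in the group.
    """
    group = []
    for instr in window:
        if any(depends_on(instr, prev) for prev in group):
            break
        group.append(instr)
    return group


program = [
    Instr("mul", "r3", ("r1", "r2")),
    Instr("sub", "r6", ("r4", "r5")),
    Instr("add", "r7", ("r3", "r6")),  # must wait for r3 and r6
]
ready = independent_group(program)
print([i.name for i in ready])
```

Here the add cannot be issued with the other two instructions because both of its operands are produced by them.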
Such processors also make use of what are often referred to as branch processing units, which retrieve instructions having a possible branch condition whereby two different paths can be taken within the program instruction flow depending on the results of a previous instruction. Such branch processing units predict which program branch will likely be taken by the processor, and then proceed to have the succeeding instructions within that branch begin to execute. A completion unit within the processor provides a mechanism to track instructions from dispatch through execution, and then retire or "complete" them in program order. Completing an instruction implies the commitment of the results of instruction execution to the architected registers. In-order completion ensures the correct architectural state should the processor have to recover from a mispredicted branch, or any other exception or interrupt. Results of "completed" instructions are written to the architected registers.
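In-order completion can be sketched as a queue that accepts results in any order but retires only from its head. The class `CompletionQueue` and the instruction tags below are assumptions made for illustration, not the structure of any actual completion unit.

```python
from collections import deque


class CompletionQueue:
    """Hypothetical completion queue: finish in any order, retire in order."""

    def __init__(self):
        self.queue = deque()   # instruction tags in program (dispatch) order
        self.finished = set()  # tags that have executed but not yet retired
        self.retired = []      # tags retired so far, in retirement order

    def dispatch(self, tag):
        self.queue.append(tag)

    def finish(self, tag):
        """Record that `tag` has executed, then retire from the head."""
        self.finished.add(tag)
        # Only the oldest instruction may retire, so a finished younger
        # instruction waits behind any unfinished older one.
        while self.queue and self.queue[0] in self.finished:
            head = self.queue.popleft()
            self.finished.discard(head)
            self.retired.append(head)


cq = CompletionQueue()
for tag in ("i0", "i1", "i2"):
    cq.dispatch(tag)

cq.finish("i2")              # executes first, but cannot retire yet
after_i2 = list(cq.retired)
cq.finish("i0")              # oldest instruction retires alone
after_i0 = list(cq.retired)
cq.finish("i1")              # unblocks i1 and the waiting i2
after_i1 = list(cq.retired)
print(after_i2, after_i0, after_i1)
```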
To avoid contention for a given register file location in the course of out-of-order execution, the processor may provide rename registers for the storage of instruction results prior to their commitment to the architected registers by the completion unit. Several rename registers, or buffers, may be provided for each of the various execution units and their associated architected registers within the processor.
When the dispatch unit dispatches an instruction to its execution unit, it allocates a rename register for the results of that instruction. If an instruction is dispatched to a reservation station associated with an execution unit due to a data dependency, the dispatcher will also provide a tag to the execution unit identifying which rename register will forward the required data upon instruction completion. When the data is available in the rename register, the pending execution may begin.
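The allocation and tagging steps can be sketched as follows. `RenameBuffer`, `ReservationEntry`, and the valid-bit representation are hypothetical names and choices made for this illustration; the sketch assumes a free rename register is always available at allocation time.

```python
class RenameBuffer:
    """Hypothetical pool of rename registers with one valid bit per entry."""

    def __init__(self, size):
        self.values = [0] * size
        self.valid = [False] * size    # True once a result has been written
        self.free = list(range(size))  # tags not yet allocated

    def allocate(self):
        """Hand out the next free rename register and return its tag."""
        tag = self.free.pop(0)
        self.valid[tag] = False
        return tag

    def write(self, tag, value):
        """Store an execution result and mark it available."""
        self.values[tag] = value
        self.valid[tag] = True


class ReservationEntry:
    """An instruction waiting on results identified by rename tags."""

    def __init__(self, op, source_tags, dest_tag):
        self.op = op
        self.source_tags = source_tags  # tags supplied at dispatch
        self.dest_tag = dest_tag

    def can_execute(self, rename):
        return all(rename.valid[t] for t in self.source_tags)


rename = RenameBuffer(size=4)
producer_tag = rename.allocate()   # result register of the earlier instruction
consumer_tag = rename.allocate()   # result register of the dependent instruction
waiting = ReservationEntry("add", [producer_tag], consumer_tag)

before = waiting.can_execute(rename)
rename.write(producer_tag, 42)     # the earlier instruction produces its result
after = waiting.can_execute(rename)
print(before, after)
```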
Instruction results are transferred from the rename registers to the architected registers by the completion unit when an instruction is retired from the completion queue without exceptions and after any speculative branch conditions preceding it in the completion queue have been resolved correctly. If a speculatively executed branch is found to have been incorrectly predicted, the speculatively executed instructions following the branch will be flushed from the completion queue, and the results of those instructions will be flushed from the rename registers.
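The two outcomes, commitment on retirement and flushing on a misprediction, can be sketched as below. The `Entry` record, the dictionary-based register files, and the function names are illustrative assumptions rather than an actual design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    name: str
    dest: Optional[str] = None  # architected register, None for a branch
    tag: Optional[int] = None   # rename register holding the result


def commit_head(queue, rename, architected):
    """Retire the oldest entry, copying its result to the architected register."""
    entry = queue.pop(0)
    if entry.dest is not None:
        architected[entry.dest] = rename.pop(entry.tag)


def flush_after(queue, rename, branch_index):
    """Discard every entry younger than the mispredicted branch,
    together with the rename-register results of those entries."""
    for entry in queue[branch_index + 1:]:
        if entry.tag is not None:
            rename.pop(entry.tag, None)
    del queue[branch_index + 1:]


queue = [
    Entry("add", "r1", 0),
    Entry("branch"),
    Entry("mul", "r2", 1),   # speculative: follows the branch
    Entry("sub", "r3", 2),   # speculative: follows the branch
]
rename = {0: 10, 1: 20, 2: 30}
architected = {"r1": 0, "r2": 0, "r3": 0}

commit_head(queue, rename, architected)  # add retires; r1 is committed
flush_after(queue, rename, 0)            # branch, now at index 0, was mispredicted
print(architected, rename, [e.name for e in queue])
```

Because the speculative results never left the rename registers, the architected registers r2 and r3 are untouched by the flush.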
A bottleneck in the dispatching of instructions can occur when all rename registers have been allocated by the dispatch unit. This will cause the dispatch unit to stall until a rename register becomes free for an assignment.
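The stall can be illustrated with a small sketch in which dispatch stops as soon as no rename register is free. `RenamePool`, `dispatch`, and the pool size of two are hypothetical and chosen only to make the bottleneck visible.

```python
class RenamePool:
    """Hypothetical fixed-size pool of rename register tags."""

    def __init__(self, size):
        self.free = list(range(size))

    def try_allocate(self):
        """Return a free tag, or None when every register is in use."""
        return self.free.pop(0) if self.free else None

    def release(self, tag):
        """Return a tag to the pool, for example after its result is committed."""
        self.free.append(tag)


def dispatch(pending, pool):
    """Dispatch instructions in order until the pool is exhausted.

    Instructions left in `pending` are the ones stalled behind the
    missing rename register.
    """
    dispatched = []
    while pending:
        tag = pool.try_allocate()
        if tag is None:
            break  # dispatch stalls: no rename register is free
        dispatched.append((pending.pop(0), tag))
    return dispatched


pool = RenamePool(size=2)
pending = ["i0", "i1", "i2"]

first = dispatch(pending, pool)    # i2 is left waiting
stalled = list(pending)
pool.release(0)                    # i0 retires and frees its register
second = dispatch(pending, pool)   # i2 can now be dispatched
print(first, stalled, second)
```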
Prior art renaming schemes have relied on complex control and data flow, or on content addressable memories, to alleviate the above problem. Additionally, as more execution units are implemented, more rename registers are needed to support the growing number of potential speculative instructions.
Therefore, there is a need in the art for a processor architecture that alleviates the above inefficiency in the dispatching of instructions.