Contemporary computing systems seek to take advantage of superscalar architectures to improve processing performance. Superscalar architectures are characterized by multiple, concurrently operable execution units integrated through a plurality of registers and control mechanisms. This permits the architecture to execute multiple instructions out of order, thus utilizing parallelism to increase the throughput of the system.
Although superscalar architectures improve processor performance, there are numerous difficulties involved in developing practical systems. For example, control mechanisms must manage dependencies among the data being concurrently processed by the multiple execution units. Another problem is that of mispredicted branches. When instructions are being executed out of order, the processor may predict the outcome of an instruction that could result in a branch in program flow. Otherwise, the processor would have to wait, or stall, until the branching instruction completed. This would reduce the effectiveness of out-of-order execution, since the benefits of parallel execution would be countered by delays in instruction issue each time an instruction is dispatched that could result in a branch. Of course, if a branch is mispredicted, then the processor must be able to recover the state that existed immediately prior to the branch so that the error can be corrected.
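The recovery requirement described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the processor state is a small set of named registers and that a full copy of that state is saved at each predicted branch; the class name `SpeculativeState` and its methods are hypothetical and do not describe any particular processor.

```python
class SpeculativeState:
    """Toy model of checkpoint-based recovery from a mispredicted branch.

    Hypothetical illustration: real processors use far more compact
    mechanisms than copying the whole register state.
    """

    def __init__(self, registers):
        self.registers = dict(registers)
        self._checkpoints = {}  # branch id -> state saved just before the branch
        self._next_id = 0

    def predict_branch(self):
        """Save the state immediately prior to the branch; return a branch id."""
        branch_id = self._next_id
        self._next_id += 1
        self._checkpoints[branch_id] = dict(self.registers)
        return branch_id

    def resolve_branch(self, branch_id, mispredicted):
        """Discard the checkpoint, restoring it first if the guess was wrong."""
        saved = self._checkpoints.pop(branch_id)
        if mispredicted:
            self.registers = saved
            # Branches predicted after this one were on the wrong path,
            # so their checkpoints are no longer meaningful.
            for younger in [b for b in self._checkpoints if b > branch_id]:
                del self._checkpoints[younger]


state = SpeculativeState({"r1": 10, "r2": 20})
branch = state.predict_branch()      # speculate past the branch
state.registers["r1"] = 99           # speculative write on the predicted path
state.resolve_branch(branch, mispredicted=True)
restored_r1 = state.registers["r1"]  # state is back to its pre-branch value
```

In this sketch, execution continues past the branch without stalling, and the saved copy is consulted only when the prediction turns out to be wrong.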
A variety of techniques have been devised to address these difficulties. One particular technique is referred to as "register renaming." Register renaming involves forming an association between a physical register in the processor and a particular architectural, or logical, register. This relationship is referred to as a "rename pair," and is created each time an instruction writes to an architectural register. Such a renaming scheme is further disclosed in U.S. Pat. No. 6,061,777, which is hereby incorporated by reference herein.
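The formation of rename pairs can be sketched as follows. This is a simplified illustration under stated assumptions, not the scheme of the referenced patent: the class `RenameTable`, its methods, and the free-list allocation policy are hypothetical, and the release of physical registers at instruction completion is omitted.

```python
class RenameTable:
    """Toy rename table mapping architectural registers to physical ones.

    Hypothetical illustration: physical registers are never released here,
    whereas a real design would free them as instructions complete.
    """

    def __init__(self, num_arch, num_phys):
        if num_phys < num_arch:
            raise ValueError("need at least one physical register per architectural register")
        # Initially, architectural register i is backed by physical register i.
        self.mapping = list(range(num_arch))
        # The remaining physical registers are available for renaming.
        self.free = list(range(num_arch, num_phys))

    def read(self, arch_reg):
        """Return the physical register currently holding arch_reg's value."""
        return self.mapping[arch_reg]

    def write(self, arch_reg):
        """Create a new rename pair for an instruction that writes arch_reg."""
        if not self.free:
            raise RuntimeError("no free physical register; dispatch would stall")
        phys_reg = self.free.pop(0)
        self.mapping[arch_reg] = phys_reg
        return (arch_reg, phys_reg)


table = RenameTable(num_arch=4, num_phys=8)
first_pair = table.write(1)   # first write to architectural register 1
source = table.read(1)        # a later reader sees the renamed register
second_pair = table.write(1)  # a second write receives a different physical register
```

Because each write to the same architectural register receives its own physical register, two such writes need not wait on one another in this sketch, which is the property that makes renaming useful for out-of-order execution.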
Nevertheless, such superscalar architectures are still limited to dispatching a few instructions at a time to the execution units. Since dispatching instructions on an instruction-by-instruction basis requires a supporting control structure, there is still room to reduce the cycle time needed to execute instructions. Therefore, there is a need in the art for an improved and more efficient method for dispatching instructions to execution units within a superscalar processor.