Contemporary computing systems seek to take advantage of superscalar architectures to improve processing performance. Superscalar architectures are characterized by multiple, concurrently operable execution units integrated through a plurality of registers and control mechanisms. This allows the architecture to execute multiple instructions out of order, thereby exploiting parallelism to increase the throughput of the system.
Although superscalar architectures improve processor performance, developing practical systems involves numerous difficulties. For example, the control mechanism must manage dependencies among the data being processed concurrently by the multiple execution units. Another problem is that of mispredicted branches. When instructions are executed out of order, the processor may predict the outcome of an instruction that could cause a branch in program flow, rather than waiting, or stalling, until the branch instruction completes. Stalling would reduce the effectiveness of out-of-order execution, since the benefits of parallel execution would be offset by delays in instruction issue each time an instruction that could cause a branch is dispatched. If a branch is mispredicted, however, the processor must be able to recover to its state immediately prior to the branch so that the error can be corrected.
A variety of techniques have been devised to address these difficulties. Some of these techniques are discussed in Johnson, et al., Superscalar Microprocessor Design, Prentice Hall (1991). One particular technique is referred to as "register renaming." Register renaming involves forming an association between a physical register in the processor and a particular architectural, or logical, register. This association is referred to as a "rename pair," and a new one is created each time an instruction writes to an architected register. The associations are maintained in a map within the processor, which allows the processor to recover if a branch is mispredicted. This is explained in more detail with respect to FIG. 1.
FIG. 1 depicts a register mapping table ("RMAP") which is used to track rename pairs. The RMAP includes a plurality of entries, one entry for each of n physical registers. The example shown is an RMAP for a processor with 10 physical registers. Each entry in the RMAP includes an architected status bit ("A-bit") field and an architected register field. The architected register field contains a pointer to the architected register that forms an architectural-physical pair with the physical register corresponding to the index of the entry. The A-bit indicates whether this architectural-physical pair is the most recent pair for that architected register.
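As a minimal software sketch (not a description of the hardware), the FIG. 1 table can be modelled as a list indexed by physical register number, each element holding an A-bit and an architected register pointer. The names `RmapEntry`, `make_rmap` and `current_mappings` are hypothetical, chosen only for illustration; the duplicate check encodes the rule that at most one pair per architected register can be the most recent one.

```python
from dataclasses import dataclass
from typing import Optional

NUM_PHYSICAL = 10  # the FIG. 1 example has 10 physical registers


@dataclass
class RmapEntry:
    """One RMAP entry; its position in the table is the physical register number."""
    a_bit: bool = False              # A-bit: is this the most recent pair for arch_reg?
    arch_reg: Optional[int] = None   # architected register paired with this physical register


def make_rmap(num_physical: int = NUM_PHYSICAL) -> list[RmapEntry]:
    """Build an empty RMAP with one entry per physical register."""
    return [RmapEntry() for _ in range(num_physical)]


def current_mappings(rmap: list[RmapEntry]) -> dict[int, int]:
    """Return {architected register: physical register} for entries whose A-bit is on."""
    mapping: dict[int, int] = {}
    for phys, entry in enumerate(rmap):
        if entry.a_bit:
            # At most one pair per architected register may be the most recent one.
            if entry.arch_reg in mapping:
                raise ValueError(f"two most-recent pairs for architected register {entry.arch_reg}")
            mapping[entry.arch_reg] = phys
    return mapping


# Example: architected register 3 has been renamed twice.
rmap = make_rmap()
rmap[2] = RmapEntry(a_bit=False, arch_reg=3)  # older, superseded pair
rmap[7] = RmapEntry(a_bit=True, arch_reg=3)   # most recent pair
rmap[4] = RmapEntry(a_bit=True, arch_reg=1)
print(current_mappings(rmap))  # {1: 4, 3: 7}
```

The older pair in entry 2 stays in the table with its A-bit off, which is what lets a saved copy of the A column later make it current again.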
In conventional systems, the A-bit is the only status field provided in the RMAP. On dispatch, when an instruction needs a particular architected GPR as a source operand, the GPR pointer is compared to every architected pointer in the RMAP. Typically, the RMAP is implemented as a content-addressable memory ("CAM") and the GPR pointer is applied to a read port of the CAM. If the A-bit of the matching entry is on, the corresponding output of the read port supplies the physical GPR pointer, either as a decoded bit vector or as a binary-coded number. This output is then used to address the physical register for operand access.

When an instruction that writes a target architected register is dispatched, the A-bit of the CAM entry that matches the architected target register is reset. At the same time, a new entry is assigned to form a new rename pair for the target architected register. The rename pair is formed by writing the architected GPR pointer into the next available RMAP entry corresponding to a free physical register and setting the corresponding A-bit. The physical register pointer of the old rename pair is stored with the dispatched instruction in an instruction completion buffer. When the instruction completes, the physical register stored with it is released to the pool of available physical registers.

When an interruptible instruction, such as a branch instruction, is dispatched, the A column of the RMAP is stored to a table entry associated with the instruction. If the interruptible instruction causes an exception that requires a flush of the speculatively executed instructions, the A column saved for the excepting instruction is retrieved and written back to the RMAP. This restores the rename map to the state it had immediately prior to the interruptible instruction.
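The conventional flow described above (source lookup, target renaming, release on completion, and A-column checkpoint/restore) can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the hardware: the sequential loop in `_match` stands in for the CAM's parallel compare, and the class and method names, the initial pairing of GPR g with physical register g, and the FIFO free list are hypothetical choices. Returning the physical registers allocated by flushed instructions to the free pool is not described in the text and is omitted here.

```python
from typing import Optional


class Rmap:
    """Software model of a conventional CAM-based RMAP with only an A-bit.

    Index i of each column is physical register i.
    """

    def __init__(self, num_physical: int, num_arch: int) -> None:
        # Assumption: GPR g starts out paired with physical register g.
        self.a_bits = [g < num_arch for g in range(num_physical)]
        self.arch: list[Optional[int]] = [g if g < num_arch else None for g in range(num_physical)]
        # Assumption: a simple FIFO list of unpaired (free) physical registers.
        self.free = list(range(num_arch, num_physical))

    def _match(self, gpr: int) -> Optional[int]:
        """CAM read: the entry whose architected pointer equals gpr and whose A-bit is on."""
        for phys, (a, arch) in enumerate(zip(self.a_bits, self.arch)):
            if a and arch == gpr:
                return phys
        return None

    def lookup(self, gpr: int) -> int:
        """Source operand: translate an architected GPR to its current physical register."""
        phys = self._match(gpr)
        if phys is None:
            raise KeyError(f"GPR {gpr} has no current mapping")
        return phys

    def rename_target(self, gpr: int) -> tuple[int, Optional[int]]:
        """Target operand: reset the old pair's A-bit and form a new rename pair.

        Returns (new physical register, old physical register); the caller keeps
        the old one with the instruction, as in a completion-buffer entry.
        """
        if not self.free:
            raise RuntimeError("no free physical register; dispatch must stall")
        old = self._match(gpr)
        if old is not None:
            self.a_bits[old] = False
        new = self.free.pop(0)
        self.arch[new] = gpr
        self.a_bits[new] = True
        return new, old

    def complete(self, old_phys: Optional[int]) -> None:
        """Instruction completes: its old physical register returns to the free pool."""
        if old_phys is not None:
            self.free.append(old_phys)

    def checkpoint(self) -> list[bool]:
        """Interruptible instruction dispatched: save a copy of the A column."""
        return list(self.a_bits)

    def restore(self, saved: list[bool]) -> None:
        """Flush after an exception: write the saved A column back into the RMAP."""
        self.a_bits = list(saved)


rmap = Rmap(num_physical=10, num_arch=4)  # GPR0-3 start in physical 0-3
p_new, p_old = rmap.rename_target(2)      # an instruction writing GPR2 is dispatched
saved = rmap.checkpoint()                 # a branch is dispatched
rmap.rename_target(2)                     # speculative write to GPR2 after the branch
rmap.restore(saved)                       # branch mispredicted: flush
print(rmap.lookup(2) == p_new)            # True
rmap.complete(p_old)                      # the first writer completes; physical 2 is freed
```

Note that physical register 2 cannot be reused until the instruction that superseded its pair completes, which illustrates the resource pressure discussed next.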
However, physical registers are a limited resource on a processor. Sustaining a larger number of active instructions requires more physical registers, and because only a limited number of physical registers can be provided on a processor, the number of instructions that can be processed in parallel is limited. Because conventional processors do not allow a physical register to be reused until the instruction that created the rename pair completes, the practical limit is even lower. Accordingly, it is an object of the present invention to provide a processor which overcomes this disadvantage. It is a further object of the present invention to provide techniques for improving the out-of-order processing capabilities of superscalar processors. Still further objects and advantages of the present invention will become apparent in view of the following disclosure.