Contemporary computing systems seek to take advantage of superscalar architectures to improve processing performance. Superscalar architectures are characterized by multiple and concurrently operable execution units integrated through a plurality of registers and control mechanisms. This allows the architecture to execute multiple instructions in an out-of-order sequence, thus utilizing parallelism to increase the throughput of the system.
Although superscalar architectures provide benefits in improving processor performance, there are numerous difficulties involved in developing practical systems. For example, the control mechanisms must manage dependencies among the data being concurrently processed by the multiple execution units. Another problem is that of mispredicted branches. When instructions are being executed out of order, the processor may predict the outcome of an instruction that could result in a branch in program flow. Otherwise, the processor would have to wait, or stall, until the branching instruction completed. This would reduce the effectiveness of out-of-order execution, since the benefits of parallel execution would be countered by delays in instruction issue each time an instruction is dispatched that could result in a branch. Of course, if a branch is mispredicted, then the processor must have the ability to recover to the state immediately prior to the branch so that the error can be corrected.
A variety of techniques have been devised to address these difficulties. Some of these techniques are discussed in Johnson, et al., Superscalar Microprocessor Design, Prentice Hall (1991). One particular technique is referred to as "register renaming." Register renaming involves forming an association between a physical register in the processor and a particular architectural, or logical, register. This relationship is referred to as a "rename pair," and is created each time an instruction writes to an architected register. The associations are maintained in a map within the processor which allows the processor to recover if a branch is mispredicted. This is explained in more detail with respect to FIG. 1.
FIG. 1 depicts a rename register map ("RMAP") which is used to track rename pairs. The RMAP includes a plurality of entries, one entry for each of n physical registers. In this case, there is shown an RMAP for a processor with 10 physical registers. Each entry in the RMAP includes an architected status bit field, or "A-bit" field, and an architected register field. The architected register field contains a pointer to the architected register that forms an architectural-physical pair with the physical register corresponding to the index of the entry. The A-bit indicates whether or not this architectural-physical pair is the most recent one for that architected register.
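The layout of such a map can be illustrated with a minimal Python sketch. It is not taken from FIG. 1: the names `RmapEntry`, `rmap`, and `current_pairs`, and the sample contents of entries 3 and 7, are assumptions chosen for illustration only. The physical register number is not stored in an entry; it is implied by the entry's index.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RmapEntry:
    """One RMAP entry; its position in the table is the physical register number."""
    a_bit: bool = False             # True if this is the most recent pair
    arch_reg: Optional[int] = None  # pointer to the architected register


NUM_PHYSICAL = 10  # matches the 10-register example in the text
rmap = [RmapEntry() for _ in range(NUM_PHYSICAL)]

# Hypothetical contents: architected register 1 was renamed twice.
rmap[3] = RmapEntry(a_bit=False, arch_reg=1)  # older pair, physical register 3
rmap[7] = RmapEntry(a_bit=True, arch_reg=1)   # most recent pair, physical register 7


def current_pairs(table):
    """Return {architected register: physical register} for entries with the A-bit set."""
    return {e.arch_reg: phys for phys, e in enumerate(table) if e.a_bit}
```

In this sketch, two entries point at the same architected register, and only the A-bit distinguishes the current pair from the stale one.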
In conventional systems, only the A-bit is provided for in the RMAP. On dispatch, when an instruction needs a particular architected General Purpose Register ("GPR") as a source operand, the GPR pointer is compared to every architected pointer in the RMAP. Typically, the RMAP is implemented as a content addressable memory ("CAM") and the GPR pointer is applied to a read port of the CAM. If the A-bit of the matching entry is on, the corresponding output of the read port supplies the physical register pointer, either as a decoded bit vector or as a binary-coded number. The output is then used to address the physical register for operand access. When an instruction that needs a target architected register is dispatched, the A-bit of the CAM entry that matches the architected target register is reset. At the same time, a new entry is assigned to form a new rename pair for the target architected register. The rename pair is formed by writing the architected GPR pointer into the next available RMAP entry corresponding to a free physical register, and setting the corresponding A-bit. The physical register pointer of the old rename pair is stored with the dispatched instruction in an instruction completion buffer. When the instruction completes, the physical register stored with the instruction is released back to the pool of available physical registers. When an interruptible instruction, such as a branch instruction, is dispatched, the A column of the RMAP is stored to a table entry associated with the instruction. If the interruptible instruction causes an exception which requires a flush of the speculatively executed instructions, then the A column corresponding to the excepting instruction is retrieved and written back to the RMAP. Essentially, this restores the rename map to its state immediately prior to the execution of the interruptible instruction.
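The conventional scheme described above can be sketched in Python as follows. This is a simplified software model under stated assumptions, not the hardware implementation: the class and method names are hypothetical, the CAM search is modeled as a linear scan, architected register r is assumed to start out mapped to physical register r, and the caller is assumed to supply the physical registers allocated by flushed instructions so they can be returned to the free pool (the text does not say how that is done).

```python
class RenameMap:
    """Toy model of a conventional RMAP: one entry per physical register."""

    def __init__(self, num_physical, num_architected):
        assert num_physical >= num_architected
        self.arch = [None] * num_physical    # architected register field
        self.a_bit = [False] * num_physical  # A-bit: most recent pair?
        # Assumption: architected register r initially maps to physical r.
        for r in range(num_architected):
            self.arch[r] = r
            self.a_bit[r] = True
        self.free = list(range(num_architected, num_physical))

    def lookup(self, arch_reg):
        """Source operand: find the matching entry whose A-bit is set."""
        for phys in range(len(self.arch)):
            if self.a_bit[phys] and self.arch[phys] == arch_reg:
                return phys
        raise KeyError(arch_reg)

    def rename_target(self, arch_reg):
        """Target operand: reset the old pair's A-bit and form a new pair."""
        if not self.free:
            raise RuntimeError("no free physical register; dispatch must stall")
        old_phys = self.lookup(arch_reg)
        self.a_bit[old_phys] = False
        new_phys = self.free.pop(0)
        self.arch[new_phys] = arch_reg
        self.a_bit[new_phys] = True
        # old_phys is kept with the instruction in the completion buffer.
        return new_phys, old_phys

    def complete(self, old_phys):
        """On completion, release the old pair's physical register."""
        self.free.append(old_phys)

    def checkpoint(self):
        """Save the A column when an interruptible instruction is dispatched."""
        return list(self.a_bit)

    def restore(self, saved_a_column, flushed_phys):
        """On a flush, write the saved A column back to the map.

        Assumption: flushed_phys lists the registers allocated by the
        flushed instructions, which are returned to the free pool here.
        """
        self.a_bit = list(saved_a_column)
        self.free.extend(flushed_phys)


# Usage: rename GPR2, checkpoint at a branch, rename again, then flush.
rmap = RenameMap(num_physical=10, num_architected=4)
new1, old1 = rmap.rename_target(2)
saved = rmap.checkpoint()
new2, old2 = rmap.rename_target(2)
rmap.restore(saved, [new2])
```

Restoring only the A column is sufficient in this model because an entry whose A-bit was set at the checkpoint cannot have had its architected register field overwritten: its physical register is not released until the instruction that replaced it completes, and a flushed instruction never completes.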
For purposes of the present discussion, the term "interruptible instruction" refers to: (1) branch instructions, (2) speculatively executed load instructions which need to be re-executed because their execution loaded stale data, (3) instructions that can cause exceptions, and (4) instructions associated with an interrupt occurrence.
Also, as used herein, the term "interruptible point" refers to one of the following events associated with the types of interruptible instructions respectively: (1) an unresolved branch instruction which starts speculative execution along the wrong path, (2) a speculatively executed load instruction which executes ahead of the store instruction that produces its load data, (3) the execution of an instruction which causes an exception, and (4) an interrupt which occurs during execution of an instruction. An interrupt is said to occur on the occurrence of one of the above interruptible points.
However, physical registers are a limited resource on a processor. In order to sustain a higher number of active instructions, more physical registers are required. Since only a limited number of physical registers can be provided on a processor, there is a limit to the number of instructions which can be processed in parallel. Since conventional processors do not provide for reuse of physical registers until the instruction that created the rename pair completes, the practical limit is even lower. It is therefore desirable that the registers required by the operation of the processor be minimized or compressed in order to ensure that only the minimum number of physical registers is in use at a given time.