Most instructions in a computer instruction set operate on several source operands to generate results. The instructions name, either explicitly or through an indirection, the source and destination locations where values are read from or written to. A name may be either a logical, or architectural, register or a location in memory.
Instructions involving register operands are faster than those involving memory operands. For some microprocessor architectures, instructions naming memory operands are translated, or decoded, into micro-instructions that transfer operand values from memory to logical registers and then perform the decoded computations. The number of logical registers, however, often is limited, and, as a result, compilers should efficiently utilize logical registers to generate efficient code.
The number of physical registers available in a microprocessor typically exceeds the number of logical registers, so that register renaming may be utilized to increase performance. In particular, for out-of-order processors, register renaming allows instructions to be executed out of their original program order. Thus, for many out-of-order processors, an instruction is renamed so that logical registers named in the original instruction are renamed to physical registers.
Renaming a logical register involves mapping a logical register to a physical register. These mappings are stored in a Register Alias Table (“RAT”). A RAT maintains the latest mapping for each logical register. A RAT is indexed by logical registers, and provides mappings to corresponding physical registers. This activity may be called dependency tracking.
FIG. 1 depicts a register renaming and dependency tracking scheme involving three structures: RAT 110, active list 102, and free list 104. For each logical register specified by a renamed instruction, an unused physical register from free list 104 is allocated. RAT 110 is updated with this new allocation. Physical registers are free to be used again, or reclaimed, once they cannot be referenced by instructions in the current instruction window.
Based upon the data structures depicted in FIG. 1, one method for register reclaiming is to reclaim a physical register when the instruction that evicted it from RAT 110 retires. Thus, the instruction that created the new allocation to the physical register is retired. As a result, whenever a new allocation updates RAT 110, the evicted old allocation is pushed into active list 102. An active list 102 entry is associated with each instruction in the instruction window. When an instruction retires, the physical register of the old allocation recorded in active list 102, if any, is reclaimed and pushed into free list 104. The cycle is depicted in FIG. 1.
A scheme known as “result reuse” may be used to optimize the above-discussed process. Result reuse transforms the internal representation of the data-flow graph to significantly increase the level of instruction-level parallelism. Prior to renaming, whenever the result of an instruction is recognized to match the result of another instruction, the same physical register is used for both instructions. This scheme redirects all dependencies on both instructions towards the instruction that dynamically executes first. Result reuse relies on value-identity detectors. The detector outcome can be either safe or speculative. An example of a safe detector outcome is one directed to move instructions. Using value-identity detection, a move instruction can be completely eliminated from the execution stream. In such a case, it is safe to reallocate the physical register holding the source value because, by definition, the source and destination values are identical. An example of a speculative detector outcome is one directed to memory bypassing. Load instructions often collide with older store instructions in the instruction window of a processor. In such cases, the result of the load instruction is identical to the result that was stored in memory by the colliding store instruction. Predicting such value-identities for load instructions makes it possible to bypass memory accesses completely.
For any incoming instruction, the value-identity prediction structures may predict the location in the instruction window, or anywhere in the physical register space, of another instruction that produces the same result. In this case, the physical register allocated to this older instruction is retrieved from the instruction window, and reallocated for the incoming instruction.
The value identity predictor includes three parts. The first part establishes a potential identity relation between a pair of instructions. The second and third parts record and retrieve this value identity relation into/from the prediction structures. While general methods and structures exist for implementing the second and third parts, the first part typically is done by an assortment of ad hoc methods for establishing the value identity.
For many instructions belonging to the Intel® Architecture 32-bit (IA-32) instruction set (Intel® is a registered trademark of Intel Corporation, Santa Clara, Calif.), one of the source registers is also used as the destination register. If the value stored in this source register is needed by subsequent (in program order) instructions, a register-move instruction may be inserted prior to the subsequent instruction to copy the source operand in the source register to another logical location so that it can be accessed by the subsequent instruction. (IA-32 moves instructions operating on memory operands are considered load or store instructions.)
Another reason for the insertion of register-move instructions in IA-32 code is to set the parameter values in the appropriate registers prior to a procedure call. The IA-32 Application Binary Interface (ABI) requires parameters for a procedure call to be passed on the stack. However, compilers often use alternate, non-standard, register-based parameter passing, when possible. For RISC instruction set architecture machines, register-move instructions are mainly used for parameter passing.
As a result, the number of register-move instructions may be quite significant in typical IA-32 programs, as well as for programs written for other processor architectures. Therefore, there is a need for the efficient execution of register-move instructions with efficient register renaming and reclaiming schemes.