Embodiments of the present invention relate to computer technology, and more particularly, to processor architecture.
Most instructions in a computer instruction set operate on several source operands and generate results. They name, either explicitly or through an indirection, the source and destination locations where values are read from or written to. A name may be either a logical (architectural) register or a location in memory.
Instructions involving only register operands are faster than those involving memory operands. For some microprocessor architectures, instructions naming memory operands are translated (decoded) into micro-instructions that first transfer operand values from memory to logical registers and then perform the indicated operations. However, the number of logical registers is often limited, and as a result, it is important for compilers to efficiently utilize logical registers in order to generate efficient code.
Usually, the number of physical registers available in a microprocessor exceeds the number of logical registers, so that register renaming may be utilized to increase performance. In particular, for out-of-order processors, register renaming facilitates the execution of instructions out of their original program order.
Renaming a logical register involves mapping a logical register to a physical register. These mappings are stored in a RAT (Register Alias Table). A RAT maintains the latest mapping for each logical register. A RAT is indexed by logical registers, and provides mappings to corresponding physical registers (dependency-tracking).
Illustrated in FIG. 1 is a register renaming and dependency tracking scheme involving three structures: RAT 110, active list (AL) 102, and free list (FL) 104. For each logical register specified by a renamed instruction (or renamed micro-instruction), an unused physical register from FL 104 is allocated and RAT 110 is updated with this new mapping. Physical registers are free to be used again (i.e., reclaimed) once they cannot be referenced anymore by instructions in the current instruction window. Based upon the data structures depicted in FIG. 1, one method for register reclaiming is to reclaim a physical register only when the instruction that evicted it from RAT 110, i.e., the instruction that created a new mapping to the physical register, retires. As a result, whenever a new mapping updates RAT 110, the evicted old mapping is pushed into AL 102. (An AL entry is associated with each instruction in the instruction window.) When an instruction retires, the physical register of the old mapping recorded in AL 102, if any, is reclaimed and pushed into FL 104. This cycle is depicted in FIG. 1.
For many instructions belonging to the Intel(copyright) Architecture 32-bit (IA-32) instruction set (Intel(copyright) is a registered trademark of Intel Corporation, Santa Clara, Calif.), one of the source registers is also used as the destination register. If the value stored in this source register is needed by subsequent (in program order) instructions, a register-move instruction may be inserted prior to the subsequent instruction to copy the source operand in the source register to another logical location so that it can be accessed by the subsequent instruction. (IA-32 move instructions operating on memory operands are considered load or store instructions.)
Another reason for the insertion of register-move instructions in IA-32 code is to set the parameter values in the appropriate registers prior to a procedure call. The IA-32 Application Binary Interface (ABI) requires parameters for a procedure call to be passed on the stack. However, compilers often use alternate, non-standard, register-based parameter passing, when possible. For RISC instruction set architecture machines, register-move instructions are mainly used for parameter passing.
Usually, whenever a logical register is needed for a computation but all available logical registers are in use, a store instruction is inserted in the compiled code so that the content of one of the used logical registers is stored (spilled) into a memory location in order to free up a logical register. A later (in program order) load instruction is then inserted to load from memory the stored value if subsequent instructions need it. As a result, compiled machine code often contains load instructions that access the same memory location as an earlier (in program order) store instruction. In such cases, a load instruction is said to collide with an earlier store instruction.
In addition to register renaming, many microprocessors also perform memory renaming utilizing a re-order type buffer called a forwarding buffer. A forwarding buffer stores both memory locations and values as indicated by store instructions. For convenience, we refer to a memory location named in a store instruction as a store instruction address and the value to be stored as a store instruction result. An entry in the forwarding buffer is allocated for every store instruction. The memory hierarchy is updated with a store instruction result only after the store instruction retires. Upon a store instruction retirement, a store buffer may be utilized to store results before updating the memory hierarchy. A store may be visualized as a move from a register (or an immediate value) to the forwarding buffer.
Many prior art microprocessors process load instructions as if dependent upon all earlier (in program order) store instructions. In this way, a load instruction does not start execution until all earlier store instructions have finished execution. A load instruction address (i.e., the memory location of the value to be loaded) is checked with addresses in the forwarding buffer and the memory cache (and perhaps store buffer). If there is a hit in the forwarding buffer, then the result is loaded from the entry in the forwarding buffer corresponding to the youngest store instruction (latest in program order) in the forwarding buffer having a store instruction address matching the load instruction address.
In light of the previous discussion, the number of register-move instructions and the number of colliding load-store instruction pairs may be quite significant in typical IA-32 programs, as well as for programs written for other processor architectures. It is believed that current processor architectures do not process these and other instructions in the most efficient manner. The present invention provides a practical processor architecture so that register-move instructions, load-store instruction pairs, and other types of instructions may be processed more efficiently.
Embodiments of the present invention are directed processors in which more than one logical register may be mapped to the same physical register. For one embodiment, a processor has a set of counters associated with physical registers to indicate which physical registers are free. In another embodiment, the counter is changed by an increment when a new mapping to its physical register is created in a register allocation table. For another embodiment, an identifier creates a mapping between a logical register named in an instruction and a physical register if the logical register is predicted to have the same value as the physical register when the processor is in a state in which the instruction has executed and is retired.