Embodiments of the present invention relate to computer technology, and more particularly, to processor architecture.
Most instructions in a computer instruction set operate on several source operands and generate results. They name, either explicitly or through an indirection, the source and destination locations where values are read from or written to. A name may be either a logical (architectural) register or a location in memory.
Instructions involving only register operands are faster than those involving memory operands. For some microprocessor architectures, instructions naming memory operands are translated (decoded) into micro-instructions that first transfer operand values from memory to logical registers and then perform the indicated computations. However, the number of logical registers is often limited, and as a result, it is important for compilers to efficiently utilize logical registers in order to generate efficient code.
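To illustrate the decoding step, the following Python sketch splits an instruction with a memory source operand into two micro-instructions: a load, then a register-only computation. The `MicroOp` type, the `decode` function, and the scratch register name `t0` are hypothetical. Real decoders and micro-instruction formats vary by architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    op: str      # operation, e.g. "load" or "add"
    dst: str     # destination register
    srcs: tuple  # source registers or memory operands


def decode(op, dst, src_reg, mem_operand, scratch="t0"):
    """Decode 'op dst, src_reg, mem_operand' into two micro-ops.

    The memory value is first transferred into a register (the hypothetical
    scratch register); the computation then uses register operands only.
    """
    return [
        MicroOp("load", scratch, (mem_operand,)),
        MicroOp(op, dst, (src_reg, scratch)),
    ]


for uop in decode("add", "r1", "r2", "[0x1000]"):
    print(uop)
```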
Usually, whenever a logical register is needed for a computation but all available logical registers are in use, the compiler inserts a store instruction that stores (spills) the content of one of the used logical registers to a memory location, freeing that logical register. If subsequent instructions need the spilled value, the compiler also inserts a later (in program order) load instruction that reloads the value from memory. As a result, compiled machine code often contains load instructions that access the same memory location as an earlier (in program order) store instruction. In such cases, the load instruction is said to collide with the earlier store instruction.
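A small trace analysis makes the notion of a colliding load concrete. The sketch below assumes a simplified trace format of hypothetical `(kind, address)` pairs. It pairs each load with the most recent earlier store to the same address, which is the pattern a spill store and its later reload produce.

```python
def find_collisions(trace):
    """Return (load_index, store_index) pairs for colliding loads.

    trace is a list of (kind, address) tuples in program order, where kind is
    "store", "load", or anything else (a non-memory instruction, address None).
    A load collides with the most recent earlier store to the same address.
    """
    last_store = {}  # address -> index of the latest store seen so far
    collisions = []
    for index, (kind, address) in enumerate(trace):
        if kind == "store":
            last_store[address] = index
        elif kind == "load" and address in last_store:
            collisions.append((index, last_store[address]))
    return collisions


# A spill of a register to [sp+8], unrelated work, then the reload.
trace = [
    ("store", "[sp+8]"),
    ("alu", None),
    ("load", "[sp+8]"),
    ("load", "[sp+16]"),  # no earlier store to this address
]
print(find_collisions(trace))  # [(2, 0)]
```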
Usually, the number of physical registers available in a microprocessor exceeds the number of logical registers, so that register renaming may be utilized to increase performance. In particular, for out-of-order processors, register renaming allows instructions to be executed out of their original program order. Thus, for many out-of-order processors, an instruction is renamed so that logical registers named in the original instruction are renamed to physical registers.
Renaming a logical register involves mapping it to a physical register. These mappings are stored in a RAT (Register Alias Table), which maintains the latest mapping for each logical register. A RAT is indexed by logical register and provides the corresponding physical register. Dependences between instructions are tracked through these mappings.
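A RAT can be modeled as a table indexed by logical register number. The Python sketch below is a minimal model. It assumes that each logical register initially maps to the physical register with the same index. The class and method names are hypothetical.

```python
class RegisterAliasTable:
    """Latest logical-to-physical mapping, indexed by logical register number."""

    def __init__(self, num_logical):
        # Assumed initial state: logical register i maps to physical register i.
        self.mapping = list(range(num_logical))

    def lookup(self, logical):
        """Physical register holding the latest value of a logical register."""
        return self.mapping[logical]

    def update(self, logical, physical):
        """Install a new mapping and return the old (evicted) one."""
        evicted = self.mapping[logical]
        self.mapping[logical] = physical
        return evicted


rat = RegisterAliasTable(4)
rat.update(1, 7)      # an instruction writing r1 is given physical register 7
print(rat.lookup(1))  # 7: later readers of r1 depend on that instruction
print(rat.lookup(2))  # 2: r2 is unchanged
```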
Illustrated in FIG. 1 is a register renaming and dependency-tracking scheme involving three structures: RAT 110, active list (AL) 102, and free list (FL) 104. For each destination logical register specified by a renamed instruction (or renamed micro-instruction), an unused physical register from FL 104 is allocated and RAT 110 is updated with this new mapping. Physical registers may be used again (i.e., reclaimed) once instructions in the current instruction window can no longer reference them.
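The allocation step can be sketched as follows, assuming integer register numbers and a FIFO free list. In this sketch only the destination register receives a new physical register. Source registers are looked up in the RAT before the RAT is updated. The `Renamer` class is hypothetical. It does not model stalls; it only raises an error when the free list is empty.

```python
from collections import deque


class Renamer:
    def __init__(self, num_logical, num_physical):
        if num_physical <= num_logical:
            raise ValueError("need more physical than logical registers")
        self.rat = list(range(num_logical))  # logical i -> physical i initially
        self.free_list = deque(range(num_logical, num_physical))

    def rename(self, dst, srcs):
        """Rename one instruction; return (physical dst, physical srcs)."""
        # Read sources first so that e.g. r1 = r1 + r2 sees the old r1.
        phys_srcs = [self.rat[s] for s in srcs]
        if not self.free_list:
            raise RuntimeError("free list empty: renaming must stall")
        phys_dst = self.free_list.popleft()
        self.rat[dst] = phys_dst  # the RAT now holds the new mapping
        return phys_dst, phys_srcs


r = Renamer(num_logical=4, num_physical=8)
print(r.rename(1, [1, 2]))  # (4, [1, 2])
print(r.rename(3, [1]))     # (5, [4]): reads the new mapping of r1
```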
Based upon the data structures depicted in FIG. 1, one method of register reclamation is to reclaim a physical register only when the instruction that evicted it from RAT 110 retires, i.e., the instruction that created a new mapping for the same logical register. Accordingly, whenever a new mapping updates RAT 110, the evicted old mapping is pushed into AL 102. (An AL entry is associated with each instruction in the instruction window.) When an instruction retires, the physical register of the old mapping recorded in its AL 102 entry, if any, is reclaimed and pushed into FL 104. This cycle is depicted in FIG. 1.
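The reclamation policy can be sketched by adding an active list to the renamer. In this hypothetical model, each renamed instruction appends the mapping it evicted from the RAT, or `None` if it writes no register. Retiring the oldest instruction returns its evicted physical register to the free list.

```python
from collections import deque


class ReclaimingRenamer:
    def __init__(self, num_logical, num_physical):
        self.rat = list(range(num_logical))
        self.free_list = deque(range(num_logical, num_physical))
        self.active_list = deque()  # one entry per in-flight instruction, oldest first

    def rename(self, dst=None):
        """Rename an instruction with optional destination register dst."""
        if dst is None:
            self.active_list.append(None)  # nothing evicted
            return None
        new = self.free_list.popleft()
        self.active_list.append(self.rat[dst])  # record the evicted old mapping
        self.rat[dst] = new
        return new

    def retire(self):
        """Retire the oldest instruction and reclaim its evicted register, if any."""
        evicted = self.active_list.popleft()
        if evicted is not None:
            self.free_list.append(evicted)
        return evicted


rr = ReclaimingRenamer(num_logical=2, num_physical=4)
rr.rename(0)               # r0 -> p2, evicts p0
rr.rename(0)               # r0 -> p3, evicts p2
print(rr.retire())         # 0: p0 is reclaimed
print(rr.retire())         # 2: p2 is reclaimed
print(list(rr.free_list))  # [0, 2]
```

Reclaiming at retirement is safe here because, once the evicting instruction retires, every older instruction that might have read the evicted register has also retired, and younger instructions read the new mapping.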
In addition to register renaming, many microprocessors also perform memory renaming using a reorder-type buffer called a forwarding buffer. A forwarding buffer stores both the memory locations and the values indicated by store instructions. For convenience, we refer to the memory location named in a store instruction as the store instruction address and to the value to be stored as the store instruction result. An entry in the forwarding buffer is allocated for every store instruction. The memory hierarchy is updated with a store instruction result only after the store instruction retires. Upon retirement of a store instruction, a store buffer may be used to hold its result before the memory hierarchy is updated. A store may be visualized as a move from a register (or an immediate value) to the forwarding buffer.
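A forwarding buffer can be sketched as a FIFO of (address, value) entries in program order. This minimal Python model allocates one entry per store and writes memory, represented here by a dict, only when the oldest store retires. Its names are hypothetical, and the optional store buffer is omitted.

```python
from collections import deque


class ForwardingBuffer:
    def __init__(self):
        self.entries = deque()  # (store address, store result), oldest first

    def store(self, address, value):
        """Record the address and result of a store instruction."""
        self.entries.append((address, value))

    def retire_oldest(self, memory):
        """Retire the oldest store: only now is the memory hierarchy updated."""
        address, value = self.entries.popleft()
        memory[address] = value


memory = {}
fb = ForwardingBuffer()
fb.store(0x100, 7)
print(memory)             # {}: the store has not retired yet
fb.retire_oldest(memory)
print(memory)             # {256: 7}
```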
Many prior art microprocessors process a load instruction as if it depended on all earlier (in program order) store instructions. As a result, a load instruction does not start execution until all earlier store instructions have finished execution. The load instruction address (i.e., the memory location of the value to be loaded) is then checked against the addresses in the forwarding buffer and the memory cache (and perhaps the store buffer). If there is a hit in the forwarding buffer, the result is loaded from the forwarding buffer entry of the youngest (latest in program order) store instruction whose store instruction address matches the load instruction address.
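The prior-art lookup can be sketched as a search of the forwarding buffer from the youngest entry to the oldest, falling back to memory on a miss. The function below assumes, as the conservative policy does, that all earlier stores have already executed. The buffer is a hypothetical list of (address, value) pairs, oldest first, and a single dict stands in for both the cache and the store buffer.

```python
def execute_load(address, forwarding_buffer, memory):
    """Return the value a load at `address` would read.

    forwarding_buffer: list of (store address, store result), oldest first.
    memory: dict standing in for the cache and the rest of the hierarchy.
    """
    # Scan youngest to oldest so the latest matching store wins.
    for store_address, store_value in reversed(forwarding_buffer):
        if store_address == address:
            return store_value  # forwarding-buffer hit
    return memory[address]      # miss: read from the memory hierarchy


buffer = [(0x100, 1), (0x200, 2), (0x100, 3)]
memory = {0x100: 0, 0x300: 9}
print(execute_load(0x100, buffer, memory))  # 3: youngest matching store
print(execute_load(0x300, buffer, memory))  # 9: no matching store
```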
Because many load instructions collide with earlier store instructions, a microprocessor architecture that processes colliding store and load instructions more efficiently than the prior art may achieve greater processing throughput.
Embodiments of the present invention are directed to a unified renaming scheme in which more than one logical register may be mapped to the same physical register. One embodiment comprises a physical register file and a register allocation table for storing mappings between logical and physical registers. If a load instruction is predicted to collide with an earlier in-flight store instruction, the register allocation table maps both the source logical register named in the in-flight store instruction and the destination logical register named in the load instruction to the same physical register.
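The unified renaming idea can be sketched by letting the register allocation table map a load's destination register to the physical register that already holds the colliding store's data. In this hypothetical model, the caller supplies the collision prediction as a store identifier. The model does not cover the predictor itself, verification of the prediction, recovery from mispredictions, or reclamation of a physical register shared by several logical registers. Each of these would need additional mechanisms.

```python
from collections import deque


class UnifiedRenamer:
    def __init__(self, num_logical, num_physical):
        self.rat = list(range(num_logical))  # logical i -> physical i initially
        self.free_list = deque(range(num_logical, num_physical))
        self.store_data = {}  # hypothetical: store id -> physical reg of its source

    def rename_store(self, store_id, src_logical):
        """A store writes no register; remember where its data lives."""
        self.store_data[store_id] = self.rat[src_logical]

    def rename_load(self, dst_logical, predicted_store_id=None):
        """Rename a load, sharing a physical register on a predicted collision."""
        if predicted_store_id in self.store_data:
            phys = self.store_data[predicted_store_id]  # no allocation needed
        else:
            phys = self.free_list.popleft()
        self.rat[dst_logical] = phys
        return phys


u = UnifiedRenamer(num_logical=4, num_physical=8)
u.rename_store("spill0", src_logical=3)  # e.g. store r3 -> [sp+8]
print(u.rename_load(1, "spill0"))  # 3: r1 and r3 now share physical register 3
print(u.rename_load(2))            # 4: no predicted collision, new register
```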