The present invention relates to executing load instructions within a processor. More particularly, the present invention relates to a method and apparatus for optimizing load memory accesses using load reuse operations.
Most instructions in a computer instruction set operate on several source operands to generate results. The instructions name, either explicitly or through an indirection, the source and destination locations where values are read from or written to. A name may be either a logical, or architectural, register or a location in memory.
Instructions involving register operands are faster than those involving memory operands. For some microprocessor architectures, instructions naming memory operands are translated, or decoded, into microinstructions that transfer operand values from memory to logical registers and then perform the decoded computations. The number of logical registers, however, often is limited, and, as a result, compilers should efficiently utilize logical registers to generate efficient code.
The number of physical registers available in a microprocessor typically exceeds the number of logical registers, so that register renaming may be utilized to increase performance. In particular, for out-of-order processors, register renaming allows instructions to be executed out of their original program order. Thus, for many out-of-order processors, an instruction is renamed so that logical registers named in the original instruction are renamed to physical registers.
Renaming a logical register involves mapping a logical register to a physical register. These mappings are stored in a Register Alias Table (xe2x80x9cRATxe2x80x9d). A RAT maintains the latest mapping for each logical register. A RAT is indexed by logical registers, and provides mappings to corresponding physical registers. This activity may be called dependency tracking.
FIG. 1 depicts a register renaming and dependency tracking scheme involving three structures: RAT 110, active list 102, and free list 104. For each logical register specified by a renamed instruction, an unused physical register from free list 104 is allocated. RAT 110 is updated with this new allocation. Physical registers are free to be used again, or reclaimed, once they cannot be referenced by instructions in the current instruction window.
Based upon the data structures depicted in FIG. 1, one method for register reclaiming is to reclaim a physical register when the instruction that evicted it from RAT 110 retires. Thus, the instruction that created the new allocation to the physical register is retired. As a result, whenever a new allocation updates RAT 110, the evicted old allocation is pushed into active list 102. An active list 102 entry is associated with each instruction in the instruction window. When an instruction retires, the physical register of the old allocation recorded in active list 102, if any, is reclaimed and pushed into free list 104. The cycle is depicted in FIG. 1.
A scheme known as xe2x80x9cresult reusexe2x80x9d may be used to optimize the above-discussed process. Result reuse transforms the internal representation of the data-flow graph to significantly increase the level of instruction-level parallelism. Prior to renaming, whenever the result of an instruction is recognized to match the result of another instruction, the same physical register is used for both instructions. This scheme redirects all dependencies on both instructions towards the instruction that dynamically executes first. Result reuse relies on value-identity detectors. The detector outcome can be either safe or speculative. An example of a safe detector outcome is one directed to move instructions. Using value-identity detection, a move instruction can be completely eliminated from the execution stream. In such a case, it is safe to reallocate the physical register holding the source value because, by definition, the source and destination values are identical. An example of a speculative detector outcome is one directed to memory bypassing. Load instructions often collide with older store instructions in the instruction window of a processor. In such cases, the result of the load instruction is identical to the result that was stored in memory by the colliding store instruction. Predicting such value-identities for load instructions makes it possible to bypass memory accesses completely.
For any incoming instruction, the value-identity prediction structures may predict the location in the instruction window, or anywhere in the physical register space, of another instruction that produces the same result. In this case, the physical register allocated to this older instruction is retrieved from the instruction window, and reallocated for the incoming instruction.
The value identity predictor includes three parts. The first part establishes a potential identity relation between a pair of instructions. The second and third parts record and retrieve this value identity relation into/from the prediction structures. While general methods and structures exist for implementing the second and third parts, the first part typically is still done by an assortment of ad hoc methods for establishing the value identity.
Load instructions access memory locations to load data to the physical registers. The loaded data is, in turn, used in computations with the logical registers. The load instruction execution time is greater than accessing the physical registers. As a result, the number of load instructions may be significant in programs written for processor architectures. Therefore, it may be useful to provide for the efficient execution of load instructions with optimized register renaming and reclaiming schemes.