The present invention relates generally to the field of processor technology. More specifically, the present invention relates to a method and apparatus for fast, speculative floating point register renaming.
Processors typically execute instructions by reading the source operands of the instructions from various registers and storing the destination operands or results of the executed instructions into various registers. Registers provide temporary storage within a processor for arithmetic and other data used by various units within the processor to perform their corresponding functions. Different registers may be used for different purposes or functions. For instance, some registers may be used for storing results from arithmetic operations, some registers may be used for storing status information via various flag bits, and other registers may be used for storing results from floating point operations, etc. Modern processors employ out of order execution to reduce processing time by executing multiple instructions concurrently. Out of order execution utilizes a technique or mechanism called register renaming to eliminate false dependencies between instructions that are caused by register reuse. Register renaming eliminates the false dependencies by converting references to external (also referred to as logical or architectural) registers into references to internal or physical registers. The basic register renaming mechanism or technique is well known and widely used in modern processors employing out of order execution.
Performing register renaming for the floating point registers in some processor architectures (e.g., the Intel IA-32 architecture) involves additional complexity. For example, the IA-32 floating point registers are architecturally accessed as a register stack. Specifically, these floating point registers are referenced by a top of stack (TOS) pointer and are therefore stack relative. In other words, a floating point register is addressed by its location relative to the top of the floating point stack. The top of stack may change from instruction to instruction, so it is not straightforward to determine whether two operations use the same architectural or logical register. Therefore, in order to rename a floating point register, the stack relative references are first converted into absolute register references (referred to as the first renaming phase) and then the traditional renaming of architectural or logical register references into physical register references is performed (referred to as the second renaming phase). Conventional floating point register renaming is therefore performed sequentially in two phases, which does not take full advantage of the out of order execution employed in modern processors.
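The stack relative addressing described above can be illustrated with a minimal sketch. It assumes the IA-32 x87 convention of eight floating point registers, in which a push decrements the TOS and a pop increments it, both modulo eight. The function names are hypothetical and chosen only for illustration.

```python
NUM_FP_REGS = 8  # the x87 register stack holds eight registers

def to_absolute(tos: int, st_index: int) -> int:
    """Convert a stack relative reference ST(st_index) to an absolute register number."""
    return (tos + st_index) % NUM_FP_REGS

def push(tos: int) -> int:
    """A push moves the top of stack down by one (modulo the stack size)."""
    return (tos - 1) % NUM_FP_REGS

def pop(tos: int) -> int:
    """A pop moves the top of stack up by one (modulo the stack size)."""
    return (tos + 1) % NUM_FP_REGS

tos = push(0)                      # TOS is now 7
first = to_absolute(tos, 0)        # ST(0) names absolute register 7
tos = push(tos)                    # TOS is now 6
second = to_absolute(tos, 0)       # ST(0) now names absolute register 6
same_as_first = to_absolute(tos, 1)  # ST(1) names absolute register 7
```

The same stack relative name, ST(0), refers to two different absolute registers before and after the second push, while ST(1) after the push refers to the register that ST(0) named before it. This is why two operations cannot be compared by their stack relative register numbers alone.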
FIG. 1 illustrates an example of a floating point register renaming mechanism in which the two renaming phases are performed sequentially, first converting logical register references into absolute register references and then converting absolute register references into physical register references. In this example, assume that the following floating point computation is to be executed:
MEM4=MEM1*(MEM2+MEM3)
As shown in FIG. 1, stn refers to a stack relative register number, fpn refers to an absolute register number, and prn refers to a physical register number. The instructions for this computation in this example are decoded into a number of micro-instructions (also called micro-operations or UOPs). These UOPs are executed with references to the floating point stack. As shown in FIG. 1, the TOS may change from one UOP to another UOP. FIG. 1 illustrates the two renaming phases that are performed in order to convert a reference to a logical floating point register into a reference to a physical floating point register. The two renaming phases are performed sequentially because the results obtained from the first renaming phase (i.e., the logical register to absolute register conversion) are used in the second renaming phase (i.e., the absolute register to physical register conversion). Performing the two renaming phases sequentially does not optimize the system performance because the second renaming phase has to wait for the completion of the first renaming phase. Thus the out of order execution architecture is not fully utilized with respect to the floating point register renaming function.
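The sequential dependence between the two phases can be sketched as follows. FIG. 1 is not reproduced here, so the UOP sequence, the initial TOS of 0, and the simple physical register allocator below are assumptions made for illustration; the register numbers in the figure may differ.

```python
NUM_FP_REGS = 8  # size of the x87-style register stack

# Hypothetical UOP sequence for MEM4 = MEM1 * (MEM2 + MEM3).
# Each entry: (name, TOS change before, source stn list, destination stn, TOS change after)
UOPS = [
    ("fld MEM1",  -1, [],     0,    0),   # push, then load into the new st0
    ("fld MEM2",  -1, [],     0,    0),   # push, then load into the new st0
    ("fadd MEM3",  0, [0],    0,    0),   # st0 = st0 + MEM3
    ("fmulp",      0, [0, 1], 1,    1),   # st1 = st1 * st0, then pop
    ("fstp MEM4",  0, [0],    None, 1),   # store st0 to memory, then pop
]

def phase1(uops, tos=0):
    """First renaming phase: stack relative (stn) -> absolute (fpn)."""
    out = []
    for name, pre, srcs, dst, post in uops:
        tos = (tos + pre) % NUM_FP_REGS
        abs_srcs = [(tos + s) % NUM_FP_REGS for s in srcs]
        abs_dst = None if dst is None else (tos + dst) % NUM_FP_REGS
        out.append((name, abs_srcs, abs_dst))
        tos = (tos + post) % NUM_FP_REGS
    return out

def phase2(abs_uops):
    """Second renaming phase: absolute (fpn) -> physical (prn)."""
    table = {}    # fpn -> most recently allocated prn
    next_prn = 0  # simplistic allocator (assumption): registers are never recycled
    out = []
    for name, abs_srcs, abs_dst in abs_uops:
        phys_srcs = [table[f] for f in abs_srcs]  # sources read the current mapping
        phys_dst = None
        if abs_dst is not None:
            phys_dst = next_prn                   # each destination gets a fresh prn
            next_prn += 1
            table[abs_dst] = phys_dst
        out.append((name, phys_srcs, phys_dst))
    return out

# Phase 2 cannot start until phase 1 has produced the absolute register numbers.
absolute = phase1(UOPS)
renamed = phase2(absolute)
```

In this sketch `phase2` takes the output of `phase1` as its input, which mirrors the dependence described above: no absolute to physical conversion can begin for a UOP until its stack relative references have been resolved.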
According to one aspect of the invention, a method is provided in which a current instance of an instruction is received. The current instance of the instruction contains a reference to a logical floating point register. A first rename phase is performed to convert the current instance's reference to the logical floating point register into a reference to an absolute register. A second rename phase is performed in parallel with the first rename phase to convert the reference to the absolute register into a reference to a physical register, based upon results obtained from performing the first rename phase with respect to a previous instance of the instruction.
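One way to picture this aspect is the following minimal software sketch. It is not the claimed apparatus: the per-instruction table `last_abs`, the instruction identifier, the fixed absolute to physical mapping, and the check that falls back to the actual absolute register on a mismatch are all assumptions introduced for illustration. In hardware the two phases would proceed concurrently; the sketch runs them one after the other and only models which inputs each phase depends on.

```python
NUM_FP_REGS = 8

class SpeculativeRenamer:
    def __init__(self):
        # Hypothetical: absolute register produced by phase 1 for the
        # previous instance of each instruction, keyed by an instruction id.
        self.last_abs = {}
        # Hypothetical fixed fpn -> prn mapping, used only for the demonstration.
        self.rename_table = {f: 10 + f for f in range(NUM_FP_REGS)}

    def rename_source(self, instr_id, tos, st_index):
        # Phase 2 (speculative): uses the previous instance's phase 1 result,
        # so it does not wait for the current instance's phase 1.
        predicted = self.last_abs.get(instr_id)
        speculative_prn = None if predicted is None else self.rename_table[predicted]

        # Phase 1: resolve the current instance's stack relative reference.
        actual = (tos + st_index) % NUM_FP_REGS
        self.last_abs[instr_id] = actual

        # Assumed check: keep the speculative result only if the prediction held.
        hit = predicted == actual
        prn = speculative_prn if hit else self.rename_table[actual]
        return prn, hit

r = SpeculativeRenamer()
first = r.rename_source("fadd@loop", tos=6, st_index=0)   # no history yet
second = r.rename_source("fadd@loop", tos=6, st_index=0)  # same TOS as before
third = r.rename_source("fadd@loop", tos=5, st_index=0)   # TOS has changed
```

The second call reuses the absolute register recorded by the first, so its speculative result stands. The third call sees a different TOS, so the recorded register no longer matches and the sketch falls back to the mapping for the actual register.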