Computer programmers arrange the instructions within a computer program in a particular order, commonly referred to as program order. The programmer relies upon the processor executing the program to follow certain rules, based on the program order, about how it performs the program's instructions. For a first example, assume instruction A is followed by instruction B in program order, and assume that instruction A writes to a register of the processor and instruction B reads from the same register. In this case, the programmer relies upon the processor to execute instruction B using the value written by instruction A rather than the value that was in the register before instruction A wrote its value to the register. For a second example, assume this time that instruction A reads from the register and instruction B writes to the register. In this case, the programmer relies upon the processor to execute instruction A using the value that was in the register before instruction B wrote its value to the register. For a third example, assume this time that both instruction A and instruction B write to the register, instruction C follows instruction B in program order, and instruction C reads the register. In this case, the programmer relies upon the processor to execute instruction C using the value written by instruction B rather than the value written by instruction A.
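The three examples correspond to what are commonly called read-after-write (RAW), write-after-read (WAR), and write-after-write (WAW) ordering rules. The Python sketch below is an illustration only: the `Instr` type, which records the registers an instruction reads and writes, and the helper names `ordering_rules` and `producer` are hypothetical. `producer` captures the third example: a reader must use the value of the newest older instruction that writes the register.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction model: a name plus the registers it reads and writes."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def ordering_rules(older, younger):
    """Return the program-order rules that apply between two instructions,
    where `older` precedes `younger` in program order."""
    rules = set()
    if older.writes & younger.reads:
        rules.add("RAW")  # first example: younger must read the older result
    if older.reads & younger.writes:
        rules.add("WAR")  # second example: older must read before younger overwrites
    if older.writes & younger.writes:
        rules.add("WAW")  # third example: the younger write must be the one that remains
    return rules


def producer(program, index, reg):
    """Return the newest instruction older than program[index] that writes reg, or None."""
    for older in reversed(program[:index]):
        if reg in older.writes:
            return older
    return None


# Third example: A and B both write EAX, then C reads EAX.
a = Instr("A", writes=frozenset({"EAX"}))
b = Instr("B", writes=frozenset({"EAX"}))
c = Instr("C", reads=frozenset({"EAX"}))
print(ordering_rules(a, b))                # {'WAW'}
print(producer([a, b, c], 2, "EAX").name)  # B
```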
One way for a processor to follow the rules regarding program order discussed above is simply to execute the instructions in program order. However, many modern microprocessors, particularly pipelined superscalar microprocessors that include multiple execution units to which multiple instructions may be issued in a single clock cycle, realize performance improvements by executing instructions out of order, i.e., out of program order. Out-of-order execution is particularly beneficial when certain instructions in the instruction stream take a long time to execute, such as floating point instructions or instructions that read from memory; these are commonly referred to as long latency instructions. When an in-order execution microprocessor encounters a long latency instruction, its execution units may sit idle for many clock cycles, in some cases on the order of one hundred, waiting for the long latency instruction to complete. An out-of-order execution microprocessor, by contrast, attempts to find instructions that the execution units may execute while waiting for the long latency instruction to complete. These instructions are commonly referred to as independent instructions because they may be executed out of program order with respect to the long latency instruction without violating any of the rules associated with program order, such as the three discussed above. However, the out-of-order execution microprocessor must still wait to execute any instruction that depends upon an instruction earlier in program order, such as the long latency instruction. Thus, the efficient utilization of the multiple execution units of an out-of-order execution superscalar pipelined microprocessor may be limited by the number of independent instructions that the microprocessor can find in the program's instruction stream.
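The search for independent instructions can be illustrated with a small Python sketch under simplifying assumptions. The `Instr` type and the helpers `conflicts`, `independent_instructions`, and `ins` are hypothetical. Every instruction in the window is assumed not yet complete, with `window[0]` being a long latency instruction still executing. No register renaming is performed, so all three program-order rules apply. An instruction counts as independent only if no rule ties it to any older instruction in the window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction model: a name plus the registers it reads and writes."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def ins(name, reads=(), writes=()):
    """Convenience constructor taking any iterables of register names."""
    return Instr(name, frozenset(reads), frozenset(writes))


def conflicts(older, younger):
    """True if any of the three program-order rules ties younger to older."""
    return bool(
        older.writes & younger.reads      # younger needs older's result
        or older.reads & younger.writes   # younger would overwrite older's operand
        or older.writes & younger.writes  # both write the same register
    )


def independent_instructions(window):
    """Return names of instructions that conflict with no older instruction
    in the window and so may execute now, out of program order."""
    ready = []
    for i in range(1, len(window)):
        if not any(conflicts(older, window[i]) for older in window[:i]):
            ready.append(window[i].name)
    return ready


window = [
    ins("LOAD", reads={"EBX"}, writes={"EAX"}),       # long latency: EAX <- memory[EBX]
    ins("ADD", reads={"EAX", "ECX"}, writes={"EDX"}),  # needs LOAD's result
    ins("SUB", reads={"ESI", "EDI"}, writes={"ESI"}),  # unrelated to LOAD and ADD
    ins("MOV", reads={"ECX"}, writes={"EAX"}),         # writes EAX, as LOAD does
]
print(independent_instructions(window))  # ['SUB']
```

Only SUB may run ahead of the outstanding LOAD. MOV does not need any value that LOAD or ADD produces; it is held back only because it writes EAX, which LOAD also writes and ADD reads. This is the kind of constraint that register renaming, discussed below, can remove.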
One well-known technique employed by out-of-order execution superscalar pipelined microprocessors to increase the number of independent instructions in the instruction stream is register renaming. In particular, register renaming may make instruction A and instruction B in the second and third examples above independent of one another, so that the microprocessor may execute them out of order. Microprocessors include architectural registers, i.e., the registers that program instructions specify as the sources of their operands or the destinations of their results. For example, the integer architectural registers of an x86 architecture microprocessor include the EAX, EBX, ECX, EDX, ESI, EDI, ESP, and EBP registers, among others. A microprocessor that employs register renaming includes more physical registers than architectural registers. For example, an x86 processor whose architecture specifies the eight integer registers mentioned above might have 32 physical registers to which the eight architectural registers may be renamed. When the processor encounters an instruction that specifies one of the architectural registers as its destination register, renaming hardware “renames” the architectural register to one of the physical registers, and when the processor executes the instruction to generate its result, it writes the result to that physical register. Similarly, when an instruction specifies one of the architectural registers as the source of an operand, the renaming hardware determines the instruction upon which the instant instruction depends, which is the newest instruction in program order that is older than the instant instruction and that will write a result to the specified source architectural register. The renaming hardware then causes the instant instruction to refer not to the architectural register, but instead to the physical register to which the architectural register was renamed for the instruction upon which the instant instruction depends. This causes the instant instruction to receive its source operands from the appropriate renamed physical registers.
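The renaming steps described above can be sketched in Python as a register alias table plus a free list of physical registers. This is a toy model, not any particular processor's design. It uses the eight x86 integer registers and the 32 physical registers from the example; the class name `Renamer` and its methods are hypothetical. Reclaiming physical registers is not modeled, so renaming simply stalls (raises here) once the free list is empty. Sources are looked up before the destination is allocated, so each source reaches the physical register assigned to the newest older writer of that architectural register.

```python
from collections import deque

# The eight x86 integer architectural registers named in the text.
ARCH_REGS = ["EAX", "EBX", "ECX", "EDX", "ESI", "EDI", "ESP", "EBP"]


class Renamer:
    """Toy rename stage: a register alias table plus a free list."""

    def __init__(self, num_physical=32):
        # Initial mapping: architectural register i lives in physical register i.
        self.table = {reg: i for i, reg in enumerate(ARCH_REGS)}
        # Every other physical register starts out free.
        self.free = deque(range(len(ARCH_REGS), num_physical))

    def rename(self, dest, srcs):
        """Rename one instruction, in program order.

        Returns (physical destination or None, list of physical sources).
        """
        # Look up sources first, so that an instruction such as ADD EAX, EBX,
        # which reads and writes EAX, reads the mapping left by the newest
        # older writer of EAX rather than its own new destination.
        phys_srcs = [self.table[reg] for reg in srcs]
        phys_dest = None
        if dest is not None:
            if not self.free:
                raise RuntimeError("no free physical register: rename must stall")
            phys_dest = self.free.popleft()
            self.table[dest] = phys_dest  # later readers of dest now see this register
        return phys_dest, phys_srcs


# Third example: A and B both write EAX, then C reads EAX (C's EBX destination is arbitrary).
r = Renamer()
print(r.rename("EAX", []))       # A -> (8, [])
print(r.rename("EAX", []))       # B -> (9, [])
print(r.rename("EBX", ["EAX"]))  # C -> (10, [9])
```

Because A and B now write different physical registers (8 and 9), neither write can clobber the other, so they may execute in either order, while C still reads physical register 9, the result of B. In the second example, instruction A would read the old physical register for the architectural register while instruction B writes a newly allocated one, so B no longer has to wait for A.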
However, the performance improvement obtained by register renaming may come at a significant cost in hardware die space, power, and complexity, as is well known for many register renaming processors. Therefore, what is needed is a solution that strikes a good balance between performance and cost in a superscalar out-of-order execution pipelined microprocessor.