1. Technical Field
This disclosure generally relates to the field of processor architecture and, more particularly, to register renaming in superscalar processors.
2. Description of the Related Art
In general, a processor is a device that can execute computer programs to carry out algorithmic computation, data permutation, etc. Microprocessors are a type of processor that incorporates most or all of the functions of a processor on a single integrated circuit. Superscalar microprocessors are microprocessors that can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to redundant execution resources, also known as functional units, in the processor. When executing instructions and micro-operations, processors typically read source operands from registers and store result or destination operands in registers. Registers are temporary storage units within the processor whose contents can be accessed more quickly than storage available elsewhere. Registers are typically used for holding arithmetic and other results used and generated by the processor. A given register contains a number of bits, e.g., 1 bit, 8 bits, 16 bits, or 32 bits.
A given register is typically addressable by a respective register identifier, such as a register number, an address, or an offset. The register identifier is used in a program to identify a particular architectural register. That is, an architectural register is a programming convention that virtually identifies or represents an underlying physical storage space, such as a physical register.
Among the various techniques utilized in superscalar processors to allow parallel execution of instructions is register renaming. Because a program being executed by the processor often specifies fewer registers than can be implemented in hardware, a given superscalar processor implementation often has more physical registers than the number of architectural registers specified in the program. That is, in a superscalar processor implementation, there is not necessarily a one-to-one correspondence between an architectural register and a physical register.
In what is typically known as a register renaming stage, a number of general-purpose architectural registers used by a software program are correlated, or mapped, to a number of physical registers in the superscalar processor. For instance, in a superscalar processor that can issue up to four instructions for execution in parallel, up to four empty physical registers in a physical register file are available so that up to four architectural registers can be renamed every clock cycle.
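The renaming step described above can be illustrated with a minimal software sketch. It is an analogy only, not a description of any particular hardware: the function name `rename_group`, the dictionary `rename_map`, the `free_list` queue, and the constant `ISSUE_WIDTH` are all hypothetical names introduced for illustration, and the four-wide issue width is taken from the example in the preceding paragraph.

```python
from collections import deque

ISSUE_WIDTH = 4  # assumed: up to four instructions renamed per clock cycle


def rename_group(dest_arch_regs, rename_map, free_list):
    """Map each architectural destination register to an empty physical register.

    Returns the list of physical registers assigned in this cycle, or None
    if too few empty physical registers are available (the group must wait).
    """
    if len(dest_arch_regs) > ISSUE_WIDTH:
        raise ValueError("cannot rename more registers than the issue width")
    if len(free_list) < len(dest_arch_regs):
        return None  # not enough empty physical registers this cycle

    assigned = []
    for arch in dest_arch_regs:
        phys = free_list.popleft()   # take the next empty physical register
        rename_map[arch] = phys      # later readers of `arch` use `phys`
        assigned.append(phys)
    return assigned


# Hypothetical usage: eight physical registers, three destinations in one cycle.
rename_map = {}
free_list = deque(range(8))
assigned = rename_group(["r1", "r2", "r1"], rename_map, free_list)
```

In this sketch the second write to `r1` receives its own physical register, which illustrates why there need not be a one-to-one correspondence between architectural and physical registers.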
The relationship between the correlated architectural registers and the corresponding physical registers is typically recorded in entries of a physical register mapping table (PRMT). Each entry of the PRMT records the state of a respective physical register in the physical register file, e.g., whether or not the respective physical register is empty and hence is available to store data for a correlated architectural register.
The architectural registers used by the decoded instructions of the program are correlated to respective physical registers, and the correlations are recorded in the PRMT. An identifier, e.g., an address, of each architectural register is also recorded in the PRMT. The identifier is typically recorded at the entry of the PRMT associated with the correlated physical register. The PRMT also records the state of each of the physical registers as well as the architectural register-to-physical register correlation/mapping information.
The state of each physical register that is allocated to store data for a correlated architectural register changes from one clock cycle to the next as program execution proceeds. The change in the state of the allocated physical register is tracked in the PRMT. The allocated physical register cannot be re-allocated to another architectural register until the architectural register to which it is currently correlated is released by the program instruction.
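The bookkeeping described above can be sketched as a small table with one entry per physical register. This is a simplified software model under stated assumptions, not an actual PRMT design: the class names `PrmtEntry` and `Prmt`, the two-state `empty` flag, and the `allocate` and `release` methods are hypothetical, and a real table may track more states than empty and allocated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrmtEntry:
    empty: bool = True               # assumed two-state model: empty or allocated
    arch_reg: Optional[int] = None   # identifier of the correlated architectural register


class Prmt:
    """One entry per physical register; the entry index is the physical register number."""

    def __init__(self, num_physical: int):
        self.entries = [PrmtEntry() for _ in range(num_physical)]

    def allocate(self, arch_reg: int) -> Optional[int]:
        """Correlate arch_reg with the first empty physical register, if any."""
        for phys, entry in enumerate(self.entries):
            if entry.empty:
                entry.empty = False
                entry.arch_reg = arch_reg
                return phys
        return None  # no empty physical register is available

    def release(self, phys: int) -> None:
        """Mark a physical register empty so that it can be re-allocated."""
        entry = self.entries[phys]
        if entry.empty:
            raise ValueError("physical register is not allocated")
        entry.empty = True
        entry.arch_reg = None


# Hypothetical usage with a two-entry table.
prmt = Prmt(2)
first = prmt.allocate(5)     # architectural register 5
second = prmt.allocate(6)    # architectural register 6
blocked = prmt.allocate(7)   # table is full, so nothing can be allocated
prmt.release(first)
reused = prmt.allocate(7)    # the released entry becomes available again
```

The third allocation fails until an entry is released, mirroring the constraint that an allocated physical register cannot be re-allocated while its correlation is still in use.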
The use of a larger physical register file with more physical registers, such as an eighty-entry register file, is becoming more prevalent in superscalar processors. The use of a large number of physical registers helps reduce the occurrence of pipeline stalls. In some cases, a pipeline stall is due to an instruction dependency encountered after the processor has permitted multiple instructions to be issued at a time. Accordingly, the number of entries in a PRMT increases as the number of physical registers increases.
In superscalar processors, a larger PRMT makes it more difficult to search and find entries indicating that the associated physical register is empty. The search and find algorithms typically take more time, logic, and energy with a larger PRMT than with a smaller PRMT. Additionally, implementing a larger PRMT has other challenges. For example, implementing a larger PRMT with traditional application-specific integrated circuit (ASIC) design methods and structures tends to require larger area for the circuits, increase path delay, and result in higher power consumption. Since path delay and power consumption are two factors to consider in processor design, and since both path delay and power affect performance, it is desirable to implement a larger PRMT for register renaming with minimal impact on performance.
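As a rough software analogy for why the search grows with table size, the following sketch counts how many entries a simple sequential scan examines before it finds the requested number of empty entries. The function names and the sequential scan itself are assumptions made for illustration; hardware implementations typically search in parallel, where the added cost appears as extra logic, path delay, and power rather than as loop iterations. The eighty-entry size comes from the example above, while the thirty-two-entry size is an arbitrary smaller table chosen for comparison.

```python
def find_empty_entries(empty_flags, wanted):
    """Scan in index order; return (indices of empty entries found, entries examined)."""
    if wanted <= 0:
        return [], 0
    found = []
    examined = 0
    for idx, is_empty in enumerate(empty_flags):
        examined += 1
        if is_empty:
            found.append(idx)
            if len(found) == wanted:
                break
    return found, examined


def worst_case_flags(size, wanted):
    """Build a table in which only the last `wanted` entries are empty."""
    return [False] * (size - wanted) + [True] * wanted


# Find four empty entries in an assumed 32-entry table and in an 80-entry table.
small_found, small_examined = find_empty_entries(worst_case_flags(32, 4), 4)
large_found, large_examined = find_empty_entries(worst_case_flags(80, 4), 4)
```

In the worst case the scan examines every entry, so the work in this model grows in direct proportion to the number of entries in the table.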