1. Field of the Invention
This invention is related to the field of processors and, more particularly, to register renaming mechanisms within processors.
2. Description of the Related Art
Superscalar processors attempt to achieve high performance by dispatching and executing multiple instructions per clock cycle, and by operating at the shortest possible clock cycle time consistent with the design. To the extent that a given processor is successful at dispatching and/or executing multiple instructions per clock cycle, high performance may be realized.
One technique often employed by processors to increase the number of instructions which may be executed concurrently is speculative execution (e.g. executing instructions out of order with respect to the order of execution indicated by the program or executing instructions subsequent to predicted branches). Often, instructions which are immediately subsequent to a particular instruction are dependent upon that particular instruction (i.e. the result of the particular instruction is used by the immediately subsequent instructions). Hence, the immediately subsequent instructions may not be executable concurrently with the particular instruction. However, instructions which are farther subsequent to the particular instruction in program order may not have any dependency upon the particular instruction and may therefore execute concurrently with the particular instruction. Still further, speculative execution of instructions subsequent to predicted branches may increase the number of instructions executed concurrently if the branch is predicted correctly.
Out of order execution gives rise to another type of dependency, often referred to as an "antidependency". Generally, antidependencies occur if an instruction subsequent to a particular instruction updates a register which is either accessed (read) or updated (written) by the particular instruction. The particular instruction must read or write the register prior to the subsequent instruction writing the register for proper operation of the program. Generally, an instruction may have one or more source operands (which are input values to be operated upon by the instructions) which may be stored in memory or in registers. An instruction may also have one or more destinations (which are locations for storing results of executing the instruction) which may also be stored in memory or in registers.
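The antidependency condition described above can be illustrated with a minimal sketch. The function name and the dictionary layout (sets of register names under "reads" and "writes") are assumptions made for illustration only and do not describe any particular hardware.

```python
def has_antidependency(earlier, later):
    """Return True if `later` writes a register that `earlier` reads or writes.

    Both arguments are hypothetical instruction records: dicts holding
    'reads' and 'writes' sets of register names.
    """
    touched = earlier["reads"] | earlier["writes"]
    return bool(later["writes"] & touched)


# I1: EBX <- EAX + ECX
i1 = {"reads": {"EAX", "ECX"}, "writes": {"EBX"}}
# I2: EAX <- ECX + ECX (overwrites EAX, which I1 must read first)
i2 = {"reads": {"ECX"}, "writes": {"EAX"}}
# I3: EDX <- ECX (touches nothing that I1 reads or writes as a destination)
i3 = {"reads": {"ECX"}, "writes": {"EDX"}}

print(has_antidependency(i1, i2))  # True
print(has_antidependency(i1, i3))  # False
```

In this sketch, I2 may not write EAX before I1 has read it unless the destination of I2 is renamed.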
A technique for removing antidependencies between source and destination registers of instructions, and thereby allowing increased out of order execution, is register renaming. In register renaming, a pool of "rename registers" is implemented by the processor. The pool of rename registers is greater in number than (i) the registers defined by the instruction set architecture employed by the processor (the "architected registers") and (ii) the registers employed for temporary use, such as by microcode routines (the "temporary registers"). Together, the architected registers and temporary registers are referred to as the "logical registers". The destination register for a particular instruction (i.e. the logical register written with the execution result of the instruction) is "renamed" by assigning one of the rename registers to the logical register. The value of the logical register prior to execution of the particular instruction remains stored in the rename register previously assigned to the logical register. If a previous instruction reads the logical register, the previously assigned rename register is read. If a previous instruction writes the logical register, the previously assigned rename register is written. Accordingly, the rename registers may be updated in any order.
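The renaming step described above may be sketched as follows. The class name RenameMap, its methods, and the register counts are hypothetical and chosen only for illustration; an actual rename mechanism is implemented in hardware and may differ substantially.

```python
class RenameMap:
    """Hypothetical rename map: logical register name -> rename register number."""

    def __init__(self, logical_regs, num_rename_regs):
        if num_rename_regs <= len(logical_regs):
            raise ValueError("need more rename registers than logical registers")
        # Each logical register starts out assigned to one rename register.
        self.mapping = {reg: i for i, reg in enumerate(logical_regs)}
        # The remaining rename registers are free for assignment.
        self.free = list(range(len(logical_regs), num_rename_regs))

    def rename(self, dest, sources):
        """Rename one instruction; return (dest_rename, source_renames)."""
        # Sources read whichever rename register is currently assigned.
        src = [self.mapping[s] for s in sources]
        # The destination receives a fresh rename register; the previously
        # assigned rename register keeps the prior value.
        new = self.free.pop(0)
        self.mapping[dest] = new
        return new, src


rmap = RenameMap(["EAX", "EBX", "ECX"], num_rename_regs=6)
# I1: EBX <- EAX + ECX (reads EAX)
i1 = rmap.rename("EBX", ["EAX", "ECX"])
# I2: EAX <- ECX + ECX (writes EAX after I1 reads it)
i2 = rmap.rename("EAX", ["ECX", "ECX"])
print(i1)  # (3, [0, 2])
print(i2)  # (4, [2, 2])
```

Because I2 writes rename register 4 while I1 reads rename register 0, the two instructions in this sketch may execute in either order.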
Register renaming may also allow speculative update of registers due to instruction execution subsequent to a predicted branch instruction. Previous renames may be maintained until the branch instruction is resolved. If the branch instruction is mispredicted, the previous renames may be used to recover the state of the processor at the mispredicted branch instruction.
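One way to maintain previous renames until a branch is resolved is to save a copy of the map at the branch, as in the sketch below. Snapshotting the whole map is an assumption made here for brevity; the class CheckpointedMap and its method names are hypothetical, and other recovery schemes are possible.

```python
class CheckpointedMap:
    """Hypothetical rename map that saves its state at each predicted branch."""

    def __init__(self, mapping, free):
        self.mapping = dict(mapping)   # logical register -> rename register
        self.free = list(free)         # unassigned rename registers
        self.checkpoints = {}          # branch id -> saved (mapping, free)

    def rename_dest(self, dest):
        new = self.free.pop(0)
        self.mapping[dest] = new
        return new

    def checkpoint(self, branch_id):
        # Keep the renames that were in effect at the branch.
        self.checkpoints[branch_id] = (dict(self.mapping), list(self.free))

    def resolve(self, branch_id, mispredicted):
        saved_mapping, saved_free = self.checkpoints.pop(branch_id)
        if mispredicted:
            # Discard speculative renames and recover the state at the branch.
            self.mapping, self.free = saved_mapping, saved_free


m = CheckpointedMap({"EAX": 0, "EBX": 1}, free=[2, 3])
m.checkpoint("br0")
m.rename_dest("EAX")                 # speculative update: EAX -> 2
m.resolve("br0", mispredicted=True)
print(m.mapping)  # {'EAX': 0, 'EBX': 1}
```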
In many instruction set architectures, a variety of architected registers are provided for storing instruction results of varying types. For example, integer, floating point, multimedia, and condition code registers may be defined. Integer registers are employed for storing integer values (i.e. whole number values represented by the magnitude of the value stored in the registers). Floating point registers are employed for storing floating point values (i.e. numbers represented by a sign, exponent, and significand stored in the register). Multimedia registers are used for storing multimedia values (e.g. packed integer or floating point values representing audio and video information, operated upon in a single instruction, multiple data (SIMD) fashion). Finally, condition code registers store values which indicate the result of a particular manipulation (e.g. zero, greater than or less than zero, carry out) or comparison (e.g. equal, greater than, less than). Condition codes may also be referred to herein as "flags".
Each of the various types of registers may have a different size than the others. For example, in the x86 instruction set architecture, floating point registers are 80 bits wide, multimedia registers are 64 bits wide, integer registers are 32 bits wide (and subdivided into independently addressable portions), and the condition codes are stored in an EFLAGS register but comprise 6 bits. Accordingly, processors typically rename each register type separately, with rename registers of the corresponding size. Unfortunately, rename registers of a particular type may be idle if instructions manipulating that type are not being executed. For example, floating point renames are idle if floating point instructions are not being executed. The total amount of available rename register space may therefore be inefficiently used much of the time.
Furthermore, in the x86 instruction set architecture many integer instructions update both a destination and the condition codes. Therefore, multiple rename registers may need to be assigned to each instruction. Register rename logic complexity may therefore be significant. Accordingly, a more efficient and simpler register rename scheme is desired.
A register renaming apparatus, according to one embodiment, includes one or more rename registers (referred to herein as physical registers) which may be assigned to store any of: a floating point value, a multimedia value, an integer value and corresponding condition codes, or condition codes only. For physical register assignment, an instruction is classified as being floating point (e.g. having a floating point register as a destination), multimedia (e.g. having a multimedia register as a destination), integer (e.g. having an integer register and the flags register as destinations), or flags-only (e.g. having only the flags register as a destination). The classification of the instruction defines which lookahead register state is updated (floating point, integer, flags, etc.), but the physical register can be selected from the one or more physical registers for any of the instruction types. Advantageously, determining which physical register to select may be simplified over an implementation which employs separate sets of physical registers for each data type. For example, part of the register renaming logic is to determine if enough physical registers are free for assignment to the instructions being selected for dispatch. In an implementation employing different physical registers for different data types, this determination includes determining the data type of each instruction (to determine how many physical registers of each type are needed). Instead, the register renaming apparatus described below considers the number of instructions selected for dispatch and the number of free physical registers.
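The difference between the two free-register determinations may be sketched as follows. The function names, the type strings, and the assumption that each instruction needs at most one physical register in the unified scheme (and that a split scheme keeps a separate "flags" pool) are illustrative only.

```python
from collections import Counter


def can_dispatch_unified(instr_types, free_count):
    """Unified pool: only the number of instructions matters."""
    return len(instr_types) <= free_count


def can_dispatch_split(instr_types, free_by_type):
    """Separate pools: the data type of each instruction must be examined."""
    needed = Counter()
    for kind in instr_types:
        if kind == "integer":
            # Assumed: an integer instruction also updates the flags,
            # so it consumes one register from each of two pools.
            needed["integer"] += 1
            needed["flags"] += 1
        elif kind == "flags-only":
            needed["flags"] += 1
        else:  # "floating point" or "multimedia"
            needed[kind] += 1
    return all(free_by_type.get(kind, 0) >= n for kind, n in needed.items())


group = ["floating point", "floating point", "floating point"]
split_pools = {"floating point": 2, "multimedia": 1, "integer": 1, "flags": 1}

print(can_dispatch_unified(group, free_count=4))  # True
print(can_dispatch_split(group, split_pools))     # False
```

In this example the split scheme has more free registers in total than the unified scheme, yet the group of floating point instructions cannot be dispatched because only two floating point registers are free.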
Additionally, an embodiment of the register renaming apparatus described herein may make more efficient use of the physical registers. For example, when a code sequence includes predominately instructions of a particular data type, many of the physical registers may be assigned to that data type. By contrast, if different sets of physical registers are provided for different data types, only the physical registers used for the particular data type may be used for the aforementioned code sequence. The other physical registers sit idle during such code sequences. Performance may be increased due to the more efficient use of the physical registers by allowing more of the instructions of the particular data type to be concurrently outstanding. Still further, additional efficiencies may be realized in embodiments in which an integer register and condition codes are both updated by many instructions (e.g. the x86 instruction set architecture exhibits this feature). Because the physical registers described herein are adaptable to store both an integer value and a condition code value, one physical register may concurrently represent the architected state of both the flags register and the integer register. In embodiments which maintain separate sets of physical registers, two registers are assigned in such cases.
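The assignment of a single physical register to both an integer register and the flags register may be sketched as follows. The class SharedPoolMapUnit, the logical register names, and the dictionary used for the lookahead state are hypothetical and are not taken from any described embodiment.

```python
class SharedPoolMapUnit:
    """Hypothetical map unit drawing every data type from one physical pool."""

    def __init__(self, num_physical):
        self.free = list(range(num_physical))
        # Lookahead state: logical register name -> physical register number.
        self.lookahead = {}

    def assign(self, kind, dest=None):
        # The same pool serves every instruction type.
        preg = self.free.pop(0)
        if kind in ("floating point", "multimedia"):
            self.lookahead[dest] = preg
        elif kind == "integer":
            # One physical register holds the integer result and its flags.
            self.lookahead[dest] = preg
            self.lookahead["FLAGS"] = preg
        elif kind == "flags-only":
            self.lookahead["FLAGS"] = preg
        else:
            raise ValueError(f"unknown instruction type: {kind}")
        return preg


mu = SharedPoolMapUnit(num_physical=4)
mu.assign("floating point", "ST0")   # physical register 0
mu.assign("integer", "EAX")          # physical register 1 for EAX and FLAGS
print(mu.lookahead)  # {'ST0': 0, 'EAX': 1, 'FLAGS': 1}
mu.assign("flags-only")              # physical register 2 for FLAGS only
print(mu.lookahead)  # {'ST0': 0, 'EAX': 1, 'FLAGS': 2}
```

After the integer instruction, a single physical register represents both EAX and the flags; a scheme with separate register sets would have consumed two registers for the same instruction.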
Broadly speaking, an apparatus for performing register renaming is contemplated. The apparatus comprises a physical register and a map unit. The map unit is configured to assign the physical register to store a floating point value during a first clock cycle. Additionally, the map unit is configured to assign the physical register to store an integer value and a corresponding condition code during a second clock cycle.
Additionally, a method for performing register renaming is contemplated. A physical register is assigned to store a floating point value responsive to dispatching a floating point instruction. The physical register is assigned to store an integer value and a corresponding condition code responsive to dispatching an integer instruction.
Moreover, a processor is contemplated. The processor comprises an instruction cache, a register file, and a map unit. The instruction cache is configured to store a plurality of instructions. The processor is configured to fetch the plurality of instructions from the instruction cache. The register file comprises physical registers. Coupled to receive the plurality of instructions from the instruction cache, the map unit is configured to assign one of the physical registers within the register file to one of the plurality of instructions upon dispatch of the plurality of instructions to the map unit. The one of the physical registers is adaptable to store a floating point value if the one of the plurality of instructions is a floating point instruction. Additionally, the one of the physical registers is adaptable to store an integer value and a corresponding flags value if the one of the plurality of instructions is an integer instruction.
Still further, a register renaming apparatus is contemplated. The register renaming apparatus comprises a physical register and a map unit. The map unit is configured to assign the physical register to a first logical register of a first data type specified as a destination of a first instruction during a first clock cycle. Additionally, the map unit is configured to free the physical register during a second clock cycle in which a second instruction subsequent to the first instruction is retired and the second instruction has the first logical register of the first data type as a destination. The map unit is configured to assign the physical register to a second logical register of a second data type different than the first data type during a third clock cycle subsequent to the second clock cycle.
A method for performing register renaming is contemplated. A physical register is assigned to a first logical register of a first data type. The first logical register is specified as a destination of a first instruction. A second instruction subsequent to the first instruction in program order is retired. Responsive to the retiring, the physical register is freed. The physical register is assigned to a second logical register of a second data type different than the first data type subsequent to being freed.
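The freeing and reassignment sequence described above may be sketched as follows. The class FreeOnRetireMap, its fields, and the two-register pool are assumptions made for illustration; the sketch models only the free list and the retired state, not timing in clock cycles.

```python
class FreeOnRetireMap:
    """Hypothetical model: a physical register is freed when a later
    instruction writing the same logical register retires."""

    def __init__(self, num_physical):
        self.free = list(range(num_physical))
        # Retired state: logical register -> physical register holding it.
        self.architected = {}

    def dispatch(self, dest):
        # Any free physical register may be used, whatever the data type.
        preg = self.free.pop(0)
        return (dest, preg)

    def retire(self, instr):
        dest, preg = instr
        previous = self.architected.get(dest)
        self.architected[dest] = preg
        if previous is not None:
            # The older value of `dest` is no longer needed.
            self.free.append(previous)
        return previous


m = FreeOnRetireMap(num_physical=2)
first = m.dispatch("ST0")    # floating point destination, physical register 0
second = m.dispatch("ST0")   # later write to ST0, physical register 1
m.retire(first)              # nothing freed yet
freed = m.retire(second)     # physical register 0 is freed
third = m.dispatch("EAX")    # integer destination reuses physical register 0
print(freed, third)  # 0 ('EAX', 0)
```

Here physical register 0 is first assigned to a floating point logical register and, after being freed, to an integer logical register.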