Register renaming is believed to affect the performance of dynamically scheduled processors or out-of-order execution processors. A dynamically scheduled processor is able to execute instructions out-of-order, which should result in fewer constraints on the issue order of instructions than for an in-order execution processor, and which should provide higher instruction-level parallelism (ILP). A dynamically scheduled processor should obey instruction dependencies, however, and therefore may not have complete freedom to select the execution order of instructions. These dependencies may include data dependencies (which may occur when one instruction produces a value that is used by another instruction), name dependencies (which may be caused by reusing storage locations, such as, for example, registers and memory), and control dependencies (which may be associated with conditional branches).
Dynamic register renaming may be used to eliminate name dependencies through registers by providing multiple storage locations for the same register name and keeping track of which storage location is referred to by each different instance of the same name. In this context, the name of a register may be referred to as a logical register, and the physical location to which it is mapped at a particular time may be referred to as a physical register. The amount of physical storage available for register renaming determines the maximum number of simultaneously “live” values, and therefore may limit the instruction window size.
Register renaming may, however, limit performance in microprocessors. In particular, prioritized content-addressable memories (“CAMs”) may be used to rename several instructions in a cycle, where each instruction may access results renamed in previous cycles or may access results of previous instructions renamed in the same cycle. Register renaming may be used by out-of-order processors to handle write-after-write (“WAW”) and write-after-read (“WAR”) problems to the extent that they may occur, and may be used for providing recovery in case of branch misprediction or incorrect speculation. A scheduler may use register renaming information to perform the task of instruction scheduling. The simultaneous register renaming of a number n of instructions may include four (4) steps as follows:
First, the system allocates “n” destination registers (which may be referred to as Physical Destinations or PDsts), one for each instruction in the rename window. The Physical Destination Registers (PDsts) may or may not be allocated sequentially. In this regard, for example, sequential allocation of the Physical Destination Registers (PDsts) may not be performed when a common register file is used for both committed and non-committed results (which may be problematic, for example, for replay machines having separate register files for committed and non-committed results). Second, the system determines the dependency chain between these n instructions. Third, the physical source registers are assigned: if an instruction does not depend on any previous instruction in the rename window, the physical source register (“PSrc”) is assigned according to the logical source register (“LSrc”) from a register alias table (“RAT”); otherwise, the physical source register (PSrc) is assigned as the Physical Destination Register (PDst) of the instruction upon which the renamed instruction depends. Fourth, the register alias table (RAT) may be updated according to the mapping of the registers.
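By way of illustration only, the four renaming steps may be sketched in software as follows. This is a minimal behavioral sketch and not a description of any particular hardware: the names Instr, Renamed and rename_group, the use of a dictionary for the register alias table (RAT), and the use of a simple list as the free list are all assumptions introduced for the example. The backward search over earlier instructions in the group stands in for the priority match that a prioritized CAM may perform.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instr:
    ldst: Optional[str]   # logical destination register, or None if none is written
    lsrcs: List[str]      # logical source registers (LSrcs)


@dataclass
class Renamed:
    pdst: Optional[int]   # allocated physical destination (PDst)
    psrcs: List[int]      # physical sources (PSrcs)


def rename_group(group: List[Instr], rat: Dict[str, int],
                 free_list: List[int]) -> List[Renamed]:
    """Rename a group of instructions in program order (hypothetical sketch)."""
    # Step 1: allocate one PDst for each instruction that writes a register.
    pdsts = [free_list.pop(0) if ins.ldst is not None else None for ins in group]

    renamed = []
    for i, ins in enumerate(group):
        psrcs = []
        for lsrc in ins.lsrcs:
            # Step 2: look for the youngest earlier instruction in the same
            # group that writes this logical source (a priority match).
            producer = None
            for j in range(i - 1, -1, -1):
                if group[j].ldst == lsrc:
                    producer = j
                    break
            # Step 3: take the PSrc from the RAT when there is no in-group
            # producer, otherwise from the producer's PDst.
            psrcs.append(rat[lsrc] if producer is None else pdsts[producer])
        renamed.append(Renamed(pdsts[i], psrcs))

    # Step 4: update the RAT in program order, so that the last writer of a
    # logical register in the group determines the final mapping.
    for ins, pdst in zip(group, pdsts):
        if ins.ldst is not None:
            rat[ins.ldst] = pdst
    return renamed
```

In this sketch, the inner search for an in-group producer is repeated for every source of every instruction, which is one way of seeing why the amount of comparison work may grow quickly as more instructions are renamed together.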
This algorithmic approach may, however, limit the clock speed when a sufficient number of instructions (more than about 3 or 4) are being renamed in a single clock cycle. In particular, in the renaming algorithm, the second and fourth steps may use prioritized content-addressable-memories (CAMs) for determining the dependency chain and for updating the register alias table (RAT). Since prioritized content-addressable-memories (CAMs) may operate sequentially, it is believed that they may substantially slow down or at least be negatively affected as the number of instructions being renamed increases.
In out-of-order execution processors, instructions may be committed in-order, and instructions (after being decoded) may be retained in the instruction re-order buffer (ROB) until they are committed. The size of the re-order buffer (ROB) determines, or at least affects, the maximum number of “in-flight” instructions or instruction window. That is, the size of the re-order buffer (ROB) corresponds to or is the size of the instruction window. In short, the instruction window may be defined as the set of instructions from the oldest uncommitted instruction to the latest decoded instruction.
Register renaming may be used to remove name dependencies through registers, and this may be done by allocating a free storage location for the destination register of every new decoded instruction. Thus, different physical destination registers may be allocated even if the architectural name is the same. One renaming approach involves the entries of the re-order buffer (ROB), in which the result of every instruction is kept in the re-order buffer (ROB) until it is committed, after which it is written in the register file. When an instruction is decoded, the available source operands are read either from the register file or from a re-order buffer (ROB) entry. Operands that are not ready at decode may be forwarded from the execution units to the corresponding instruction queue entries (reservation stations) when they are produced. When an instruction is committed, its result may be copied from the re-order buffer (ROB) to the “real” register file. In another variation, a register buffer may be used just for renaming.
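By way of illustration only, the re-order buffer (ROB) entry approach described above may be sketched as follows. The class RobRenamer and its methods decode, writeback and commit are hypothetical names introduced for this example, and the sketch assumes for simplicity that every instruction writes exactly one register. An operand is returned either as a value (read from the register file or from a ready ROB entry) or as a tag to wait on, which corresponds to a value that would later be forwarded from an execution unit.

```python
from collections import deque


class RobRenamer:
    """Behavioral sketch of renaming through ROB entries (hypothetical)."""

    def __init__(self, regs):
        self.regfile = dict(regs)   # committed ("real") register file
        self.rob = deque()          # in-order entries, oldest at the left
        self.latest = {}            # logical register -> tag of youngest in-flight writer
        self.next_tag = 0

    def _find(self, tag):
        return next(e for e in self.rob if e["tag"] == tag)

    def decode(self, ldst, lsrcs):
        operands = []
        for src in lsrcs:
            tag = self.latest.get(src)
            if tag is None:
                # No in-flight writer: read the committed register file.
                operands.append(("value", self.regfile[src]))
            else:
                entry = self._find(tag)
                if entry["ready"]:
                    # Result already produced: read it from the ROB entry.
                    operands.append(("value", entry["value"]))
                else:
                    # Not ready at decode: wait for the result to be forwarded.
                    operands.append(("wait", tag))
        tag = self.next_tag
        self.next_tag += 1
        self.rob.append({"tag": tag, "ldst": ldst, "value": None, "ready": False})
        self.latest[ldst] = tag
        return tag, operands

    def writeback(self, tag, value):
        entry = self._find(tag)
        entry["value"] = value
        entry["ready"] = True

    def commit(self):
        """Commit the oldest entry if its result is ready; return success."""
        if not self.rob or not self.rob[0]["ready"]:
            return False
        head = self.rob.popleft()
        # Copy the result from the ROB entry to the register file.
        self.regfile[head["ldst"]] = head["value"]
        if self.latest.get(head["ldst"]) == head["tag"]:
            del self.latest[head["ldst"]]
        return True
```

The copy performed in commit is the step that the physical register file approach is intended to avoid.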
Another renaming approach may use a physical register file that contains more registers than are defined in the instruction set architecture (ISA); the registers defined in the instruction set architecture (ISA) may be referred to as logical registers. In the decode stage, each logical register may be mapped to a physical register using a map table. The destination register may be mapped to a free physical register, and source registers may be translated to their last assigned mapping. When an instruction is committed or retired, the physical register allocated by the previous instruction with the same logical destination register becomes free. In this approach, the operands may be read from the physical register file and, since there should be no need to copy registers on retirement, this may be more efficient than the re-order buffer (ROB) entry approach. The approach involves having one “pull” of information from the architectural and/or physical registers, and the retirement or “commitment” changes the allocation map to reflect it.
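By way of illustration only, the map table approach may be sketched as follows. The class MapTableRenamer, its methods rename and commit, the identity initial mapping, and the first-in first-out free list are assumptions introduced for this example. The sketch further assumes that every instruction has a register destination and that instructions are committed in program order.

```python
class MapTableRenamer:
    """Behavioral sketch of renaming with a map table and free list (hypothetical)."""

    def __init__(self, num_logical, num_physical):
        # Initially, logical register i is assumed to map to physical register i.
        self.map = {l: l for l in range(num_logical)}
        # The remaining physical registers are free.
        self.free = list(range(num_logical, num_physical))
        # In-flight instructions in program order: (ldst, new_preg, prev_preg).
        self.inflight = []

    def rename(self, ldst, lsrcs):
        # Sources are translated to their last assigned mapping.
        psrcs = [self.map[s] for s in lsrcs]
        if not self.free:
            raise RuntimeError("no free physical register; decode would stall")
        # The destination is mapped to a free physical register.
        new_preg = self.free.pop(0)
        prev_preg = self.map[ldst]
        self.map[ldst] = new_preg
        self.inflight.append((ldst, new_preg, prev_preg))
        return new_preg, psrcs

    def commit(self):
        # On commit, the register allocated by the previous writer of the same
        # logical destination becomes free; no value is copied.
        ldst, new_preg, prev_preg = self.inflight.pop(0)
        self.free.append(prev_preg)
        return prev_preg
```

In this sketch, a physical register is released only when a later writer of the same logical register commits, which is why a register may remain allocated for longer than the value it holds is actually needed.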
Additionally, to take advantage of a particular instruction window size in the physical register file organization approach, the number of physical registers should be about the same as the sum of the number of logical registers and the window size, since most or at least a significant number of the instructions may have a destination register. (That is, the maximum number of physical registers should be equal to the number of architectural registers plus the size of the instruction window, but a lower number may be used with virtual renaming. The logical register has to be “live” in case of branch misprediction. The rest are for maintaining “in-flight values”.) It is believed that this is because each logical register may be mapped to a physical register when the instruction window is empty (such as may occur, for example, after a branch misprediction). Thus, the minimum number of physical registers that are used is at least the same as the number of logical registers. In addition, for every instruction whose destination operand is a register, an additional register may be allocated when it enters the window (decode stage) and a physical register may be released when it leaves the window (commit stage).
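By way of illustration only, the register count described above may be checked with a small occupancy model. The function name peak_registers_in_use and the figures of 32 logical registers and a 128-entry window used below are assumptions chosen for the example, and the model assumes the worst case in which every instruction has a register destination.

```python
def peak_registers_in_use(num_logical, window_size, num_instructions):
    """Return the peak number of physical registers in use (hypothetical model)."""
    in_use = num_logical   # one mapping per logical register when the window is empty
    window = 0             # instructions currently in the window
    peak = in_use
    for _ in range(num_instructions):
        if window == window_size:
            # The oldest instruction commits and releases one physical register.
            window -= 1
            in_use -= 1
        # A newly decoded instruction enters the window and allocates one register.
        window += 1
        in_use += 1
        peak = max(peak, in_use)
    return peak
```

Under these assumptions, peak_registers_in_use(32, 128, 1000) evaluates to 160, that is, the number of logical registers plus the window size, while a run too short to fill the window, such as peak_registers_in_use(32, 128, 10), evaluates to 42.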
As regards all of the above, it is not believed that any of these systems, as well as virtual renamers alone, reflect the advantages, apparatuses, arrangements, methods, structures or topologies of the exemplary embodiments and methods of the present inventions, as are described below in the context of and with the use of virtual renamer apparatuses, arrangements, methods, structures or topologies.