1. Field of the Invention
The present invention relates to the design of processors within computer systems. More specifically, the present invention relates to a method and an apparatus for dynamically allocating physical registers in a windowed processor architecture.
2. Related Art
Computer systems typically perform computational operations on data values stored in a set of processor registers. Because each function within a program operates on its own set of registers, a processor's “active register set” changes each time the current function changes, for example during a function call operation or a function return operation. This change can involve saving the current register set to memory during a function call operation to make room for a register set for the new function, and subsequently restoring the saved register set from memory during a corresponding function return operation. Unfortunately, this process of saving register sets to memory and restoring them is extremely time-consuming and can significantly degrade computer system performance.
Modern computer architectures sometimes make use of a large set of registers to reduce the time required to perform save and restore operations. This large set of registers is typically divided into a number of “register windows,” wherein each register window contains the register set for a different function on the call stack. This makes it possible to simply switch between register windows during a function call operation, instead of having to save and restore registers to memory.
For example, the SPARC™ V9 instruction set defines an integer general-purpose architecture comprised of an implementation-dependent number of overlapped register windows. At any time, only a small fraction of the register windows is visible to application software. Compared to other reduced instruction set computer (RISC) architectures that implement a flat register space of typically 32 integer registers, SPARC™ register windows decrease the overhead of register spilling and filling as described below. (SPARC™ is a registered trademark of SPARC International, Inc.™ of San Jose, Calif.) Note that the SPARC™ architecture is described for exemplary purposes only. This description in no way limits the present invention to the SPARC™ architecture.
Upon encountering a procedure call, SPARC™ code typically moves to a new register window using a save instruction. Conversely, on a procedure return, the SPARC™ code typically returns to the previous register window through a restore instruction. In contrast, on a typical RISC architecture, the registers are spilled by saving them to the stack in memory upon encountering the procedure call and restored from the stack through register fills on a procedure return. The overhead of register spilling and filling can be significant due to the additional load/stores executed, the additional memory traffic, and the expansion in the code footprint.
On the other hand, the primary disadvantage of register windows is that they require a larger physical register file, which is slower and may be in the critical timing path of the processor pipeline. Register file access times do not scale as well as logic delays with improvements in process technology. Therefore, the large physical register file size required by register windows increasingly limits the processor frequency. As a result, newer processor chips provide space for fewer register windows than previous-generation processor chips. Unfortunately, this reduced number of windows increases the number of costly window spills and fills. A register window is spilled to memory when the application attempts to use more than the number of windows supported by a processor implementation. Spilling a register window generally involves copying the registers used by an ancestor routine to a special buffer or software stack. The spilled register window may then be used by the current routine. Eventually, when the processor returns to the ancestor routine whose window was spilled, the processor restores or fills the window by copying an area in the special buffer or software stack back to the registers.
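The spill and fill behavior described above can be sketched with a toy model, written here in Python for illustration only. The class name `WindowModel`, its fields, and the policy of spilling exactly one ancestor window on overflow are assumptions made for this sketch; actual implementations differ in detail.

```python
class WindowModel:
    """Toy model that counts window spills and fills for a call/return trace."""

    def __init__(self, num_windows):
        self.num_windows = num_windows  # windows held in the physical register file
        self.resident = 1               # windows currently on chip (current routine)
        self.spilled = 0                # ancestor windows currently saved in memory
        self.spills = 0                 # total spill operations so far
        self.fills = 0                  # total fill operations so far

    def call(self):
        """Procedure call: move to a new window, spilling the oldest if none is free."""
        if self.resident == self.num_windows:
            self.spilled += 1
            self.spills += 1
            self.resident -= 1
        self.resident += 1

    def ret(self):
        """Procedure return: go back to the caller's window, filling it if it was spilled."""
        self.resident -= 1
        if self.resident == 0:
            if self.spilled == 0:
                raise RuntimeError("return from the outermost routine")
            self.spilled -= 1
            self.fills += 1
            self.resident = 1


model = WindowModel(num_windows=5)
for _ in range(8):   # call chain eight levels deep
    model.call()
for _ in range(8):   # unwind back to the starting routine
    model.ret()
print(model.spills, model.fills)
```

Under these assumptions, a chain of eight nested calls on five windows incurs four spills and four fills, whereas the same chain on nine windows incurs none, which illustrates why reducing the number of windows increases spill and fill overhead.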
Furthermore, recent trends in object-oriented software programming have led to applications with small procedures and deep runtime call graphs. Smaller functions may not fully utilize all the registers in a register window. Moreover, deep call graphs require frequent spilling of entire register windows when only a fraction of the registers actually contain live values. In order to support a reasonable number of register windows, say five, while still allowing for competitive processor clock frequencies, it is desirable to improve the implementation of register windows in future processor designs.
Note that modern processors typically implement out-of-order execution to enhance throughput. Out-of-order processors use register renaming to eliminate anti (write-after-read) and output (write-after-write) dependencies between instructions by allocating a fresh physical register on each definition of an architectural register encountered in the dynamic program order. Since multiple definitions of the same architectural register are written to distinct physical registers, the processor may reorder these multiple definitions without affecting the final outcome of the instructions.
In addition to a large physical register file, a common implementation of renaming uses a rename map that associates each architectural register with a corresponding physical register identifier. Furthermore, two other first-in-first-out (FIFO) structures are used to maintain the free list and pending list. A free list contains a list of physical register identifiers that are available to be assigned and a pending list contains a list of physical register identifiers that may be freed once all instructions that use (read) this physical register have retired. Typically, a physical register identifier is moved to the pending list upon a new definition of its associated architectural register. A physical register may be moved to the free list from the pending list once all instructions that read from that physical register have retired.
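The interaction of the rename map, free list, and pending list can be sketched as follows. This is a minimal illustration, not a description of any particular processor: the class name `Renamer`, the initial identity mapping, and the simplification that each retiring instruction releases the oldest pending identifier (which relies on in-order retirement) are all assumptions.

```python
from collections import deque


class Renamer:
    """Toy rename structures: a rename map plus FIFO free and pending lists."""

    def __init__(self, num_arch, num_phys):
        # Assumption: architectural register i initially maps to physical register i.
        self.rename_map = list(range(num_arch))
        self.free_list = deque(range(num_arch, num_phys))
        self.pending_list = deque()

    def rename(self, dst, srcs):
        """Rename one instruction; returns (new physical dst, physical sources)."""
        # Sources read the mapping in effect before this instruction's definition.
        phys_srcs = [self.rename_map[s] for s in srcs]
        if not self.free_list:
            raise RuntimeError("no free physical register; rename would stall")
        new_phys = self.free_list.popleft()
        # The previous mapping becomes pending until its readers have retired.
        self.pending_list.append(self.rename_map[dst])
        self.rename_map[dst] = new_phys
        return new_phys, phys_srcs

    def retire(self):
        """Retire the oldest renamed instruction, freeing the register it displaced."""
        self.free_list.append(self.pending_list.popleft())


r = Renamer(num_arch=4, num_phys=8)
first = r.rename(dst=1, srcs=[2, 3])   # first definition of architectural register 1
second = r.rename(dst=1, srcs=[1])     # second definition reads the first one's result
print(first, second)
```

In this sketch the two definitions of architectural register 1 receive distinct physical registers (4 and 5), so the output dependency between them disappears, while the second instruction's source correctly refers to physical register 4.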
In a processor that implements register windows, the in-order rename stage in the front end first flattens all register specifications by adding an offset based on the current window pointer. The current window pointer points to the window or set of windows currently accessible to application software. The current window pointer is typically updated by save and restore instructions. The rename stage uses the flattened architectural register identifier to look up the corresponding physical register in the rename map for each source register. For a destination register of an instruction, the rename stage then assigns a new physical register and records the assignment in the rename map. During this process, the rename stage removes the new physical register from the free list and pushes the previously assigned physical register onto the pending list. From this point onwards, instructions may be executed in an out-of-order fashion without regard to the original “anti” and “output” dependencies. The retire stage also operates in an in-order fashion, recovering the resources that were allocated to an instruction. In particular, register identifiers may be moved from the pending list to the free list by the retire stage.
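The flattening step can be illustrated with a short sketch. The layout below is an assumption chosen for illustration: 8 global registers followed by 16 registers per window, with register numbers 8-15 treated as outs, 16-23 as locals, and 24-31 as ins, and with the outs of one window overlapping the ins of the next. The function name `flatten` and the direction in which the window pointer advances are likewise hypothetical.

```python
NUM_GLOBALS = 8
REGS_PER_WINDOW = 16  # 8 ins + 8 locals; outs are shared with the next window's ins


def flatten(reg, cwp, num_windows):
    """Map a 5-bit register number and a window pointer to a flat architectural index."""
    if not 0 <= reg < 32:
        raise ValueError("register number out of range")
    if reg < NUM_GLOBALS:
        return reg  # globals are not windowed
    total = REGS_PER_WINDOW * num_windows
    if reg >= 24:    # ins of the current window
        offset = REGS_PER_WINDOW * cwp + (reg - 24)
    elif reg >= 16:  # locals of the current window
        offset = REGS_PER_WINDOW * cwp + 8 + (reg - 16)
    else:            # outs overlap the ins of the next window
        offset = REGS_PER_WINDOW * (cwp + 1) + (reg - 8)
    return NUM_GLOBALS + offset % total


# The caller's first out register and the callee's first in register coincide.
print(flatten(8, cwp=0, num_windows=4) == flatten(24, cwp=1, num_windows=4))
```

The flattened index, rather than the raw register number, is what would be used to index the rename map, so that the same register number in different windows resolves to different rename map entries.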
In some processor implementations, the physical register file is partitioned into an architectural register file and a working register file containing rename registers. In the rename stage, destination registers are allocated rename registers and these rename registers are copied back into the architectural register at retire. In alternative implementations, the physical register file is not partitioned and the retire unit does not have to copy a working register to an architectural register. In either type of implementation, during a save operation, the processor ensures that an entire window, of say 16 registers, is allocated, even if the function generates only 4 live registers in the window as is typical. Thus, a large fraction of the critical physical register file contains dead or unused registers. These dead registers are recovered by a subsequent restore instruction, but in the intervening period the window may potentially be spilled/filled multiple times. Hence, including these dead registers in the fill/spill operations unnecessarily increases the overhead of the spill and fill operations and thereby decreases throughput of the processor.
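The cost of spilling dead registers can be made concrete with a small calculation using the figures from the paragraph above (16 registers per window, 4 of them live). The function name `spill_traffic` and the 8-byte register width are assumptions introduced for this illustration.

```python
def spill_traffic(window_regs=16, live_regs=4, bytes_per_reg=8):
    """Return (bytes moved per full-window spill, bytes actually live, wasted fraction)."""
    full_bytes = window_regs * bytes_per_reg
    live_bytes = live_regs * bytes_per_reg
    wasted_fraction = 1 - live_bytes / full_bytes
    return full_bytes, live_bytes, wasted_fraction


print(spill_traffic())
```

Under these assumptions, each spill or fill moves 128 bytes of which only 32 bytes hold live values, so three quarters of the memory traffic is spent on dead registers.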
What is needed is a method and an apparatus that facilitates dynamically allocating physical registers in a windowed architecture without the problems described above.