Modern processors use various techniques to improve their performance. One crucial technique is dynamic instruction scheduling, in which processor hardware can execute instructions out of order, i.e., in an order different from that specified by the programmer or compiler. The hardware can allow out-of-order execution as long as it ensures that the results of the computation are identical to those of the specified in-order execution. To enable this technique to achieve performance improvement, some hardware implementations provide a set of physical registers, called "renaming registers", which are in addition to the "architectural registers" visible to the programmer.
The renaming registers permit more parallelism, because they allow the hardware to allocate a new renaming register to represent an architectural register when the processor detects the start of a new definition of that architectural register, for example, when hardware detects a new load into a register. By using a new renaming register to represent this redefinition of the architectural register, a new stream of execution can begin in parallel with instructions that still use the original register.
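The allocation step described above can be illustrated with a minimal sketch. The function name `rename`, the instruction encoding as `(dest, src1, src2)` tuples, and the unlimited supply of renaming registers are assumptions made only for illustration; they do not describe any particular hardware implementation.

```python
def rename(instructions, num_arch):
    """Map architectural register numbers to physical register numbers.

    Each instruction is a tuple (dest, src1, src2, ...) of architectural
    register numbers. Hypothetical model: physical registers are never
    exhausted, so every new definition simply takes the next number.
    """
    mapping = list(range(num_arch))  # architectural reg -> current physical reg
    next_phys = num_arch             # first renaming register
    renamed = []
    for dest, *srcs in instructions:
        # Sources read whichever physical register currently backs them.
        phys_srcs = tuple(mapping[s] for s in srcs)
        # A new definition of dest is given a fresh renaming register.
        mapping[dest] = next_phys
        next_phys += 1
        renamed.append((mapping[dest], *phys_srcs))
    return renamed


program = [
    (1, 2, 3),  # r1 = r2 op r3
    (4, 1, 1),  # r4 = r1 op r1  (reads the first definition of r1)
    (1, 5, 6),  # r1 = r5 op r6  (redefines r1)
    (7, 1, 1),  # r7 = r1 op r1  (reads the second definition of r1)
]
renamed_program = rename(program, num_arch=8)
```

In this sketch the two definitions of r1 land in different physical registers, so the third instruction no longer has to wait for the second instruction to finish reading the old value of r1.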
A physical renaming register backing an architectural register can be "freed" (i.e., dissociated from that architectural register and made available for reallocation to another architectural register) when all instructions that read the old value in the architectural register (which is stored in that physical register) have completed. Hardware detection of these conditions is by its nature overly conservative; that is, the hardware typically maintains the association between a physical renaming register and an architectural register for a longer period than required. Thus, dynamic out-of-order execution techniques are expected to cause a substantial increase in the number of physical registers needed by a processor.
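The freeing condition stated above can be sketched as a small bookkeeping model. The class `RenameTable` and its methods are hypothetical names introduced for illustration, and the sketch models the idealized condition from the text (free as soon as the register is superseded and its last reader completes), not the more conservative policy that real hardware typically applies.

```python
class RenameTable:
    """Toy model of a rename map with a free list (illustrative only)."""

    def __init__(self, num_arch, num_phys):
        # Initially architectural register i is backed by physical register i.
        self.map = list(range(num_arch))
        self.free = list(range(num_arch, num_phys))
        self.pending_reads = [0] * num_phys  # in-flight readers per physical reg
        self.superseded = set()              # physical regs holding an old value

    def read(self, arch):
        """An instruction is issued that reads architectural register arch."""
        phys = self.map[arch]
        self.pending_reads[phys] += 1
        return phys

    def write(self, arch):
        """A new definition of arch: allocate a fresh physical register."""
        if not self.free:
            raise RuntimeError("no free renaming register; issue must stall")
        old = self.map[arch]
        new = self.free.pop(0)
        self.map[arch] = new
        self.superseded.add(old)
        self._try_free(old)
        return new

    def reader_done(self, phys):
        """An instruction that read physical register phys has completed."""
        self.pending_reads[phys] -= 1
        self._try_free(phys)

    def _try_free(self, phys):
        # Free only when the value is old AND nobody still needs to read it.
        if phys in self.superseded and self.pending_reads[phys] == 0:
            self.superseded.discard(phys)
            self.free.append(phys)


table = RenameTable(num_arch=4, num_phys=6)
old_phys = table.read(1)     # a reader of r1 is in flight
new_phys = table.write(1)    # r1 is redefined; the old register is still held
table.reader_done(old_phys)  # last reader completes; the old register is freed
```

A design that frees later than this, for instance only when the redefining instruction commits, holds each physical register longer, which is the source of the increased register demand noted above.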
Large register files are a concern for both multithreaded architectures and processors with register windows, as evidenced by the following prior art references. In a paper entitled "Register Relocation: Flexible Contexts for Multithreading," 20th Annual International Symposium on Computer Architecture, pages 120-129, May 1993, C. A. Waldspurger and W. E. Weihl proposed compiler and runtime support for managing multiple register sets in the register file. The compiler tries to identify an optimum number of registers for each thread, and generates code using that number of registers. The runtime system then tries to dynamically pack the register sets from all active threads into the register file. Also, in a paper entitled "The Named-State Register File: Implementation and Performance," 1st Annual International Symposium on High-Performance Computer Architecture, January 1995, P. R. Nuth and W. J. Dally proposed the named state register file as a cache for register values. The full register name space is backed by memory, but active registers are dynamically mapped to a small, fast set of registers. This design exploits both the small number of simultaneously active registers and the locality characteristics of register values. For its SPARC™ processor with register windows, Sun Corporation designed 3-D register files to reduce the required chip area, as described by M. Tremblay, B. Joy, and K. Shin in "A Three Dimensional Register File for Superscalar Processors," Hawaii International Conference on System Sciences, pages 191-201, January 1995. Because only one register window can be active at any time, the density of the register file can be increased by overlaying multiple register cells so that they share wires.
Several papers have investigated register lifetimes and other register issues. For example, in "Register File Design Considerations in Dynamically Scheduled Processors," 2nd Annual International Symposium on High-Performance Computer Architecture, January 1996, K. I. Farkas, N. P. Jouppi, and P. Chow compared the register file requirements for precise and imprecise interrupts and their effects on the number of registers needed to support parallelism in an out-of-order machine. They also characterized the lifetime of register values, by identifying the number of live register values present in various stages of the renaming process, and investigated cycle time trade-offs for multi-ported register files.
In "Register Traffic Analysis for Streamlining Inter-Operation Communication in Fine-Grained Parallel Processors," 25th International Symposium on Microarchitecture, pages 236-245, December 1992, M. Franklin and G. Sohi, and in "Exploiting Short-Lived Variables in Superscalar Processors," 28th International Symposium on Microarchitecture, pages 292-302, December 1995, C. L. Lozano and G. Gao noted that register values have short lifetimes, and often do not need to be committed to the register file. Both papers proposed compiler support to identify last uses and architectural mechanisms to allow the hardware to ignore writes to reduce register file traffic and the number of write ports. Franklin and Sohi also discussed the merits of a distributed register file in the context of a multiscalar architecture.
E. Sprangle and Y. Patt, in "Facilitating Superscalar Processing via a Combined Static/Dynamic Register Renaming Scheme," 27th International Symposium on Microarchitecture, pages 143-147, December 1994, proposed a statically-defined tag ISA that exposes register renaming to the compiler and relies on basic blocks as the atomic units of work. The register file is split into two, with the smaller file being used for storing basic block effects, and the larger for handling values that are live across basic block boundaries. In "A Restartable Architecture Using Queues," 14th Annual International Symposium on Computer Architecture, pages 290-299, June 1987, A. R. Pleszkun et al. exposed the reorder buffer to the compiler, so that it can generate better code schedules and provide speculative execution.
J. Janssen and H. Corporaal, in "Partitioned Register Files for TTAs," 28th International Symposium on Microarchitecture, pages 303-312, December 1995, A. Capitanio et al., in "Partitioned Register Files for VLIWs," 25th International Symposium on Microarchitecture, pages 292-300, December 1992, and J. Llosa et al., in "Non-Consistent Dual Register Files to Reduce Register Pressure," 1st Annual International Symposium on High-Performance Computer Architecture, pages 22-31, January 1995, investigated techniques for handling large register files, including partitioning, limited connectivity, and replication. Kiyohara et al., in "Register Connections: A New Approach to Adding Registers into Instruction Set Architecture," 20th Annual International Symposium on Computer Architecture, pages 247-256, May 1993, proposed a technique for handling larger register files by adding new opcodes to address the extended register file.
Based upon the preceding prior art references, it will be apparent that a more flexible approach is needed for sharing physical registers among out-of-order instructions in such a way as to reduce the total register requirement for a processor. The approach used should improve the performance achievable with a given number of registers, reduce the number of registers required to support a given number of instructions with a given level of performance, and simplify the organization of the processor. Currently, the prior art does not disclose or suggest such an approach.