Modern processors employed in computer systems use various techniques to improve their performance; one of these techniques is multithreading. A multithreaded computer system contains hardware support for multiple threads of execution. The threads can be independent programs or related execution streams of a single parallel program, or both. The hardware typically supports (1) rapid switching of threads on long-latency operations, i.e., "coarse-grained or block multithreading," (2) rapid switching of threads on a cycle by cycle basis, i.e., "fine-grained multithreading," or (3) scheduling of instructions from multiple threads within a single cycle, i.e., "simultaneous multithreading." To support this rapid switching or simultaneous instruction selection, the processor must contain register sets for multiple thread contexts. The size of the register file is thus an important consideration in the design of such processors.
A second crucial technique typically used to improve the performance of a modern processor is dynamic instruction scheduling in which processor hardware can execute instructions out of order, i.e., in an order different than that specified by the programmer or compiler. The hardware can allow out-of-order execution as long as it ensures that the results of the computation are identical to the specified in-order execution. To enable this technique to achieve performance improvement, the hardware provides a set of physical registers called "renaming registers," which are in addition to the "architectural registers" that are visible to the programmer.
The renaming registers permit more parallelism, because they allow the hardware to allocate a new renaming register to represent an architectural register when the processor detects the start of a new definition of that architectural register; i.e., when hardware detects the loading of a new value into a register. By using a new renaming register to represent this redefinition of the architectural register, a new stream of execution can begin in parallel with the use of the original register.
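By way of illustration only, the renaming step described above can be sketched in a few lines of Python. The class name `RenameTable`, the register counts, and the three-instruction sequence are hypothetical and chosen purely for this example; the sketch omits the return of renaming registers to the free list and is not a description of any particular processor.

```python
from collections import deque


class RenameTable:
    """Toy map from architectural registers to physical registers (illustrative only)."""

    def __init__(self, num_arch, num_phys):
        if num_phys < num_arch:
            raise ValueError("need at least one physical register per architectural register")
        # Assumption: architectural register i initially lives in physical register i.
        self.mapping = list(range(num_arch))
        # The remaining physical registers act as the renaming registers.
        self.free = deque(range(num_arch, num_phys))

    def rename(self, dest, srcs):
        """Rename one instruction; returns (physical_dest, physical_srcs)."""
        # Sources read whichever physical register currently holds each value.
        phys_srcs = [self.mapping[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free renaming register; a real processor would stall")
        # A new definition of `dest` receives a fresh physical register.
        phys_dest = self.free.popleft()
        self.mapping[dest] = phys_dest
        return phys_dest, phys_srcs


table = RenameTable(num_arch=4, num_phys=8)
i1 = table.rename(dest=1, srcs=[2, 3])  # r1 <- r2 op r3
i2 = table.rename(dest=0, srcs=[1, 1])  # r0 <- r1 op r1 (reads the value produced by i1)
i3 = table.rename(dest=1, srcs=[2, 2])  # r1 <- r2 op r2 (a new definition of r1)
```

In this sketch the third instruction writes a different physical register than the one the second instruction reads, so the redefinition of r1 need not wait for the earlier use of r1 to complete.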
It is believed that multithreading and out-of-order execution of instructions have not yet been combined in a single processor. The combination of multithreading and dynamic out-of-order execution techniques is expected to cause a substantial increase in the number of physical registers needed by a processor.
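The expected growth can be illustrated with simple arithmetic, under the assumption that each thread context requires its own full set of architectural registers and that the renaming registers form a single pool added on top. The function name and the figures used below (32 architectural registers per thread, 100 renaming registers) are hypothetical values chosen only for illustration.

```python
def physical_registers_needed(threads, arch_regs_per_thread, renaming_regs):
    """Rough count: one architectural register set per thread plus a renaming pool.

    Assumption for illustration: register sets are not shared between threads.
    """
    return threads * arch_regs_per_thread + renaming_regs


# Hypothetical figures: 32 architectural registers per thread, 100 renaming registers.
single_threaded = physical_registers_needed(1, 32, 100)
eight_threads = physical_registers_needed(8, 32, 100)
```

Under these assumed figures, moving from one thread context to eight raises the count from 132 to 356 physical registers.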
Large register files are a concern for both multithreaded architectures and processors with register windows, as evidenced by the following prior art references. In a paper entitled "Register Relocation: Flexible Contexts for Multithreading," 20th Annual International Symposium on Computer Architecture, pages 120-129, May 1993, C. A. Waldspurger and W. E. Weihl proposed compiler and runtime support for managing multiple register sets in the register file. The compiler tries to identify an optimum number of registers for each thread, and generates code using that number of registers. The runtime system then tries to dynamically pack the register sets from all active threads into the register file. Also, in a paper entitled "The Named-State Register File: Implementation and Performance," 1st Annual International Symposium on High-Performance Computer Architecture, January 1995, P. R. Nuth and W. J. Dally proposed the named state register file as a cache for register values. The full register name space is backed by memory, but active registers are dynamically mapped to a small, fast set of registers. This design exploits both the small number of simultaneously active registers and the locality characteristics of register values. For its SPARC™ processor with register windows, Sun Corporation designed 3-D register files to reduce the required chip area, as described by M. Tremblay, B. Joy, and K. Shin in "A Three Dimensional Register File for Superscalar Processors," Hawaii International Conference on System Sciences, pages 191-201, January 1995. Because only one register window can be active at any time, the density of the register file can be increased by overlaying multiple register cells so that they share wires.
Several papers have investigated register lifetimes and other register issues. For example, in "Register File Design Considerations in Dynamically Scheduled Processors," 2nd Annual International Symposium on High-Performance Computer Architecture, January 1996, K. I. Farkas, N. P. Jouppi, and P. Chow compared the register file requirements for precise and imprecise interrupts and their effects on the number of registers needed to support parallelism in an out-of-order machine. They also characterized the lifetime of register values by identifying the number of live register values present in various stages of the renaming process, and investigated cycle time tradeoffs for multi-ported register files.
J. Janssen and H. Corporaal, in "Partitioned Register Files for TTAs," 28th International Symposium on Microarchitecture, pages 303-312, December 1995, A. Capitanio et al., in "Partitioned Register Files for VLIWs," 25th International Symposium on Microarchitecture, pages 292-300, December 1992, and J. Llosa et al., in "Non-Consistent Dual Register Files to Reduce Register Pressure," 1st Annual International Symposium on High-Performance Computer Architecture, pages 22-31, January 1995, investigated techniques for handling large register files, including partitioning, limited connectivity, and replication. Kiyohara et al., in "Register Connections: A New Approach to Adding Registers into Instruction Set Architecture," 20th Annual International Symposium on Computer Architecture, pages 247-256, May 1993, proposed a technique for handling larger register files by adding new opcodes to address the extended register file.
Based upon the preceding prior art references, it will be apparent that a flexible approach is needed for sharing physical registers among threads in such a way as to reduce the total register requirement. The approach used should improve the performance of a given number of registers, reduce the number of physical registers required to support a given number of threads with a given level of performance, and simplify the processor organization. Currently, the prior art does not disclose or suggest such an approach.