As the performance of microprocessors and processing systems has continued to advance, more aggressive compiler optimization techniques have been employed and the corresponding number of registers required to hold all the pertinent information associated with a program state of a particular application/computation/function has dramatically increased.
FIG. 1 shows a microprocessor computer system in accordance with the present invention. As shown in FIG. 1, microprocessor computer system 100 can be represented as a collection of interacting functional units. These functional units perform the functions of fetching instructions and loading data from memory 107 into microprocessor registers 111, executing the instructions, placing the results of the executed instructions into microprocessor registers 111, storing the register results in memory 107, managing these memory transactions, and interfacing with external circuitry and devices. For the purposes of this discussion, a register is a small, high-speed computer circuit or buffer that holds values of internal operations, such as the results of computations, and the sources and sinks of data processed by the execution stages.
Microprocessor computer system 100 further comprises an address/data bus 101 for communicating information, microprocessor 102 coupled with bus 101 through input/output (I/O) device 103 for processing data and executing instructions, and memory system 104 coupled with bus 101 for storing information and instructions for microprocessor 102. Memory system 104 comprises, for example, cache memory 105 and main memory 107.
The particular components chosen to be integrated into a single housing are based upon market and design choices. Accordingly, it is expressly understood that fewer or more devices may be incorporated within the housing suggested by dashed line 108.
FIG. 2 shows one particular example of microprocessor registers 201 to 232 in a register window 211, as seen in one embodiment of a particular microprocessor. In FIG. 2, registers 201 to 224 are dependent on the specific register window and change from register window to register window. On the other hand, registers 225 to 232 are global registers that are independent of the specific register window and do not change from register window to register window. Registers, such as microprocessor registers 201 to 232, and register structures, such as register window 211, and their manipulation are well known to those of skill in the art. Therefore, a more detailed discussion of registers and register structures is omitted here to avoid detracting from the invention. For a more detailed discussion of one particular embodiment of a register structure, the reader is directed to “The SPARC Architecture Manual Version 9” edited by David Weaver and Tom Germond, 1994, published by PTR Prentice Hall, ISBN 0-13-825001-4, which is incorporated herein, in its entirety, by reference and is available at website:
http://www.sparc.com/standards/SPARCV9.pdf.
Chapters 5 and 6 are particularly relevant.
FIG. 3 shows one embodiment of microprocessor registers, such as registers 111 from FIG. 1, conceptually arranged in a register window system 300 with overlapping register windows 211, 303 and 305, as would be employed in a typical register window architecture. As shown in FIG. 3, in one embodiment, each register window, such as register windows 211 and 303, includes: I “in” registers; L “local” registers; O “out” registers; and G “global” registers. Consequently, register window 211 includes: I1 in registers; L1 local registers; O1 out registers; and G global registers. Likewise, register window 303 includes: I2 in registers; L2 local registers; O2 out registers; and G global registers.
Also shown in FIG. 3 is the fact that register windows 211, 303 and 305 overlap such that the “out” registers O1 of register window 211 are the “in” registers I2 of register window 303, and the “out” registers O2 of register window 303 are the “in” registers I3 of register window 305. Register windows 211, 303 and 305 are overlapped to allow data to be passed between adjacent windows, in order to facilitate parameter passing on function calls.
In this example, I1, L1, O1, I2, L2, O2, I3, L3, O3, and G each contain eight registers, such that each register window 211, 303 and 305 comprises 24 window registers, such as registers 201 to 224 of register window 211 of FIG. 2. Those of skill in the art will readily recognize that while three register windows 211, 303 and 305 are shown in FIG. 3, any number of register windows can be employed, depending on certain constraints. In addition, while the illustrative register window 211 shown in FIG. 2 has 32 registers (twenty-four window registers and eight global registers), register windows 211, 303 and 305 could include any desired number of individual registers, arranged as desired, consistent with the particular “Instruction Set Architecture” (ISA) in which the present invention is deployed. Consequently, the size and number of registers and register windows shown in the figures are chosen for illustrative purposes only and should not be read to limit the spirit or scope of the invention.
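The overlap of “out” and “in” registers can be illustrated with a small index-mapping function. The sketch below uses a hypothetical layout, not the mapping of any particular processor: it assumes eight registers per group, the three windows of FIG. 3, and a circular physical file in which each window owns only its ins and locals, while its outs are the same physical registers as the next window's ins. Global registers are kept outside this file and are not modeled. The function name `physical_index` is illustrative only.

```python
# Hypothetical layout, assumed for illustration only: 8 in, 8 local and
# 8 out registers per window, with each window's outs sharing storage with
# the following window's ins.
REGS_PER_GROUP = 8
NWINDOWS = 3  # e.g., register windows 211, 303 and 305 of FIG. 3


def physical_index(window: int, group: str, offset: int) -> int:
    """Map a window-relative register to an index in a circular physical file.

    Each window owns 16 unique registers (its ins and locals); its outs are
    the ins of the next window, so the file holds NWINDOWS * 16 registers.
    Global registers live outside this file and are not modeled.
    """
    if not 0 <= offset < REGS_PER_GROUP:
        raise ValueError("offset must be in range 0..7")
    stride = 2 * REGS_PER_GROUP   # unique registers owned by each window
    total = NWINDOWS * stride     # size of the windowed physical file
    if group == "in":
        base = window * stride
    elif group == "local":
        base = window * stride + REGS_PER_GROUP
    elif group == "out":
        base = (window + 1) * stride  # same storage as the next window's ins
    else:
        raise ValueError("group must be 'in', 'local' or 'out'")
    return (base + offset) % total  # circular: the last window wraps to the first


print(physical_index(0, "out", 3), physical_index(1, "in", 3))  # 19 19
print(physical_index(2, "out", 0), physical_index(0, "in", 0))  # 0 0
```

Under these assumptions, out register 3 of the first window and in register 3 of the second window resolve to the same physical index, and the three windows together need 48 physical window registers rather than 72, because each overlap is stored only once.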
As discussed above, in a register window architecture, such as register window system 300, the size of each register window 211, 303 and 305 is set at a specific number of registers, in this example registers 201 to 232. In such an architecture, programs typically move between register windows 211, 303 and 305 on function calls, and each function call receives a new set of registers. Those of skill in the art will further recognize that while it is common to change register windows, such as register windows 211, 303 and 305, based on function call boundaries, register windows can be changed arbitrarily based on the needs of the application. However, in the prior art, a given application could access one and only one register window 211, 303 or 305 at any one time, and was therefore allotted only the registers of that single window, registers 201 to 232 for example, before a spill to memory 105, 107 or 117 in FIG. 1 was required. In the specific embodiment shown in FIG. 3, and using register window 211 as an example, the registers 201 to 232 available to a given application include: registers 201 to 208 shared with the previous register window (not shown); local registers 209 to 216 for temporary use in the current register window 211; registers 217 to 224 for use as outputs to the following window 303; and global registers 225 to 232.
Many modern microprocessors physically support (i.e., implement in hardware) multiple register windows 211, 303 and 305, in some cases eight or more, so that moving to a different register window on a function call for a new function/computation does not cost extra time, i.e., cycles, spilling to memory such as memory 105, 107 or 117 in FIG. 1. In these systems, the register window 211, 303 or 305 that is visible to an application is typically determined, or marked, by a single window pointer that is often itself a register. As one specific example, the window pointer is a register designated a “Current Window Pointer” (CWP). One embodiment of a prior art window pointer 401 (PAWP 401) that is a register is shown in FIG. 4.
Window pointers, such as PAWP 401, and their use and manipulation are well known to those of skill in the art. Therefore, a more detailed discussion of window pointers is omitted here to avoid detracting from the invention. For a more detailed discussion of one particular embodiment of a window pointer, and the CWP in particular, the reader is directed to “The SPARC Architecture Manual Version 9” edited by David Weaver and Tom Germond, 1994, published by PTR Prentice Hall, ISBN 0-13-825001-4 which is incorporated herein, in its entirety, by reference and is available at website:
http://www.sparc.com/standards/SPARCV9.pdf.
Chapters 5 and 6 are particularly relevant.
Typically, PAWP 401, which indicates the current register window context, meaning the current architecturally visible register window, is incremented when entering a new function and decremented when returning from a function. In general, particular Instruction Set Architectures (ISAs) have specific instructions to manipulate PAWP 401. For instance, in the SPARC ISA, a SAVE instruction is used to increment PAWP 401 and a RESTORE instruction is used to decrement PAWP 401. Consequently, referring to FIG. 3, if PAWP 401, in one embodiment CWP, is set to register window 305, then the function is restricted to register window 305. On a SAVE instruction, the value of PAWP 401, in one embodiment the value of CWP, is incremented to point to the next register window (not shown in FIG. 3). On a RESTORE instruction, the value of PAWP 401, in one embodiment the value of CWP, is decremented, in this example back to register window 305. Consequently, on a RESTORE, any data in the register window following register window 305 (not shown) is considered invalid, is not saved on a context switch, and is unavailable to the function.
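The SAVE/RESTORE behavior described above can be modeled in a few lines. The class and method names below are hypothetical, and the model is deliberately simplified: it assumes eight windows, ignores the overlap between windows, and, to make the loss of validity visible, discards a window's contents when RESTORE vacates it. Real hardware does not necessarily clear those registers; the point is only that the function can no longer rely on them.

```python
class WindowedRegisters:
    """Hypothetical model of windowed registers addressed through a CWP-like pointer."""

    def __init__(self, nwindows: int = 8):
        self.nwindows = nwindows
        self.cwp = 0                                        # current window pointer
        self.windows = [dict() for _ in range(nwindows)]    # per-window register contents

    def write(self, reg: str, value: int) -> None:
        self.windows[self.cwp][reg] = value

    def read(self, reg: str):
        # Returns None when the register holds no valid data in this model.
        return self.windows[self.cwp].get(reg)

    def save(self) -> None:
        # SAVE: advance to the next window, modulo the number of windows.
        self.cwp = (self.cwp + 1) % self.nwindows

    def restore(self) -> None:
        # RESTORE: the vacated window is no longer valid; this model discards
        # its contents so that a later SAVE cannot rely on them.
        self.windows[self.cwp].clear()
        self.cwp = (self.cwp - 1) % self.nwindows


regs = WindowedRegisters()
regs.save()              # enter a new function: cwp == 1
regs.write("l0", 42)     # keep a value in a local register of window 1
regs.restore()           # return: cwp == 0, window 1 is invalidated
regs.save()              # back in window 1
print(regs.read("l0"))   # None: the earlier value cannot be relied upon
```

This is exactly the behavior that prevents a single function from parking data in a neighboring window and coming back for it later.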
Register windows 211, 303 and 305 are typically accessed in a circular fashion, such that all arithmetic operations on PAWP 401 are performed modulo the number of physical register windows 211, 303 and 305 supported by the microprocessor. Once all the physical register windows 211, 303 and 305 are in use, PAWP 401 wraps around, and the original contents of the first register window (211 in FIG. 3) are spilled to memory (105, 107 or 117 in FIG. 1) and then filled when required. Consequently, specifying the number of register windows 211, 303 and 305 also specifies the actual number of physical registers.
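The circular allocation and wrap-around spill can be sketched as follows. This is a simplified, hypothetical model: it tracks logical call depth, maps each depth to a physical window modulo the number of windows, and ignores the window overlap and the reserved windows that a real design needs (SPARC V9, for instance, tracks this with state registers such as CANSAVE and CANRESTORE). The class name `CircularWindows` is illustrative only.

```python
class CircularWindows:
    """Hypothetical model: windows reused circularly, spilling on wrap-around."""

    def __init__(self, nwindows: int):
        self.nwindows = nwindows
        self.depth = 0        # logical call depth (0 = outermost frame)
        self.resident = [0]   # frames currently held in physical windows, oldest first
        self.memory = []      # stack of frames spilled to memory
        self.spills = 0
        self.fills = 0

    @property
    def cwp(self) -> int:
        # Pointer arithmetic is modulo the number of physical windows.
        return self.depth % self.nwindows

    def save(self) -> None:
        self.depth += 1
        if len(self.resident) == self.nwindows:
            # Wrap-around: the oldest frame occupies the window now needed.
            self.memory.append(self.resident.pop(0))
            self.spills += 1
        self.resident.append(self.depth)

    def restore(self) -> None:
        if self.depth == 0:
            raise RuntimeError("no caller frame to return to")
        self.resident.pop()   # leave the current frame
        self.depth -= 1
        if not self.resident:
            # The caller's window was spilled earlier: fill it from memory.
            self.resident.append(self.memory.pop())
            self.fills += 1


windows = CircularWindows(3)
for _ in range(3):
    windows.save()                    # third SAVE wraps to window 0 and spills frame 0
print(windows.cwp, windows.spills)    # 0 1
for _ in range(3):
    windows.restore()                 # last RESTORE fills frame 0 back from memory
print(windows.depth, windows.fills)   # 0 1
```

With three windows, the third nested SAVE wraps back to window 0 and forces one spill; unwinding to the outermost frame forces the matching fill.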
Register windows, register window structures and manipulation of register windows, such as register window system 300, are well known to those of skill in the art. Therefore, a more detailed discussion of register windows, register window structures, and manipulation of register windows is omitted here to avoid detracting from the invention. For a more detailed discussion of one particular embodiment of a register window architecture, register window structures, and manipulation of register windows, the reader is again directed to “The SPARC Architecture Manual Version 9” edited by David Weaver and Tom Germond, 1994, published by PTR Prentice Hall, ISBN 0-13-825001-4, incorporated herein, in its entirety, by reference and available at website:
http://www.sparc.com/standards/SPARCV9.pdf.
Chapters 5 and 6 are particularly relevant.
As noted above, with the adoption of more aggressive compiler optimization techniques, coupled with increasing instruction latencies as microprocessors move to higher clock speeds, the number of registers 201 to 232 required to hold all of the pertinent information associated with a computation is increasing. Of course, in a register window architecture, such as register window system 300, this means that when all of the available registers 201 to 232 of the one register window 211, 303 or 305 associated with a given application are utilized, it is necessary to spill data to memory. This involves storing the excess information to memory, such as memory 105, 107 or 117 in FIG. 1, and then filling it whenever it is required.
Storing excess information in memory, such as memory 105, 107 or 117 in FIG. 1, and then filling it whenever it is required, is highly undesirable and significantly degrades performance, for at least three reasons. First, the process significantly increases the number of instructions required for a given computation due to spill/fill handling. Second, the process of spilling and filling registers consumes processor and memory resources, such as load/store bandwidth, that would otherwise be available for useful work. Third, storing excess information in memory and then filling it whenever it is required potentially introduces costly dependencies and RAW (read-after-write) stalls, because an instruction that needs a spilled value must wait for the corresponding fill to complete. Consequently, in the prior art, a significant number of important applications suffered noticeably from being restricted to the number of registers 201 to 232 in a single register window 211, 303 or 305.
One seemingly simple solution to the problem discussed above would be to increase the size of the register windows 211, 303 and 305, i.e., to increase the number of registers allotted to each register window 211, 303 and 305. However, any register-based architecture with fixed-size operand fields in the instruction definition cannot address more than the predetermined number of registers 201 to 232 in a register window 211, 303 or 305 at once. Consequently, significant alterations to the “Instruction Set Architecture” (ISA) would be required in order to expand the size of the register windows 211, 303 and 305. As those of skill in the art will readily recognize, this is not a viable option, and therefore this seemingly simple approach is impractical.
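The limit imposed by fixed-size operand fields is simple arithmetic: an n-bit register specifier can name at most 2^n registers. The sketch below assumes SPARC-style format-3 field positions (rd in bits 29 to 25, rs1 in bits 18 to 14, rs2 in bits 4 to 0, each 5 bits wide) and shows that even an all-ones instruction word decodes to register 31 at most, which matches the 32 registers visible in register window 211 of FIG. 2. The helper names are hypothetical.

```python
FIELD_BITS = 5  # width of each register specifier field (assumed, SPARC-style)


def field(word: int, lo: int, width: int = FIELD_BITS) -> int:
    """Extract an unsigned bit field of `width` bits starting at bit `lo`."""
    return (word >> lo) & ((1 << width) - 1)


def decode_registers(word: int) -> tuple[int, int, int]:
    """Return (rd, rs1, rs2) from a 32-bit format-3 style instruction word."""
    return field(word, 25), field(word, 14), field(word, 0)


print(1 << FIELD_BITS)               # 32 addressable registers
print(decode_registers(0xFFFFFFFF))  # (31, 31, 31)
```

Naming a 33rd register would require widening every such field, which changes the instruction encoding itself; this is why enlarging the windows implies significant ISA changes.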
In the prior art, memory spills are frequently required, and a significant number of important applications suffer noticeably from being restricted to the use of registers 201 to 232 in a single register window 211, 303 or 305. This is particularly wasteful and frustrating because physical registers are available in register windows 211, 303 or 305 other than the one register window that the application is currently using. Unfortunately, prior art methods and structures could not enable an application to access multiple register windows 211, 303 and 305 in the same function. This is because, even though a single function could theoretically move between register windows 211, 303 and 305 by manipulating PAWP 401 using the SAVE and RESTORE instructions, the typical RESTORE instruction specifies that any register window 211, 303 or 305 greater than the register window indicated by PAWP 401 is no longer considered valid by the microprocessor/operating system after the RESTORE instruction issues. Consequently, in the prior art, once a register window 211, 303 or 305 is vacated using a RESTORE instruction, a function is not guaranteed that any data left in that register window will still be present the next time the register window is accessed after a SAVE. This behavior is typically observed when a RESTORE instruction is used with a return from a function and the function's processing has been completed. In the prior art, this behavior essentially prevented the use of multiple register windows 211, 303 and 305 to hold data pertinent to a single function.
In short, the result in the prior art was that a function was restricted to using only the registers 201 to 232 of a single register window 211, 303 or 305, and any data that could not be contained in the current register window had to be spilled to memory, such as memory 107 or 117 in FIG. 1. Frustratingly, this was the case despite the fact that extra physical registers were typically readily available in other register windows 211, 303 or 305.
What is needed is a method for allowing the use of multiple register windows 211, 303 and 305 to hold data pertinent to a single function.