Field
The disclosed embodiments generally relate to techniques for improving performance in computer systems. More specifically, the disclosed embodiments relate to a technique for improving the performance of register window operations in computer systems by performing lazy register fills.
Related Art
The use of register windows can greatly improve processor performance by eliminating the need to save and restore registers when a program makes function calls. A processor with register windows provides multiple sets of registers, one set (called a “window”) for each function call, with adjacent sets overlapping to enable parameter passing between functions. However, when the processor runs out of windows to allocate to a new function call, it has to spill the contents of one of the register windows to a stack in memory in order to make that register window available for the new function call. This condition is called a “register window overflow.” Similarly, when the spilled window of registers subsequently needs to be accessed by the program, the registers have to be restored from the stack. This condition is called a “register window underflow.”
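The overflow and underflow conditions described above can be illustrated with a minimal sketch. The model below is a simplification written for illustration only: the class name WindowFile, its fields, and the choice of four windows in the usage note are hypothetical, and details of a real register file (such as overlapping registers or reserved windows) are deliberately omitted.

```python
from dataclasses import dataclass, field

WINDOW_SIZE = 16  # registers saved per window, matching the 16-register example


@dataclass
class WindowFile:
    """Hypothetical model of a register-window file (illustration only)."""
    num_windows: int                 # windows physically available
    resident: int = 1                # windows currently held in registers
    stack: list = field(default_factory=list)  # spilled windows in memory
    spills: int = 0                  # count of overflow events
    fills: int = 0                   # count of underflow events

    def call(self) -> None:
        """Allocate a window for a new function call."""
        if self.resident == self.num_windows:
            # Overflow: no free window, so one window is spilled to the stack.
            self.stack.append([0] * WINDOW_SIZE)
            self.spills += 1
            self.resident -= 1
        self.resident += 1

    def ret(self) -> None:
        """Release the current window and return to the caller's window."""
        if self.resident == 1:
            if not self.stack:
                raise RuntimeError("return from outermost frame")
            # Underflow: the caller's window was spilled and must be restored.
            self.stack.pop()
            self.fills += 1
        else:
            self.resident -= 1


def run_call_chain(num_windows: int, depth: int) -> WindowFile:
    """Make `depth` nested calls and then return from all of them."""
    wf = WindowFile(num_windows)
    for _ in range(depth):
        wf.call()
    for _ in range(depth):
        wf.ret()
    return wf
```

Under these assumptions, run_call_chain(4, 6) nests six calls in a file of four windows. The last three calls each overflow, and the last three returns each underflow, so a call chain only slightly deeper than the window file already pays for several spills and fills.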
In existing processors, register window overflows and underflows are typically handled via software traps (referred to as “spill traps” and “fill traps,” respectively). These traps are expensive operations because they typically generate a processor pipeline flush before entering the trap handler and another flush after exiting the trap handler. Also, during a spill trap, all of the registers in the window (e.g., 16 registers in a SPARC™ architecture) are saved to the stack in memory, even though most of them are not live across the function call. Similarly, on a fill trap, all of the registers are restored from the stack even though most of them are not subsequently used. In fact, empirical results show that, for a wide range of applications, typically only four of the 16 registers are used. Moreover, register window underflows typically affect processor performance more than register window overflows. This is because the spill trap handler uses store instructions while the fill trap handler uses load instructions, and it is much more difficult for the processor to hide the latency of load instructions than to hide the latency of store instructions.
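The cost of restoring whole windows can be made concrete with a short calculation. The sketch below is illustrative only: the function name wasted_fill_loads is hypothetical, and treating the typical figure of four used registers per window as a constant is a simplifying assumption.

```python
def wasted_fill_loads(num_fills, window_size=16, used_per_window=4):
    """Return (total_loads, unneeded_loads) when each fill trap restores a full window.

    Assumes, for illustration, that exactly `used_per_window` of the restored
    registers are subsequently used after every fill.
    """
    total_loads = num_fills * window_size
    unneeded_loads = num_fills * (window_size - used_per_window)
    return total_loads, unneeded_loads
```

With these default values, 12 of the 16 loads issued per fill trap, or 75 percent, restore registers that are never subsequently used.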
Because of the high cost of handling register window overflows and underflows, the use of register windows can degrade processor performance if an application generates too many overflows and underflows.