Conventional processors typically have no stacked registers and therefore cannot implement a hardware-based register stack frame architecture. On such a processor, calling a new procedure requires a save operation in which the caller's register contents are stored on the calling application's stack in main memory. Once the called procedure exits, the caller's register contents are reloaded from the main memory stack into the processor before execution continues. The return values of the called procedure are also largely passed through memory, so retrieving them requires expensive memory reads.
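The memory traffic described above can be illustrated with a toy Python model. It is a sketch under stated assumptions, not a model of any particular processor: the class `FlatRegisterMachine` and the function `call_chain_traffic` are hypothetical, and every call is assumed to save a fixed number of live registers (`live_registers`) to the memory stack and to reload them on return.

```python
class FlatRegisterMachine:
    """Hypothetical processor with no stacked registers (illustration only)."""

    def __init__(self, live_registers: int) -> None:
        self.live_registers = live_registers  # registers saved per call (assumed fixed)
        self.registers = [0] * live_registers
        self.memory_stack = []                # saved register sets, newest last
        self.memory_writes = 0
        self.memory_reads = 0

    def call(self) -> None:
        # Store the caller's register contents on the main-memory stack.
        self.memory_stack.append(list(self.registers))
        self.memory_writes += self.live_registers
        # The callee is free to overwrite every register; model that with zeros.
        self.registers = [0] * self.live_registers

    def ret(self) -> None:
        # Reload the caller's register contents before execution continues.
        self.registers = self.memory_stack.pop()
        self.memory_reads += self.live_registers


def call_chain_traffic(depth: int, live_registers: int) -> tuple:
    """Return (memory writes, memory reads) for a call chain `depth` calls deep."""
    machine = FlatRegisterMachine(live_registers)
    for _ in range(depth):
        machine.call()
    for _ in range(depth):
        machine.ret()
    return machine.memory_writes, machine.memory_reads


print(call_chain_traffic(depth=10, live_registers=8))  # (80, 80)
```

In this model, every level of the call chain costs one full register save on the way down and one full reload on the way back up, regardless of how few registers the callee actually touches.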
Intel's Itanium® processor includes 128 general integer registers. The first 32 registers, r0–r31, are static registers, which are visible to all procedures. The remaining 96 registers, r32–r127, are stacked registers, which are local to each procedure. The set of stacked registers visible to a given procedure is called a register stack frame. The Itanium® processor also includes a Register Stack Engine (RSE), which is responsible for mapping register stack frames onto the stacked registers in the physical register file. When a procedure is called, the stacked registers are renamed so that the first register in the caller's output area becomes r32 for the callee. The callee's input area therefore begins at the first register of the caller's output area, and parameters are passed to the callee through the output area of the caller's register stack frame. When the callee returns, the register renaming is restored to the caller's configuration. This mechanism allows the caller's registers to be preserved in the register file instead of being stored to memory.
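The renaming rule can be sketched in Python as a mapping from logical register numbers to physical register indices. The names `RegisterFrame`, `alloc`, `physical`, and `call` are hypothetical; `sof` (size of frame) and `sol` (size of locals) follow Itanium terminology, and treating the 96 physical stacked registers as a simple circular buffer is a simplifying assumption rather than a description of the hardware.

```python
NUM_STATIC = 32   # r0-r31: static registers shared by all procedures
NUM_STACKED = 96  # r32-r127: stacked registers, renamed for each procedure


class RegisterFrame:
    """One procedure's register stack frame (simplified, hypothetical model)."""

    def __init__(self, base: int, sof: int, sol: int) -> None:
        self.base = base  # physical stacked-register slot that holds this frame's r32
        self.sof = sof    # size of frame: inputs + locals + outputs
        self.sol = sol    # size of locals: inputs + locals (outputs follow them)

    def alloc(self, ins_and_locals: int, outs: int) -> None:
        # Resize the frame in place, as a procedure does on entry.
        if ins_and_locals + outs > NUM_STACKED:
            raise ValueError("frame larger than the stacked register file")
        self.sol = ins_and_locals
        self.sof = ins_and_locals + outs

    def physical(self, reg: int) -> int:
        """Map logical register r<reg> to a physical register index."""
        if reg < NUM_STATIC:
            return reg  # static registers are never renamed
        offset = reg - NUM_STATIC
        if offset >= self.sof:
            raise ValueError(f"r{reg} is outside the current frame")
        # Stacked registers wrap around a circular buffer of NUM_STACKED slots.
        return NUM_STATIC + (self.base + offset) % NUM_STACKED


def call(caller: RegisterFrame) -> RegisterFrame:
    """Create the callee's frame: the caller's first output register becomes r32."""
    outs = caller.sof - caller.sol
    return RegisterFrame(base=(caller.base + caller.sol) % NUM_STACKED, sof=outs, sol=0)


caller = RegisterFrame(base=0, sof=0, sol=0)
caller.alloc(ins_and_locals=5, outs=2)  # r32-r36 inputs/locals, r37-r38 outputs
callee = call(caller)
print(caller.physical(37), callee.physical(32))  # 37 37
```

Because `call` never modifies the caller's `RegisterFrame`, returning amounts to resuming with the caller's frame unchanged, which mirrors the restoration of the caller's renaming; no register contents are copied to memory.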
If not enough stacked registers are available, the RSE spills the oldest register stack frames to memory to make room. Spilled register stack frames are stored in a memory area called the backing store. When a procedure returns to a caller whose frame has been spilled, the RSE fills that frame from the backing store back into the registers. This process is performed automatically by the RSE and is transparent to the compiler.
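A frame-granular sketch of this spill and fill behavior, again in Python, is shown below. The class `RegisterStackEngine` is hypothetical and assumes a lazy policy: the oldest resident frames are spilled only when a new frame does not fit, and a frame is filled only when a return finds the caller's frame in the backing store. The real RSE moves individual registers and may also spill or fill eagerly, so this illustrates the idea rather than the hardware.

```python
from collections import deque


class RegisterStackEngine:
    """Frame-granular sketch of RSE spills and fills (assumed lazy policy)."""

    def __init__(self, capacity: int = 96) -> None:
        self.capacity = capacity   # physical stacked registers (assumed 96)
        self.resident = deque()    # frame sizes held in registers, oldest on the left
        self.in_use = 0            # registers occupied by resident frames
        self.backing_store = []    # spilled frame sizes, most recently spilled last
        self.spilled = 0           # total registers written to the backing store
        self.filled = 0            # total registers read back from the backing store

    def call(self, frame_size: int) -> None:
        if frame_size > self.capacity:
            raise ValueError("frame does not fit in the stacked register file")
        self.resident.append(frame_size)
        self.in_use += frame_size
        # Overflow: spill the oldest resident frames until the new frame fits.
        while self.in_use > self.capacity:
            oldest = self.resident.popleft()
            self.in_use -= oldest
            self.backing_store.append(oldest)
            self.spilled += oldest

    def ret(self) -> None:
        self.in_use -= self.resident.pop()
        # If the caller's frame was spilled, fill it before execution continues.
        if not self.resident and self.backing_store:
            frame = self.backing_store.pop()
            self.resident.append(frame)
            self.in_use += frame
            self.filled += frame


rse = RegisterStackEngine(capacity=96)
for _ in range(4):
    rse.call(40)  # the third and fourth calls each spill a 40-register frame
for _ in range(4):
    rse.ret()     # the second and third returns each fill a frame
print(rse.spilled, rse.filled)  # 80 80
```

Spilling oldest-first keeps the frames closest to the current procedure in registers, so only returns that reach past the resident frames pay for a fill.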
RSE spills and fills stall program execution. When the total number of stacked registers allocated by the active procedures on the call stack exceeds the number of available physical stacked registers, a stacked register overflow occurs, and program execution stalls until the RSE finishes spilling. Similarly, an RSE fill stalls execution until the caller's frame has been restored from the backing store. Therefore, maximizing the number of stacked registers used in each procedure may not be optimal: larger frames cause more frequent overflows and fills, which delay program execution. Programs whose RSE costs make up a high percentage of their overall execution cost are particularly affected by RSE fills.
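The trade-off can be made concrete with a small Python calculation. The function `rse_traffic` is hypothetical: it assumes a call chain of `depth` nested procedures that all use the same frame size, 96 physical stacked registers, and a lazy, frame-granular spill and fill policy. It counts registers moved rather than stall cycles, which depend on the implementation, and it ignores the memory accesses that a larger frame can save inside each procedure, which is the other side of the trade-off.

```python
def rse_traffic(frame_size: int, depth: int, capacity: int = 96) -> int:
    """Registers moved by spills plus fills for `depth` nested calls with equal frames.

    Simplified model: frames are spilled oldest-first only on overflow and filled
    only when a return finds the caller's frame in the backing store.
    """
    if not 0 < frame_size <= capacity:
        raise ValueError("frame_size must be in 1..capacity")
    max_resident = capacity // frame_size  # frames that fit in registers at once
    resident = 0   # frames held in physical stacked registers
    spilled = 0    # frames currently in the backing store
    traffic = 0    # registers stored plus registers loaded

    for _ in range(depth):             # descend the call chain
        resident += 1
        if resident > max_resident:    # overflow: spill the oldest frame
            resident -= 1
            spilled += 1
            traffic += frame_size

    for _ in range(depth):             # unwind the call chain
        resident -= 1
        if resident == 0 and spilled:  # caller's frame was spilled: fill it
            spilled -= 1
            resident = 1
            traffic += frame_size

    return traffic


for size in (8, 24, 48):
    print(size, rse_traffic(size, depth=10))  # 8 0, then 24 288, then 48 768
```

In this model, a 10-deep chain of 8-register frames fits entirely in the register file, while the same chain with 24- or 48-register frames moves 288 or 768 registers to and from the backing store, each transfer being a potential stall.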