Microprocessors use registers to hold the values of variables used in executing instructions. Instruction execution speed depends, at least in part, on how quickly data (e.g., variable values) stored in registers can be accessed. Microprocessors typically have a number of physical on-chip registers, which can be accessed much more rapidly than memory. It is therefore generally desirable to execute instructions using the physical on-chip registers, because their faster access decreases instruction execution times.
In certain processors, such as the Intel® Itanium® processor, the on-chip registers are divided into static registers and stacked registers. A register stack engine defines a register stack as a limited number of stacked registers (e.g., ninety-six in the case of the Itanium® processor), referred to as architectural registers, and maps these architectural stacked registers to physical registers. The physical registers allocated in the stack may be written and then overwritten by subsequent instructions. The register stack engine may store the values of stacked registers to memory, and load them back from memory, at function entries and exits.
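As a rough illustration of the mapping from architectural to physical stacked registers, the following Python sketch treats the physical stacked registers as a circular buffer and translates an architectural register into a physical slot relative to the current frame's base. It is a simplified, hypothetical model rather than the processor's actual hardware logic: `NUM_PHYSICAL_STACKED`, `to_physical`, and its parameters are illustrative names, the physical register count is an assumption, and only the fact that Itanium® stacked registers start at r32 is taken from the architecture.

```python
# Simplified model (assumption): physical stacked registers form a circular
# buffer, and the current frame's r32 maps to the slot at the frame's base.
NUM_PHYSICAL_STACKED = 96  # assumed size of the physical stacked-register file
FIRST_STACKED = 32         # architectural stacked registers are r32..r127


def to_physical(arch_reg: int, frame_base: int, frame_size: int) -> int:
    """Return the physical slot backing architectural register arch_reg."""
    offset = arch_reg - FIRST_STACKED
    if not 0 <= offset < frame_size:
        raise ValueError(f"r{arch_reg} is outside the current frame")
    # Wrap around the end of the physical register file.
    return (frame_base + offset) % NUM_PHYSICAL_STACKED
```

For example, in a 10-register frame based at physical slot 90, architectural register r40 wraps around to physical slot 2, so successive frames can reuse the same physical registers without being renamed.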
At function entry, a special instruction (e.g., “alloc”) allocates registers on the register stack for the incoming parameters, temporary or local values, and outgoing parameters needed for function calls. These incoming, local, and outgoing registers store the variables needed to execute the function and are the architectural registers referenced by machine instructions. The alloc instruction also stores the previous function state register in a result register. When the function exits, this saved previous function state is used to restore the original values in the stacked registers for further use. Restoring data to registers from memory increases bus traffic and slows instruction execution.
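The alloc bookkeeping described above can be modeled in a few lines. The sketch below is a simplified, hypothetical Python model: `FrameMarker`, `RegisterStackModel`, and their methods are illustrative names, and the `sof`/`sol` fields loosely follow Itanium® frame-marker terminology (size of frame, size of locals). It shows why alloc copies the previous function state into a result register: a nested call overwrites the single previous-function-state register, so the saved copy is what lets each function restore its caller's frame on exit. Rotating registers and the memory traffic itself are ignored.

```python
from dataclasses import dataclass

MAX_FRAME = 96  # at most 96 stacked registers per frame, as in the text


@dataclass(frozen=True)
class FrameMarker:
    sof: int = 0  # size of frame: inputs + locals + outputs
    sol: int = 0  # size of locals: inputs + locals


class RegisterStackModel:
    """Hypothetical model of alloc / call / return frame bookkeeping."""

    def __init__(self) -> None:
        self.cfm = FrameMarker()  # current frame marker
        self.pfs = FrameMarker()  # previous function state (a single register)

    def call(self) -> None:
        # Remember the caller's frame in pfs; the callee's frame initially
        # contains only the caller's outgoing-parameter registers.
        self.pfs = self.cfm
        self.cfm = FrameMarker(sof=self.cfm.sof - self.cfm.sol, sol=0)

    def alloc(self, ins: int, locs: int, outs: int) -> FrameMarker:
        if ins + locs + outs > MAX_FRAME:
            raise ValueError("frame exceeds the stacked-register limit")
        saved = self.pfs  # the result register receives the previous state
        self.cfm = FrameMarker(sof=ins + locs + outs, sol=ins + locs)
        return saved

    def ret(self, saved: FrameMarker) -> None:
        # Put the saved state back into pfs, then restore the caller's frame.
        self.pfs = saved
        self.cfm = saved
```

For instance, if a caller with 4 locals and 2 outputs calls a function that allocates 2 inputs, 3 locals, and 1 output, that function's `alloc` returns the caller's marker (`sof=6, sol=4`). Any further nested call overwrites `pfs`, so the function must pass its saved marker to `ret` to return to the caller's frame.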
Processors such as the Intel® Itanium® processor have a finite number of stacked registers; the Itanium® processor may allocate up to 96 stacked registers for immediate access at function entry. However, this quantity of registers may be insufficient for executing complex applications with many instructions. The register stack engine must then save the contents of stacked registers to memory and later restore them from memory. Because memory access is time consuming, these saves and restores slow instruction execution.
In operation, each function executes the alloc instruction to allocate its registers. The register stack engine first allocates stacked registers and, once the stacked registers have been exhausted, uses memory to store the stacked registers of previous function frames. In practice, many applications are complex enough that the stacked registers are frequently exhausted, resulting in many memory store and restore operations. Instruction execution is thus slowed by the register stack engine's accesses to memory.
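To make this cost concrete, the following hypothetical Python simulation counts register stores (spills) and loads (fills) as a chain of nested calls outgrows the physical stacked registers. It assumes 96 physical stacked registers, spills the oldest frames' registers first, and refills a caller's registers only when control returns to it. These are simplifying assumptions rather than a description of the actual register stack engine, and `RseSim` and its members are illustrative names.

```python
PHYSICAL_STACKED = 96  # assumed number of physical stacked registers


class RseSim:
    """Hypothetical spill/fill counter for a simplified register stack."""

    def __init__(self, capacity: int = PHYSICAL_STACKED) -> None:
        self.capacity = capacity
        self.frames: list[int] = []    # frame sizes, oldest call first
        self.resident: list[int] = []  # registers of each frame held on-chip
        self.stores = 0  # registers spilled to memory
        self.loads = 0   # registers filled back from memory

    def in_use(self) -> int:
        return sum(self.resident)

    def call(self, frame_size: int) -> None:
        if frame_size > self.capacity:
            raise ValueError("frame larger than the physical register stack")
        self.frames.append(frame_size)
        self.resident.append(frame_size)
        # Spill the oldest on-chip registers until the new frame fits.
        i = 0
        while self.in_use() > self.capacity:
            spill = min(self.resident[i], self.in_use() - self.capacity)
            self.resident[i] -= spill
            self.stores += spill
            i += 1

    def ret(self) -> None:
        self.frames.pop()
        self.resident.pop()
        if self.frames:
            # Fill any of the caller's registers that were spilled earlier.
            missing = self.frames[-1] - self.resident[-1]
            self.resident[-1] += missing
            self.loads += missing
```

With five nested frames of 30 registers each, the first three frames fit (90 registers); the fourth and fifth calls force 54 registers to be stored, and returning back up the chain loads the same 54. Five frames of 16 registers (80 in total) cause no memory traffic at all.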