A computer system generally consists of several basic components, including one or more microprocessors (processors), volatile and non-volatile memory, data transfer buses, and interface devices. Processors are generally classified as either RISC (reduced instruction set computer) or CISC (complex instruction set computer), and may also be categorized according to internal architecture, for example, as scalar, superscalar, or pipelined.
A processor includes many different internal components, such as bus interface units, instruction fetch and decode units, arithmetic logic units, floating point execution units, instruction and data caches, and register files. Register files typically include registers, or fixed-size memory storage locations, that are accessed through a number of ports. These registers may contain integer or floating point numbers, and may range in size from the processor's word size (e.g., 32 bits, 64 bits, etc.) to double-word size (e.g., 64 bits, 128 bits, etc.), quadword size (e.g., 128 bits, 256 bits, etc.), or floating point size (e.g., 32 bits, 64 bits, etc.).
Generally, the processor may execute an operating system, or task scheduler, as well as one or more application programs. The operating system, task scheduler, and/or application programs are usually written in a high-level language, such as C or C++, and reduced to processor-executable code through a compilation and linking process. During this process, program variables may be assigned to specific locations in memory, or to relative locations in a memory map that may be resolved dynamically during program execution. However, program execution speed suffers, sometimes considerably, when program variables are accessed from memory over a memory bus.
For example, in a typical read/modify/write program instruction sequence, a variable assigned to a specific location in memory is first transferred from memory, over the memory bus, to the processor. The variable is modified and then transferred from the processor, over the memory bus, back to memory. Each of these memory bus transfers requires several bus cycles, which together typically amount to many processor clock cycles. Thus, memory transfers generally reduce program execution speed by introducing many processor wait states, or idle cycles, during which the processor waits for the memory transfer to complete.
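As a rough illustration of the sequence described above, the following C sketch performs one read/modify/write on a memory-resident variable and estimates the processor clock cycles consumed by its two bus transfers. The function names, the `volatile` qualifier (used here only to force an actual load and store), and the cycle counts in the usage note are assumptions made for illustration; they do not describe any particular processor or bus.

```c
#include <stdint.h>

/* Hypothetical helper: one read/modify/write on a variable held in memory.
 * `volatile` is an illustrative device that keeps the compiler from holding
 * the value in a register, so both a load and a store are emitted. */
uint32_t read_modify_write(volatile uint32_t *addr, uint32_t delta)
{
    uint32_t value = *addr;   /* read: memory -> processor */
    value += delta;           /* modify: performed inside the processor */
    *addr = value;            /* write: processor -> memory */
    return value;
}

/* Hypothetical cost model: processor clock cycles consumed by the bus
 * transfers of a sequence, assuming every transfer takes the same number
 * of bus cycles and every bus cycle spans the same number of processor
 * clock cycles. */
uint32_t transfer_clock_cycles(uint32_t transfers,
                               uint32_t bus_cycles_per_transfer,
                               uint32_t clocks_per_bus_cycle)
{
    return transfers * bus_cycles_per_transfer * clocks_per_bus_cycle;
}
```

With assumed values of 4 bus cycles per transfer and 5 processor clock cycles per bus cycle, `transfer_clock_cycles(2, 4, 5)` yields 40 processor clock cycles for the read and the write together, during which the processor may simply be waiting.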
A secondary cache, located between the processor and the memory bus, may improve program execution speed by temporarily storing the contents of certain memory locations (and, consequently, certain variables) in a limited-size, local memory. Similarly, a smaller, on-chip processor cache may improve program execution speed further. However, variables stored in registers within the processor's register file may be accessed more quickly than values in memory, secondary caches, or even on-chip caches. Consequently, the compilation process may assign certain program variables to registers within a register file, rather than to locations in memory, in order to improve program execution speed. The selection, or allocation, of candidate program variables depends upon several constraints and is performed on very small sections of the entire program instruction sequence. As a result, only a limited degree of program optimization is realized.
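The difference between a memory-resident variable and a register-allocated one can be sketched in C as follows. Both hypothetical functions compute the same sum. The first keeps its running total in a `volatile` memory location, which forces a load and a store on every iteration. The second keeps the total in a local variable that a compiler is free, but not required, to place in a register. The names and the use of `volatile` are illustrative assumptions, and the actual allocation depends on the compiler and target processor.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory-resident running total. `volatile` forces every
 * access to go to memory instead of being kept in a register. */
static volatile uint32_t total_in_memory;

/* Every iteration reads the total from memory, adds to it, and writes
 * it back: one read/modify/write per element. */
uint32_t sum_via_memory(const uint32_t *data, size_t n)
{
    total_in_memory = 0u;
    for (size_t i = 0; i < n; ++i) {
        total_in_memory = total_in_memory + data[i];
    }
    return total_in_memory;
}

/* The running total is a local variable, which is a candidate for
 * register allocation; the memory-resident copy is written only once. */
uint32_t sum_via_register(const uint32_t *data, size_t n)
{
    uint32_t total = 0u;
    for (size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    total_in_memory = total;  /* single store after the loop */
    return total;
}
```

The two functions return identical results for the same input; only the number of accesses to the memory-resident total differs, which is the cost that register allocation aims to reduce.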