1. Field of Invention
This invention relates generally to microprocessors and specifically to register stack spill and fill operations.
2. Description of Related Art
FIG. 1 shows a typical computer system 100 including a central processing unit (CPU) 102 coupled to a primary memory 104 by a bus 112. CPU 102 is shown to include execution units 105, a register file 106, and a memory controller 110. Execution units 105, which include well-known components such as arithmetic logic units (ALU), process data during execution of a computer program residing in primary memory 104. Memory controller 110 is well-known and controls access to primary memory 104 via bus 112. Primary memory 104 is typically a volatile memory such as DRAM.
Register file 106 includes a plurality of architectural registers that have been designated for holding data associated with the execution of the program's instructions. Specifically, when data residing in primary memory 104 is needed for processing in execution units 105, a load instruction is issued and causes the data to be loaded from primary memory 104 into register file 106. When loaded into register file 106, the requested data is available to execution units 105 for processing. Data processed by execution units 105 may be updated and held in register file 106 for subsequent use.
The number of architectural registers in register file 106 is limited in order to minimize cost and CPU size. As a result, the storage capacity of register file 106 may be exceeded during program execution. When this condition occurs, and it is desired to retain the register data for later use, the register data held in register file 106 is saved to primary memory 104 during a well-known register spill operation, thereby freeing register file resources for new data. When the data spilled from register file 106 is later needed by execution units 105, the data is restored from primary memory 104 to register file 106 during a well-known register fill operation.
Each spill operation that stores register data to primary memory 104 requires access to primary memory 104, and therefore incurs delays associated with arbitrating access to bus 112 and with writing data to primary memory 104. Similarly, each fill operation that retrieves previously spilled data from primary memory 104 into register file 106 incurs delays associated with arbitrating access to bus 112 and with reading data from primary memory 104. The primary memory latencies associated with register spill and fill operations undesirably degrade system performance.
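The cost described above can be illustrated with a minimal sketch. It is a toy model only and is not taken from the figures: the names PrimaryMemory, spill, and fill, the register name "r1", and the address 0x100 are all hypothetical. The access counter stands in for the bus arbitration and memory latency incurred on every access.

```python
class PrimaryMemory:
    """Toy stand-in for primary memory; counts accesses as a proxy for latency."""

    def __init__(self):
        self.cells = {}
        self.accesses = 0  # each access implies bus arbitration plus memory latency

    def write(self, addr, value):
        self.accesses += 1
        self.cells[addr] = value

    def read(self, addr):
        self.accesses += 1
        return self.cells[addr]


def spill(register_file, reg, memory, addr):
    """Save a register's data to primary memory, freeing the register."""
    memory.write(addr, register_file[reg])
    register_file[reg] = None


def fill(register_file, reg, memory, addr):
    """Restore previously spilled data from primary memory into the register."""
    register_file[reg] = memory.read(addr)


# Hypothetical usage: one spill followed by one fill of the same register.
memory = PrimaryMemory()
registers = {"r1": 42}
spill(registers, "r1", memory, addr=0x100)
registers["r1"] = 7  # the freed register is reused for new data
fill(registers, "r1", memory, addr=0x100)
```

In this sketch, each spill and fill pair costs two primary memory accesses, which is the overhead that the remainder of this description seeks to avoid.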
Modern computer systems typically include a cache memory implemented between the CPU and primary memory in order to increase performance. FIG. 2 shows CPU 102 including a cache memory 108 coupled to register file 106 and memory controller 110. Cache memory 108 is a small, fast memory device (such as, for example, an SRAM device) that stores data most recently used by CPU 102 during execution of the computer program. If data requested by an instruction resides in cache memory 108 (a cache hit), the data is provided to register file 106 from cache memory 108 rather than from the much slower primary memory 104. Conversely, if the requested data is not in cache memory 108 (a cache miss), the data is loaded into register file 106 and into cache memory 108 from primary memory 104.
In order to minimize primary memory latencies, data stored in a line of cache memory 108 is usually not written back to primary memory 104 until the cache line is selected for replacement with new data. If data in the cache line selected for replacement has been modified (i.e., dirty data), the data is written back to primary memory 104 in a well-known writeback operation. Otherwise, if the data is unmodified (i.e., clean data), the cache line is replaced without writeback to primary memory 104.
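The writeback behavior may be sketched as follows. This is an illustrative model under stated assumptions: the class name WriteBackCache is hypothetical, primary memory is represented by a plain dictionary, and a least-recently-used replacement policy is assumed because the description above does not name a particular policy.

```python
from collections import OrderedDict


class WriteBackCache:
    """Toy write-back cache; LRU replacement is an assumption of this sketch."""

    def __init__(self, num_lines, memory):
        self.num_lines = num_lines
        self.memory = memory        # dict: address -> value (stands in for primary memory)
        self.lines = OrderedDict()  # address -> (value, dirty), ordered oldest use first
        self.writebacks = 0

    def _evict_if_full(self):
        if len(self.lines) >= self.num_lines:
            addr, (value, dirty) = self.lines.popitem(last=False)  # least recently used
            if dirty:
                # Modified data must be written back before the line is replaced.
                self.memory[addr] = value
                self.writebacks += 1
            # Clean data is simply dropped; primary memory already holds it.

    def read(self, addr):
        if addr in self.lines:            # cache hit
            self.lines.move_to_end(addr)
            return self.lines[addr][0]
        self._evict_if_full()             # cache miss: make room, then load
        value = self.memory[addr]
        self.lines[addr] = (value, False)
        return value

    def write(self, addr, value):
        if addr in self.lines:
            self.lines.move_to_end(addr)
        else:
            self._evict_if_full()
        self.lines[addr] = (value, True)  # mark dirty; primary memory is not updated yet


# Hypothetical usage with a two-line cache.
memory = {0: "a", 1: "b", 2: "c"}
cache = WriteBackCache(2, memory)
cache.write(0, "A")  # line for address 0 becomes dirty
cache.read(1)        # clean line for address 1
cache.read(2)        # replaces the dirty line for address 0, forcing a writeback
cache.read(0)        # replaces the clean line for address 1, with no writeback
```

Only the replacement of the dirty line reaches primary memory in this example; the clean line is replaced without a writeback.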
Data spilled from register file 106 is typically routed to primary memory 104 through cache memory 108. If the spilled data has not yet been written back to primary memory 104, but rather still resides in cache memory 108 (a cache hit), a subsequent fill operation may restore the spilled data from cache memory 108 to register file 106 without accessing primary memory 104. However, because data spilled from register file 106 is randomly mapped into cache memory 108 and is subject to the same cache replacement strategies as other data residing in cache memory 108, spilled register data residing in cache memory 108 may be selected for replacement and written back to primary memory 104 at any time. When the spilled data no longer resides in cache memory 108, a cache miss occurs, and the spilled data must be retrieved from primary memory 104, which undesirably incurs primary memory latencies.
A method and apparatus are disclosed that reduce primary memory latencies for register spill and fill operations. In accordance with the present invention, a central processing unit includes a primary cache memory and a stack cache memory coupled to a register file having a plurality of architectural registers. The primary cache is a conventional cache memory that stores data most recently used by the CPU so that register load operations may be serviced by the primary cache rather than by the primary memory. The stack cache includes a plurality of cache lines, each of which implements a last-in, first-out (LIFO) queue for stacking data spilled from the register file. In one embodiment, each architectural register is mapped to a unique stack (e.g., cache line) of the stack cache. In other embodiments, each architectural register may be mapped to multiple unique stacks of the stack cache.
During a register spill operation, data is spilled from an architectural register and stored on top of its dedicated stack implemented in the stack cache. In one embodiment, the top of each stack is indicated using a top-of-stack pointer. The register data stored in the stack cache is maintained in the stack cache. Specifically, the stack cache operates independently of the primary cache, and thus register data stored in the stack cache is not written to the primary memory during writeback operations associated with the primary cache.
During a register fill operation, register data previously spilled from the register file is popped from the top of the stack and restored into its corresponding architectural register. In this manner, data spilled from the register file may be stacked in the stack cache and later restored to the register file without incurring primary memory latencies.
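The spill and fill operations on the stack cache may be sketched as follows, for the embodiment in which each architectural register is mapped to one stack. This is a simplified software model and not the disclosed hardware: the class name StackCache, the depth parameter, and the error raised on a full or empty stack are assumptions made for illustration, since the behavior in those cases is not described above.

```python
class StackCache:
    """Toy model: one fixed-depth LIFO stack (cache line) per architectural register."""

    def __init__(self, register_names, depth):
        self.depth = depth  # entries per cache line (hypothetical parameter)
        self.lines = {name: [None] * depth for name in register_names}
        # Top-of-stack pointer per register: index of the next free entry.
        self.top = {name: 0 for name in register_names}

    def spill(self, register_file, reg):
        """Push the register's data on top of its dedicated stack."""
        tos = self.top[reg]
        if tos == self.depth:
            raise OverflowError("stack full (handling is an assumption of this sketch)")
        self.lines[reg][tos] = register_file[reg]
        self.top[reg] = tos + 1

    def fill(self, register_file, reg):
        """Pop the most recently spilled data back into the register."""
        tos = self.top[reg]
        if tos == 0:
            raise IndexError("no spilled data for this register")
        self.top[reg] = tos - 1
        register_file[reg] = self.lines[reg][tos - 1]


# Hypothetical usage: two spills of the same register, then two fills.
registers = {"r1": 10, "r2": 20}
stack_cache = StackCache(registers, depth=4)

stack_cache.spill(registers, "r1")  # saves 10
registers["r1"] = 11
stack_cache.spill(registers, "r1")  # saves 11
registers["r1"] = 12

restored = []
stack_cache.fill(registers, "r1")
restored.append(registers["r1"])
stack_cache.fill(registers, "r1")
restored.append(registers["r1"])
```

The values return in last-in, first-out order, and neither operation touches a model of primary memory, which reflects the independence of the stack cache from primary cache writeback.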