1. Field of the Invention
This invention is related to the field of microprocessors and, more particularly, to caching structures within microprocessors in which stack data is stored within a stack structure.
2. Description of the Relevant Art
Superscalar microprocessors achieve high performance by simultaneously executing multiple instructions during a clock cycle and by specifying the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time during which the pipeline stages of a microprocessor perform their intended functions. The resulting values are moved to the next pipeline stage in response to a clock signal defining the clock cycle.
Since superscalar microprocessors execute multiple instructions per clock cycle and the clock cycle is short, a high bandwidth memory system (i.e. a memory system that can provide a large number of bytes in a short period of time) is required to provide instructions and data to the superscalar microprocessor. Without a high bandwidth memory system, the microprocessor would spend a large number of clock cycles waiting for instructions or data to be provided, and would then execute the received instructions and/or the instructions dependent upon the received data in a relatively small number of clock cycles. Overall performance would be degraded by the large number of idle clock cycles. However, superscalar microprocessors are ordinarily configured into computer systems with a large main memory composed of dynamic random access memory (DRAM) cells. DRAM cells are characterized by access times which are significantly longer than the clock cycle of modern superscalar microprocessors. Also, DRAM cells typically provide a relatively narrow output bus to convey the stored bytes to the superscalar microprocessor. Therefore, a DRAM main memory delivers a relatively small number of bytes in a relatively long period of time, and does not form a high bandwidth memory system.
Because superscalar microprocessors are typically not configured into a computer system with a memory system having sufficient bandwidth to continuously provide instructions and data, superscalar microprocessors are often configured with caches. Caches are storage devices containing multiple blocks of storage locations, configured on the same silicon substrate as the microprocessor or coupled nearby. The blocks of storage locations are used to hold previously fetched instruction or data bytes. A block of storage stores a "line" of bytes (i.e. a number of contiguous bytes). The line is transferred to and from main memory as a unit. Bytes within a line can be transferred from the cache to the destination (a register or an instruction processing pipeline) quickly; commonly one or two clock cycles are required as opposed to a large number of clock cycles to transfer bytes from a DRAM main memory.
When a cache is searched for bytes residing at an address, a number of bits from the address are used as an "index" into the cache. The index selects a row of the cache, each row comprising a block or blocks of storage, and therefore the number of address bits required for the index is determined by the number of rows configured into the cache. The act of selecting a row via an index is referred to as "indexing". The addresses associated with the bytes stored in the cache are stored along with the bytes; these stored addresses are referred to as "tags" or "tag addresses". The tags of the blocks in the selected row are examined to determine if any of them match the requested address. If a match is found, the access is said to be a "hit", and the cache provides the associated bytes. If a match is not found, the access is said to be a "miss". When a miss is detected, the bytes are transferred from the memory system into the cache. It is noted that a cache may be configured in a set-associative or direct-mapped configuration.
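By way of illustration only, the indexing and tag comparison described above may be sketched in software as follows. The line size, number of rows, number of blocks per row, and the oldest-first replacement choice are assumptions made for the example and are not taken from the text; the names `split_address` and `Cache` are likewise hypothetical.

```python
LINE_BYTES = 32   # assumed line size in bytes
NUM_ROWS = 64     # assumed number of rows
NUM_WAYS = 2      # assumed blocks per row (1 would be direct-mapped)

OFFSET_BITS = LINE_BYTES.bit_length() - 1  # bits selecting a byte in a line
INDEX_BITS = NUM_ROWS.bit_length() - 1     # bits selecting a row


def split_address(addr):
    """Split an address into (tag, index, offset) fields."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_ROWS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class Cache:
    def __init__(self, memory):
        self.memory = memory  # backing "main memory" (bytes-like)
        # Each row holds up to NUM_WAYS (tag, line) pairs, oldest first.
        self.rows = [[] for _ in range(NUM_ROWS)]

    def read_byte(self, addr):
        """Return (byte, hit) for the byte residing at addr."""
        tag, index, offset = split_address(addr)
        row = self.rows[index]            # indexing: select the row
        for stored_tag, line in row:      # compare the stored tags
            if stored_tag == tag:
                return line[offset], True     # hit
        # Miss: transfer the whole line from memory into the cache.
        base = addr - offset
        line = bytes(self.memory[base:base + LINE_BYTES])
        if len(row) == NUM_WAYS:
            row.pop(0)                    # evict the oldest block (assumed policy)
        row.append((tag, line))
        return line[offset], False
```

In this sketch, a first access to an address misses and brings the containing line into the cache; a subsequent access to any byte of the same line hits.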
A high bandwidth memory system is particularly important to a microprocessor implementing the x86 microprocessor architecture. The x86 architecture implements a relatively small register set. Consequently, many data values which a program is manipulating are stored within a stack. In particular, values passed between a calling routine and the subroutine called are often passed through the stack. As will be appreciated by those of ordinary skill in the art, a stack is a data storage structure implementing a last-in, first-out (LIFO) storage mechanism. Data is "pushed" onto a stack (i.e. the data is stored into the stack data structure) and "popped" from the stack (i.e. the data is removed from the stack data structure). When the stack is popped, the data removed is the data that was most recently pushed. The ESP register of the x86 architecture stores the address of the "top" of a stack within main memory. The top of the stack is the storage location which is storing the data that would be provided if the stack were popped.
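For illustration, the push and pop operations and the role of the ESP register may be modeled as in the following minimal sketch. It assumes 32-bit operands and the conventional x86 behavior in which the stack grows toward lower addresses (a push decrements ESP before storing, and a pop increments ESP after loading); the class name `Stack` and the memory size are hypothetical.

```python
class Stack:
    OPERAND_BYTES = 4  # assumed 32-bit operands

    def __init__(self, size=64):
        self.memory = bytearray(size)  # stands in for main memory
        self.esp = size                # empty stack: ESP just past the region

    def push(self, value):
        """Store value at the new top of the stack."""
        if self.esp < self.OPERAND_BYTES:
            raise OverflowError("stack region exhausted")
        self.esp -= self.OPERAND_BYTES  # ESP moves toward lower addresses
        data = (value & 0xFFFFFFFF).to_bytes(self.OPERAND_BYTES, "little")
        self.memory[self.esp:self.esp + self.OPERAND_BYTES] = data

    def pop(self):
        """Remove and return the most recently pushed value."""
        if self.esp >= len(self.memory):
            raise IndexError("pop from empty stack")
        data = self.memory[self.esp:self.esp + self.OPERAND_BYTES]
        self.esp += self.OPERAND_BYTES  # ESP moves back toward higher addresses
        return int.from_bytes(data, "little")
```

In this model, ESP always holds the address of the location that would be returned by the next pop, so values come back in the reverse of the order in which they were pushed.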
Since data on the stack is manipulated often, it would be advantageous to provide relatively quick access to data on the stack. One use of the stack, for example, is in passing input and output parameters between subroutines of a program and the routines calling those subroutines. The parameters may be pushed onto the stack, accessed by the subroutine, and popped from the stack upon return to the calling routine. In particular, accessing stack data without having to perform an address generation may improve microprocessor performance by allowing instructions which access the stack to fetch their operands earlier.