1. Field of the Invention
This invention is related to the field of superscalar microprocessors and, more particularly, to dependency checking structures for detecting dependencies between accesses to a pair of caches employed within a superscalar microprocessor.
2. Description of the Relevant Art
Superscalar microprocessors achieve high performance by simultaneously executing multiple instructions in a clock cycle and by specifying the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time during which the pipeline stages of a microprocessor perform their intended functions. At the end of a clock cycle, the resulting values are moved to the next pipeline stage.
Since superscalar microprocessors execute multiple instructions per clock cycle and the clock cycle is short, a high bandwidth memory system is required to provide instructions and data to the superscalar microprocessor (i.e. a memory system that can provide a large number of bytes in a short period of time). Without a high bandwidth memory system, the microprocessor would spend a large number of clock cycles waiting for instructions or data to be provided, then would execute the received instructions and/or the instructions dependent upon the received data in a relatively small number of clock cycles. Overall performance would be degraded by the large number of idle clock cycles. However, superscalar microprocessors are ordinarily configured into computer systems with a large main memory composed of dynamic random access memory (DRAM) cells. DRAM cells are characterized by access times which are significantly longer than the clock cycle of modern superscalar microprocessors. Also, DRAM cells typically provide a relatively narrow output bus to convey the stored bytes to the superscalar microprocessor. Therefore, a main memory composed of DRAM cells delivers a relatively small number of bytes in a relatively long period of time, and does not form a high bandwidth memory system.
Because superscalar microprocessors are typically not configured into a computer system with a memory system having sufficient bandwidth to continuously provide instructions and data, superscalar microprocessors are often configured with caches. Caches are storage devices containing multiple blocks of storage locations, configured on the same silicon substrate as the microprocessor or coupled nearby. The blocks of storage locations are used to hold previously fetched instruction or data bytes. The bytes can be transferred from the cache to the destination (a register or an instruction processing pipeline) quickly; commonly one or two clock cycles are required as opposed to a large number of clock cycles to transfer bytes from a DRAM main memory.
Caches may be organized into an "associative" structure (also referred to as "set associative"). In an associative structure, the blocks of storage locations are accessed as a two-dimensional array having rows and columns. When a cache is searched for bytes residing at an address, a number of bits from the address are used as an "index" into the cache. The index selects a particular row within the two-dimensional array, and therefore the number of address bits required for the index is determined by the number of rows configured into the cache. The act of selecting a row via an index is referred to as "indexing". The addresses associated with bytes stored in the multiple blocks of a row are examined to determine if any of the addresses stored in the row match the requested address. If a match is found, the access is said to be a "hit", and the cache provides the associated bytes. If a match is not found, the access is said to be a "miss". When a miss is detected, the bytes are transferred from the memory system into the cache. The addresses associated with bytes stored in the cache are also stored. These stored addresses are referred to as "tags" or "tag addresses".
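The division of a requested address into an index and a tag, as described above, may be illustrated by the following minimal sketch. The cache geometry (32-byte blocks, 128 rows), the 32-bit address width, and the names `split_address`, `BLOCK_BYTES`, and `NUM_ROWS` are assumptions chosen for illustration only. The offset field, which selects a byte within a block, is likewise an assumption that the description above does not discuss.

```python
# Hypothetical cache geometry, chosen only for illustration.
BLOCK_BYTES = 32   # bytes per block (must be a power of two here)
NUM_ROWS = 128     # rows in the two-dimensional array (power of two)

# Number of address bits needed for each field.
OFFSET_BITS = BLOCK_BYTES.bit_length() - 1   # 5 bits select a byte in a block
INDEX_BITS = NUM_ROWS.bit_length() - 1       # 7 bits select a row


def split_address(address):
    """Split an address into (tag, index, offset).

    The index selects a row, so its width depends on the number of rows.
    The remaining upper bits form the tag that is stored with the block
    and later compared against the requested address.
    """
    offset = address & (BLOCK_BYTES - 1)
    index = (address >> OFFSET_BITS) & (NUM_ROWS - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


tag, index, offset = split_address(0x12345678)
print(hex(tag), index, offset)
```

Doubling the number of rows in this sketch adds one bit to the index and removes one bit from the tag, which reflects the statement that the number of index bits is determined by the number of rows.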
Several blocks of memory are configured into a row of an associative cache. Each block of memory is referred to as a "way"; multiple ways comprise a row. The way is selected by providing a way value to the cache. The way value is determined by examining the tags for a row and finding a match between one of the tags and the requested address. A cache designed with one way per row is referred to as a "direct-mapped cache". In a direct-mapped cache, the tag must be examined to determine if an access is a hit, but the tag examination is not required to select which bytes are transferred to the outputs of the cache. Since only an index is required to select bytes from a direct-mapped cache, the direct-mapped cache is a "linear array" requiring only a single value to select a storage location within it.
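The relationship between rows, ways, tags, hits, and misses may be sketched as follows. This is a simplified software model and not a description of any particular hardware. The class name `SetAssociativeCache`, its parameters, and the round-robin replacement policy are assumptions made for illustration; the description above does not specify how a way is chosen for replacement on a miss.

```python
class SetAssociativeCache:
    """Toy model that tracks tags only (no data bytes are stored)."""

    def __init__(self, num_rows, num_ways, block_bytes):
        self.num_rows = num_rows
        self.num_ways = num_ways
        self.block_bytes = block_bytes
        # tags[row][way] holds the stored tag, or None if the way is empty.
        self.tags = [[None] * num_ways for _ in range(num_rows)]
        # Assumed replacement policy: round-robin within each row.
        self.next_victim = [0] * num_rows

    def _split(self, address):
        block_number = address // self.block_bytes
        index = block_number % self.num_rows
        tag = block_number // self.num_rows
        return tag, index

    def access(self, address):
        """Return (hit, way) for an access to the given address."""
        tag, index = self._split(address)
        row = self.tags[index]
        # Examine every tag in the indexed row; a match gives the way value.
        for way, stored_tag in enumerate(row):
            if stored_tag == tag:
                return True, way
        # Miss: the block is brought into the row and its tag is stored.
        way = self.next_victim[index]
        row[way] = tag
        self.next_victim[index] = (way + 1) % self.num_ways
        return False, way


two_way = SetAssociativeCache(num_rows=4, num_ways=2, block_bytes=16)
print(two_way.access(0x00))   # first access to this block
print(two_way.access(0x40))   # same row, different tag
print(two_way.access(0x00))   # both blocks fit in the two ways of the row
```

Constructing the model with `num_ways=1` yields a direct-mapped cache. In that case the way value is always zero, so the index alone selects the block, and two addresses that share an index evict one another.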
A high bandwidth memory system is particularly important to a microprocessor implementing the x86 microprocessor architecture. The x86 architecture implements a relatively small register set including several registers which are not general purpose. Registers which are not general purpose may not be used to store an arbitrary value because the value they store has a specific interpretation for certain instructions. Consequently, many data values which a program is manipulating are stored within a stack. As will be appreciated by those of skill in the art, a stack is a data storage structure implementing a last-in, first-out storage mechanism. Data is "pushed" onto a stack (i.e. the data is stored into the stack data structure) and "popped" from the stack (i.e. the data is removed from the stack data structure). When the stack is popped, the data removed is the data that was most recently pushed. The ESP register of the x86 architecture stores the address of the "top" of a stack within main memory. The top of the stack is the storage location which is storing the data that would be provided if the stack were popped.
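The last-in, first-out behavior and the role of the ESP register may be illustrated by the following minimal sketch. The class name `Stack`, the dictionary used to stand in for main memory, and the starting address are assumptions made for illustration. The sketch models 32-bit pushes and pops and relies on the fact that the x86 stack grows toward lower addresses, so a push decrements ESP and a pop increments it.

```python
class Stack:
    """Toy model of a 32-bit x86-style stack held in main memory."""

    WORD_BYTES = 4

    def __init__(self, initial_esp):
        self.esp = initial_esp   # address of the current top of the stack
        self.memory = {}         # hypothetical stand-in for main memory

    def push(self, value):
        # The stack grows downward: move ESP first, then store the value.
        self.esp -= self.WORD_BYTES
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        # The value at the top is the one most recently pushed.
        value = self.memory.pop(self.esp)
        self.esp += self.WORD_BYTES
        return value


stack = Stack(initial_esp=0x1000)
stack.push(10)
stack.push(20)
print(hex(stack.esp))   # ESP has moved down by two words
print(stack.pop())      # the most recently pushed value is removed first
```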
Since data on the stack is manipulated often, it would be advantageous to provide relatively quick access to data on the stack. In particular, accessing stack data as early as possible in the instruction processing pipeline may improve microprocessor performance by allowing instructions which access the stack to fetch their operands early. As used herein, the term "instruction processing pipeline" refers to a pipeline which performs instruction processing. Instruction processing includes fetching, decoding, executing, and writing the results of each instruction. An instruction processing pipeline is formed by a number of pipeline stages in which portions of instruction processing are performed. Typically, memory operands (both stack and non-stack) are accessed from the execute stage of the instruction processing pipeline. As used herein, the term "operand" refers to a value which an instruction is intended to manipulate. Operands may be memory operands (which are stored in memory) or register operands (which are stored in registers).
Certain types of addressing employed by x86 instructions indicate that an access to stack data is occurring. However, other types of addressing employed by x86 instructions do not indicate a stack access. These types of addressing may still access data on the stack, since the stack is a block of memory in the x86 architecture and memory is accessible via any type of addressing. In particular, the various addressing modes may indicate accesses to the same address. Coherency of the data stored at the address must be maintained such that a write to the address is reflected in data later read from that address. A structure which allows access to stack data prior to the execute stage of the instruction processing pipeline while still maintaining coherency between various addressing modes of instructions is desired.
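The manner in which two different types of addressing may indicate accesses to the same address can be sketched as follows. This example is illustrative only and does not represent the structure that is desired above; the function names `effective_address` and `accesses_conflict`, the register values, and the fixed four-byte access size are assumptions. The sketch compares the addresses formed by a stack-relative access and by an access through another register and reports whether the two accesses overlap.

```python
ADDRESS_MASK = 0xFFFFFFFF  # 32-bit addresses are assumed


def effective_address(base, displacement=0):
    """Form a 32-bit address from a base register value and a displacement."""
    return (base + displacement) & ADDRESS_MASK


def accesses_conflict(addr_a, addr_b, size=4):
    """Return True if two size-byte accesses touch at least one common byte.

    Accesses that wrap around the top of the address space are ignored
    in this simplified check.
    """
    return addr_a < addr_b + size and addr_b < addr_a + size


# Hypothetical register contents: EBX happens to point into the stack.
esp = 0x0FF0
ebx = 0x0FF8

stack_write = effective_address(esp, 8)   # addressing that indicates a stack access
other_read = effective_address(ebx)       # addressing that does not indicate one

print(accesses_conflict(stack_write, other_read))
```

In this sketch the two accesses refer to the same bytes even though only the first is formed from ESP. A read performed after the write must therefore observe the written data, which is the coherency requirement stated above.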