1. Field of the Invention
The present invention relates to designs for processors in computer systems. More specifically, the present invention relates to a store queue architecture for a processor that supports speculative execution.
2. Related Art
Advances in semiconductor fabrication technology have given rise to dramatic increases in microprocessor clock speeds. This increase in microprocessor clock speeds has not been matched by a corresponding increase in memory access speeds. Hence, the disparity between microprocessor clock speeds and memory access speeds continues to grow, and is beginning to create significant performance problems. Execution profiles for fast microprocessor systems show that a large fraction of execution time is spent not within the microprocessor core, but within memory structures outside of the microprocessor core. This means that microprocessor systems spend a large amount of time waiting for memory references to complete instead of performing computational operations.
Efficient caching schemes can help reduce the number of memory accesses that are performed. However, when a memory reference, such as a load, generates a cache miss, the subsequent access to level-two (L2) cache or memory can require dozens or hundreds of clock cycles to complete, during which time the processor is typically idle, performing no useful work.
In contrast, cache misses during stores typically do not have as much of an impact on performance because the processor usually places the stores into a “store queue” and continues executing subsequent instructions. Conventional store queue designs for in-order processors typically maintain an array of stores in program order, and provide circuitry to match every incoming load against the array of stores.
Unfortunately, designers have encountered limitations while using conventional store queue designs because each store is typically buffered to an individual store queue entry. To allow accesses to the individual store queue entries, the store queues typically provide circuitry to produce the most recent value of every byte in the store queue. When multiple buffered stores are directed to the same memory location, this store queue circuitry must be able to determine which of those stores was buffered most recently.
In addition, because each store is buffered to an individual store queue entry, the memory system (i.e., the caches/memory and the supporting control circuitry) must provide sufficient bandwidth to eventually retire each store from the store queue.
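For illustration only, the conventional arrangement described above can be modeled in software. The following is a minimal sketch, not circuitry from any particular processor: the names StoreEntry, ProgramOrderStoreQueue, push, forward, and retire_oldest are hypothetical, and the model assumes byte-addressed stores held oldest-first. It shows why every incoming load must be compared against every buffered store, and why each buffered store consumes its own retirement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoreEntry:
    """One buffered store (hypothetical model, one entry per store)."""
    addr: int    # address of the first byte written
    data: bytes  # bytes written, lowest address first


class ProgramOrderStoreQueue:
    """Illustrative software model of a program-order store queue."""

    def __init__(self) -> None:
        self.entries: List[StoreEntry] = []  # oldest store first

    def push(self, addr: int, data: bytes) -> None:
        # Each store occupies its own entry, even if it overlaps an older one.
        self.entries.append(StoreEntry(addr, data))

    def forward(self, addr: int, size: int) -> List[Optional[int]]:
        """Return the most recent buffered value of each byte of a load.

        A byte is None when no buffered store covers it, in which case
        the load would have to obtain that byte from the cache or memory.
        """
        result: List[Optional[int]] = [None] * size
        # Scan oldest to youngest so that younger stores overwrite older ones.
        for entry in self.entries:
            for i, value in enumerate(entry.data):
                offset = entry.addr + i - addr
                if 0 <= offset < size:
                    result[offset] = value
        return result

    def retire_oldest(self) -> Optional[StoreEntry]:
        # Every buffered store is retired to the memory system individually.
        return self.entries.pop(0) if self.entries else None


sq = ProgramOrderStoreQueue()
sq.push(0x100, bytes([0x11, 0x22, 0x33, 0x44]))  # older 4-byte store
sq.push(0x102, bytes([0xAA]))                    # younger 1-byte store, overlapping
print(sq.forward(0x100, 4))  # [17, 34, 170, 68]: byte 2 comes from the younger store
print(sq.forward(0x103, 2))  # [68, None]: second byte is not in the queue
```

In this sketch, the two overlapping stores still occupy two entries and require two separate retirements, and the cost of forward grows with the number of buffered stores. Both effects correspond to the limitations noted above.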
Speculative execution is similarly limited by conventional program-order store queue designs. For example, typical program-order store queue designs do not support the buffering of loads and stores with unknown values or of loads with unknown addresses, even though loads and stores of this nature are commonplace during speculative execution. In addition, conventional program-order store queue designs do not support the out-of-order re-execution of deferred loads or stores.
Hence, what is needed is a store queue design that does not suffer from the above-described problems.