1. Field of the Invention
This disclosure relates to microprocessors, and more particularly to techniques for supporting concurrent stores and loads in a processor.
2. Description of the Related Art
Modern out-of-order processors are often configured to execute load and store instructions out-of-order, and also permit loads to access memory in a speculative manner. Speculatively-executed loads and stores are typically held in queues until the necessary criteria are met to make the loads and stores architecturally visible (i.e., visible to software). In a multi-processor environment, the ordering rules for memory accesses by the various processors are defined by the memory consistency model specified by a given instruction set architecture (ISA). The weakly-ordered model is one such memory consistency model.
Modern microprocessors are typically coupled to one or more levels of a cache hierarchy in order to reduce the latency of the microprocessor's requests for data in memory. A request may result from a read or a write operation during the execution of one or more software applications. Generally, a cache may store multiple cache lines, where a cache line holds several bytes of data in contiguous memory locations. A cache line may be treated as a unit for coherency purposes. In addition, a cache line may be a unit of allocation and deallocation in the cache. With a unit of allocation and deallocation of several bytes, memory accesses may be more efficient and have lower latency than they would with a unit of only one or a few bytes. As used herein, a “line” is a set of bytes stored in contiguous memory locations, which are treated as a unit for coherency purposes. As used herein, the terms “cache block”, “block”, “cache line”, and “line” are interchangeable.
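By way of illustration only, the mapping of a byte address onto a cache line can be sketched as follows. The 64-byte line size and the function names are assumptions chosen for the example and are not specified by this disclosure; any power-of-two line size would work the same way.

```python
# Illustrative sketch only: LINE_SIZE and all function names are assumptions.
LINE_SIZE = 64                           # bytes per cache line (assumed, power of two)
OFFSET_BITS = LINE_SIZE.bit_length() - 1  # number of low-order bits selecting a byte


def line_address(addr: int) -> int:
    """Return the line number containing the byte at addr."""
    return addr >> OFFSET_BITS


def line_offset(addr: int) -> int:
    """Return the position of the byte at addr within its line."""
    return addr & (LINE_SIZE - 1)


def same_line(a: int, b: int) -> bool:
    """Two addresses share a coherency unit when their line numbers match."""
    return line_address(a) == line_address(b)
```

Under these assumptions, two accesses whose addresses differ only in the low-order offset bits fall within the same line and are therefore handled as a single unit for allocation, deallocation, and coherency.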
A load operation typically takes precedence over a store operation if a conflict exists between the two operations. However, delaying store operations which conflict with load operations can degrade processor performance. A “load memory operation” or “load operation” may refer to a transfer of data from memory or cache to a processor, and a “store memory operation” or “store operation” may refer to a transfer of data from a processor to memory or cache. “Load operations” and “store operations” may be more succinctly referred to herein as “loads” and “stores”, respectively.
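The effect of load precedence can be illustrated with a minimal sketch. It assumes, purely for the example, that the conflict is contention for a single cache access port that can service one operation per cycle; the disclosure does not limit conflicts to this case, and the function name `arbitrate` is hypothetical.

```python
from collections import deque


def arbitrate(loads, stores, cycles):
    """Model a single shared port (an assumption) over a number of cycles.

    In each cycle a pending load always wins over a pending store.
    Returns the operations issued, in order, and the stores still waiting.
    """
    loads, stores = deque(loads), deque(stores)
    issued = []
    for _ in range(cycles):
        if loads:
            issued.append(("load", loads.popleft()))
        elif stores:
            issued.append(("store", stores.popleft()))
    return issued, list(stores)
```

In this sketch, a store issues only in cycles where no load is pending, so a steady stream of loads may delay stores indefinitely.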
A load/store unit often includes a queue for buffering stores that are waiting to be written to the memory system. This queue may be dedicated to stores or, alternatively, may buffer both stores and loads. With loads taking precedence over stores, a large number of stores may be waiting in the queue at any given time. To accommodate a large number of stores, the size (i.e., number of entries) of the queue may be increased. Each entry in the queue often includes storage for data and an address, along with various read ports and content-addressable memory (CAM) ports. Accordingly, increasing the size of the queue can be expensive with respect to hardware requirements, timing impact, and power utilization.
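A simplified software model of such a queue is sketched below for illustration. The class and method names are hypothetical, the per-entry fields are reduced to an address and data, and the CAM port is modeled as a comparison of a search address against every entry; actual hardware performs these comparisons in parallel, which is one reason each added entry carries a cost.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoreEntry:
    address: int  # target address of the buffered store
    data: int     # data waiting to be written to the memory system


class StoreQueue:
    """Fixed-capacity store queue, oldest entry first (hypothetical model)."""

    def __init__(self, num_entries: int):
        self.num_entries = num_entries
        self.entries: List[StoreEntry] = []

    def is_full(self) -> bool:
        return len(self.entries) >= self.num_entries

    def enqueue(self, address: int, data: int) -> bool:
        """Buffer a store; return False if no entry is free."""
        if self.is_full():
            return False
        self.entries.append(StoreEntry(address, data))
        return True

    def cam_search(self, address: int) -> List[int]:
        """Return the indices of all entries whose address matches."""
        return [i for i, e in enumerate(self.entries) if e.address == address]

    def drain_oldest(self) -> Optional[StoreEntry]:
        """Remove and return the oldest store, or None if the queue is empty."""
        return self.entries.pop(0) if self.entries else None
```

In this model, a full queue rejects further stores until an older store drains, which corresponds to the pressure to enlarge the queue described above.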