1. Field of the Invention
This invention relates generally to processor-based systems, and, more particularly, to an out-of-order load/store queue structure that may be implemented in processor-based systems.
2. Description of the Related Art
Processor-based systems utilize two basic memory access instructions or operations: a store that puts (or stores) information in a memory location such as a register and a load that reads information out of a memory location. High-performance out-of-order execution microprocessors can execute memory access instructions (loads and stores) out of program order. For example, program code may include a series of memory access instructions including loads (L1, L2, . . . ) and stores (S1, S2, . . . ) that are to be executed in the order: S1, L1, S2, L2, . . . . However, the out-of-order processor may select the instructions in a different order such as L1, L2, S1, S2, . . . . Some instruction set architectures (e.g., the x86 instruction set architecture) require strong ordering of memory operations. Generally, memory operations are strongly ordered if they appear to have occurred in the program order specified. When attempting to execute instructions out of order, the processor must respect true dependencies between instructions because executing a dependent load/store pair out of order can produce incorrect results. For example, if S1 stores data to the same physical address that L1 subsequently reads data from, the store S1 must be completed (or retired) before L1 is performed so that the correct data is stored at the physical address for L1 to read.
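By way of illustration only, the dependency between S1 and L1 described above may be sketched in a few lines of Python. The helper `run`, the address 0x100, and the stored value 42 are hypothetical choices made for this sketch; memory is modeled as a plain dictionary rather than any actual hardware structure.

```python
def run(order, memory):
    """Apply the operations in the given order; return what each load saw."""
    mem = dict(memory)          # private copy so each run starts fresh
    seen = {}
    for name, kind, addr, value in order:
        if kind == "store":
            mem[addr] = value   # store writes its data to the address
        else:
            seen[name] = mem[addr]  # load reads whatever is there now
    return seen

# S1 and L1 touch the same (hypothetical) physical address.
S1 = ("S1", "store", 0x100, 42)
L1 = ("L1", "load", 0x100, None)
initial = {0x100: 0}

in_order = run([S1, L1], initial)    # program order
reordered = run([L1, S1], initial)   # load hoisted above the store
```

In program order L1 observes the value written by S1, whereas in the reordered run L1 observes the stale initial value, which is the incorrect result the processor must prevent.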
Dependencies between instructions can also be violated when different instructions are performed by different processors and/or co-processors in systems that implement multiple processors and/or co-processors. For example, if a first processor performs a store to address A1 followed by a store to address A2 and a second processor performs a load from address A2 (which misses in the data cache of the second processor) followed by a load from address A1 (which hits in the data cache of the second processor), strong memory ordering rules may be violated. Strong memory ordering rules require, in the above example, that if the load from address A2 receives the store data from the store to address A2, then the load from address A1 must receive the store data from the store to address A1. However, if the load from address A1 is allowed to complete while the load from address A2 is being serviced, then the following scenario may occur: (1) the load from address A1 may receive data prior to the store to address A1; (2) the store to address A1 may complete; (3) the store to address A2 may complete; and (4) the load from address A2 may complete and receive the data provided by the store to address A2. This outcome would be incorrect because the load from address A1 occurred before the store to address A1 even though the load from address A2 observed the later store to address A2. In other words, the load from address A1 would receive stale data.
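The four-step scenario above may be replayed, purely as an illustrative sketch, against a shared memory modeled as a Python dictionary. The labels "old" and "new" and the function `violates_strong_ordering` are assumptions introduced for this sketch; caches and the miss/hit timing are not modeled.

```python
A1, A2 = "A1", "A2"
memory = {A1: "old", A2: "old"}   # shared memory before either store
observed = {}                     # values seen by the second processor

observed[A1] = memory[A1]   # (1) load from A1 completes early (cache hit)
memory[A1] = "new"          # (2) first processor's store to A1 completes
memory[A2] = "new"          # (3) first processor's store to A2 completes
observed[A2] = memory[A2]   # (4) load from A2 completes (after its miss)

def violates_strong_ordering(obs):
    """Seeing the store to A2 obliges the loader to also see the store to A1."""
    return obs[A2] == "new" and obs[A1] != "new"

violation = violates_strong_ordering(observed)
```

Here the second processor ends up with the new data for A2 but stale data for A1, so the check reports a violation of the strong ordering rule stated above.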
Store and load instructions typically operate on memory locations in one or more caches associated with the processor. Values from store instructions are not committed to the memory system (e.g., the caches) immediately after execution of the store instruction. Instead, the store instructions, including the memory address and store data, are buffered in a store queue for a selected time interval. Buffering allows the stores to be written in correct program order even though they may have been executed in a different order. At the end of the waiting time, the store retires and the buffered data is written to the memory system. Buffering stores until retirement and completion of the write operation can avoid dependencies that cause an earlier load to receive an incorrect value from the memory system because a later store was allowed to execute before the earlier load. Load instructions, including the memory address and loaded data, can also be buffered in a load queue until the load instruction has completed.
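As a minimal sketch of the buffering described above, and not a description of any particular hardware, a store queue may be modeled as an ordered table keyed by program order. The class name `StoreQueue`, its method names, and the sequence numbers are hypothetical; entries are allocated in program order, filled in whenever the store executes, and written to memory only from the oldest entry forward.

```python
from collections import OrderedDict

class StoreQueue:
    def __init__(self):
        self.entries = OrderedDict()   # program-order sequence -> (addr, data)

    def dispatch(self, seq):
        self.entries[seq] = None       # slot allocated in program order

    def execute(self, seq, addr, data):
        self.entries[seq] = (addr, data)   # may occur in any order

    def retire(self, memory):
        """Write completed stores to memory, oldest first; return their seqs."""
        written = []
        while self.entries:
            seq, entry = next(iter(self.entries.items()))
            if entry is None:
                break                  # oldest store not executed; younger wait
            addr, data = entry
            memory[addr] = data
            written.append(seq)
            del self.entries[seq]
        return written

memory = {}
sq = StoreQueue()
sq.dispatch(1)
sq.dispatch(2)
sq.execute(2, 0x10, "younger")     # younger store executes first
first_pass = sq.retire(memory)     # nothing written: store 1 is still pending
sq.execute(1, 0x10, "older")
second_pass = sq.retire(memory)    # both written, in program order
```

Although the younger store executed first, the writes reach memory in program order, so the location ends up holding the younger store's data as program order requires.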
Providing one queue for buffering stores and another queue for buffering loads may introduce a number of complications and inefficiencies. For example, store instructions are added to the store queue when they have been dispatched and then remain in the store queue until they complete (i.e., receive a valid address translation and data) and retire (i.e., write valid data back to the indicated address). However, processor-based systems typically implement a “lazy write” approach that allows the store instruction to be retired before the data is actually written into memory. The store instruction is therefore treated as done even though its data has not yet been written back to memory. Lazy writing provides the system with flexibility that can be used to improve performance in some cases. However, delaying writes for store instructions can cause the store queue to grow very large as it fills with stores, completed stores, and “retired” store entries that are waiting for their data to be written back into memory. Furthermore, other instructions in the out-of-order system may be stalled while the retired entries are waiting to have their data written back to memory. Similar problems can afflict the load queue.
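The occupancy problem described above may be illustrated, under simplifying assumptions, with a fixed-capacity queue in which “retired” entries continue to hold a slot until their data is written back. The class `BoundedStoreQueue`, its two-entry capacity, and the state names are hypothetical and chosen only to keep the sketch short.

```python
class BoundedStoreQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []              # one state string per occupied slot

    def dispatch(self):
        if len(self.entries) >= self.capacity:
            return False               # queue full: the new store stalls
        self.entries.append("dispatched")
        return True

    def retire_oldest(self):
        # Lazy write: mark the oldest unretired store as retired,
        # but keep its slot until the data is written back.
        for i, state in enumerate(self.entries):
            if state == "dispatched":
                self.entries[i] = "retired"
                return True
        return False

    def write_back_oldest(self):
        # Only a write-back actually frees a slot.
        if self.entries and self.entries[0] == "retired":
            self.entries.pop(0)
            return True
        return False

q = BoundedStoreQueue(capacity=2)
q.dispatch()
q.dispatch()
q.retire_oldest()
q.retire_oldest()
stalled = not q.dispatch()     # both slots held by retired entries
q.write_back_oldest()
resumed = q.dispatch()         # a slot is free only after the write-back
```

In this sketch, retiring the stores does not relieve pressure on the queue; a new store can be dispatched only once an older retired entry has had its data written back.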