I. Field of the Disclosure
The technology of the disclosure relates generally to unordered store queues in block-based computer processors.
II. Background
Modern out-of-order (OOO) computer processors, which support processing of computer program instructions in an order other than the program order of those instructions, provide a structure referred to as a store queue. The store queue holds information regarding store operations (e.g., their associated memory addresses and data) to allow correct memory ordering to be maintained in the OOO processor. For example, store instructions may be dispatched out of program order even though they affect the same memory address. In this scenario, the store queue enables the OOO processor to resolve the order in which the store instructions should be processed to maintain data coherency and consistency. In some OOO processors, the same queue may be used to store and process both load and store operations, and thus may be referred to as a load/store queue (LSQ).
In a conventional store queue (implemented as, e.g., a circular buffer), the physical order of store queue entries in the store queue represents the relative order in which the store instructions associated with the store queue entries are decoded. In some circumstances, however, it may be desirable to employ an “unordered” store queue, which allows entries for store instructions to be allocated out of order (e.g., at execution of each instruction rather than at decoding) into any available store queue entry within the store queue. This approach may be advantageous because it reduces the time that a store queue entry spends in the store queue, and because it allows the store queue to be banked based on address.
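As a purely illustrative software model of unordered allocation, the following sketch places each store into the first free entry at the time the store executes, so that the physical position of an entry carries no ordering information. The names StoreEntry, UnorderedStoreQueue, and the seq field (a program-order sequence number) are assumptions introduced for this example and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoreEntry:
    seq: int      # hypothetical program-order sequence number
    address: int  # memory address written by the store
    data: int     # value to be written


class UnorderedStoreQueue:
    """Toy model: an entry may occupy any free slot in the queue."""

    def __init__(self, size: int) -> None:
        self.slots: List[Optional[StoreEntry]] = [None] * size

    def allocate(self, entry: StoreEntry) -> int:
        # Take the first free slot, regardless of program order.
        for index, slot in enumerate(self.slots):
            if slot is None:
                self.slots[index] = entry
                return index
        raise RuntimeError("store queue full")

    def release(self, index: int) -> None:
        # De-allocate a slot so that a later store may reuse it.
        self.slots[index] = None


queue = UnorderedStoreQueue(size=4)
# Stores execute out of program order: seq 2 executes before seq 0.
first = queue.allocate(StoreEntry(seq=2, address=0x100, data=7))
second = queue.allocate(StoreEntry(seq=0, address=0x200, data=9))
queue.release(first)
# The freed slot is reused by a store that is older in program order.
third = queue.allocate(StoreEntry(seq=1, address=0x100, data=5))
```

In this model, the slot index of an entry says nothing about its age, which is why any ordering requirement must be recovered from other information, such as the assumed seq field.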
However, an unordered store queue may pose challenges in “draining” committed store queue entries (i.e., outputting the contents of the committed store queue entries to a memory or cache and de-allocating the committed store queue entries, after the associated store instructions have been committed). In particular, a block-based computer processor may permit a large number of store instructions within a single instruction block to be committed en masse. In situations where multiple store instructions write to the same memory address, the store instructions must be presented to the memory system in order, so that other threads do not observe out-of-order writes to the memory address. Iterating through each store instruction in the instruction block to commit and drain the store instructions in order would reduce the ability of the block-based computer processor to commit and drain multiple instructions in parallel. Thus, it is desirable to provide a high-performance mechanism for committing and draining blocks of store instructions that write to the same memory address, while maintaining coherency and consistency, in an unordered store queue.
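The ordering constraint described above may be illustrated with a minimal sketch that is not the mechanism of the disclosure. It assumes each committed store carries a hypothetical program-order sequence number, groups the committed stores by memory address, and orders only the stores within each group. Groups for different addresses have no ordering dependency on one another in this model. The function names and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (seq, address, data); seq is a hypothetical program-order number.
Store = Tuple[int, int, int]


def group_for_drain(committed: List[Store]) -> Dict[int, List[Store]]:
    """Group committed stores by address, oldest first within a group."""
    groups: Dict[int, List[Store]] = defaultdict(list)
    for store in committed:
        groups[store[1]].append(store)
    for stores in groups.values():
        stores.sort(key=lambda s: s[0])  # program order per address
    return dict(groups)


def drain(memory: Dict[int, int], committed: List[Store]) -> None:
    """Write each group to memory; the youngest store per address wins."""
    for address, stores in group_for_drain(committed).items():
        for _seq, _addr, data in stores:
            memory[address] = data


# Entries as they might sit in an unordered queue (arbitrary slot order).
committed_stores = [(2, 0x100, 7), (0, 0x200, 9), (1, 0x100, 5)]
memory_image: Dict[int, int] = {}
drain(memory_image, committed_stores)
```

In this sketch, address 0x100 ends with the value from the store having seq 2, even though that store appears first in the queue, while the single store to address 0x200 needs no ordering against the others.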