1. Field
The present invention generally relates to the design of processors within computer systems. More specifically, the present invention relates to a store queue that accommodates a membar token to facilitate efficient flag synchronization.
2. Related Art
For performance reasons, modern processors typically place stores that are to be written to memory into a store queue. These stores are subsequently drained from the store queue to the memory system after they are logically retired by the processor. This improves performance because it enables the processor to perform subsequent loads and stores without having to wait until preceding stores are committed to the memory system.
Under strong memory models, such as sequential consistency or total-store-order (TSO), the system must generally wait for an acknowledgment that a store has been committed to the memory system before a subsequent store can be sent from the store queue to the memory system. This need to wait for acknowledgments can adversely affect system performance. In contrast, weaker memory models, such as partial-store-order (PSO), allow stores to be sent out without receiving such acknowledgments. This allows stores to be pipelined, which can greatly improve system performance.
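To make the cost of waiting for acknowledgments concrete, the following Python sketch models draining a store queue under both kinds of memory model. It is a simplified timing model under stated assumptions, not a description of any particular processor: each store is acknowledged a fixed, hypothetical `latency` number of cycles after it is sent, at most one store leaves the queue per cycle, and the function name `last_ack_cycle` is illustrative only.

```python
from collections import deque


def last_ack_cycle(num_stores: int, latency: int, model: str) -> int:
    """Return the cycle at which the last queued store is acknowledged.

    Hypothetical timing model: at most one store leaves the store queue per
    cycle, and each store is acknowledged `latency` cycles after it is sent.
    """
    if model not in ("TSO", "PSO"):
        raise ValueError(f"unknown memory model: {model}")
    queue = deque(range(num_stores))  # store IDs waiting in the store queue
    outstanding = []                  # ack cycles of stores sent but not yet acknowledged
    last_ack = 0
    cycle = 0
    while queue:
        # Acknowledgments due at or before this cycle have arrived.
        outstanding = [t for t in outstanding if t > cycle]
        # TSO: send only when no earlier store is awaiting an acknowledgment.
        # PSO: send one store every cycle, pipelining the stores.
        if model == "PSO" or not outstanding:
            queue.popleft()
            last_ack = cycle + latency
            outstanding.append(last_ack)
        cycle += 1
    return last_ack


print(last_ack_cycle(8, 20, "TSO"))  # 160
print(last_ack_cycle(8, 20, "PSO"))  # 27
```

With eight queued stores and a 20-cycle latency, the TSO-style drain finishes at cycle 160 (8 × 20), while the pipelined PSO-style drain finishes at cycle 27 (7 + 20), because under PSO the acknowledgment latencies of successive stores overlap.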
Systems that use these weaker memory models typically provide instructions, such as a memory-barrier (membar) instruction, that enable the programmer to ensure that preceding stores have committed to the memory system before subsequent stores are sent to the memory system. However, in existing store queue designs, when a membar instruction is encountered, the system typically waits until all preceding stores have been drained from the store queue to memory before a new store can be placed in the store queue. This need to drain the store queue during a membar instruction can adversely affect system performance.
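The stall imposed by a conventional membar can be sketched in the same style. The Python model below is hypothetical: the program is a list of `"ST"` and `"MEMBAR"` operations, at most one store enters and one store leaves the store queue per cycle, stores are pipelined PSO-style, and a membar blocks any later store from entering the queue until every earlier store has been sent and acknowledged. The name `run_with_membar_stall` is illustrative only.

```python
from collections import deque
from typing import Sequence


def run_with_membar_stall(program: Sequence[str], latency: int) -> int:
    """Return the cycle at which the last store is acknowledged.

    Hypothetical model of a conventional store queue: "ST" places a store in
    the queue, and "MEMBAR" stalls until the queue is empty and every earlier
    store has been acknowledged.
    """
    pending = deque(program)  # operations not yet issued by the processor
    store_queue = deque()     # stores waiting to be sent to memory
    outstanding = []          # ack cycles of stores sent but not yet acknowledged
    last_ack = 0
    cycle = 0
    while pending or store_queue:
        # Acknowledgments due at or before this cycle have arrived.
        outstanding = [t for t in outstanding if t > cycle]
        # Issue stage: at most one operation per cycle.
        if pending:
            op = pending[0]
            if op == "MEMBAR":
                # Conventional membar: no later store may enter the queue
                # until all preceding stores have drained and been acknowledged.
                if not store_queue and not outstanding:
                    pending.popleft()
            elif op == "ST":
                store_queue.append(pending.popleft())
            else:
                raise ValueError(f"unknown operation: {op}")
        # Drain stage: PSO-style, one store per cycle without waiting for acks.
        if store_queue:
            store_queue.popleft()
            last_ack = cycle + latency
            outstanding.append(last_ack)
        cycle += 1
    return last_ack


print(run_with_membar_stall(["ST", "ST", "ST", "ST"], 20))            # 23
print(run_with_membar_stall(["ST", "ST", "MEMBAR", "ST", "ST"], 20))  # 43
```

Under these assumptions, four back-to-back stores with a 20-cycle latency complete at cycle 23, whereas placing a membar after the second store delays completion to cycle 43. Roughly one full acknowledgment latency is added because the store queue must empty before the third store can be queued.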
Hence, what is needed is a method and an apparatus that implement a membar instruction without the above-described performance problems.