The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for speeding up the execution of younger store instructions occurring after a synchronization (sync) instruction.
In many known out-of-order processor architectures, synchronization (sync) instructions are used to enforce ordering for load instructions. The sync instruction is held in the dispatched state until the load miss queue (LMQ) of the load/store (L/S) unit of the processor is empty, i.e. until all loads older than the sync instruction have completed. Once all of the older loads have completed, the sync instruction is executed. The sync instruction and all younger store instructions, i.e. store instructions dispatched after the sync instruction, are placed in the store reorder queue (SRQ), and the sync instruction waits until it is the next instruction to complete. When the sync instruction reaches the next to complete (NTC) stage, it is sent to the nest, i.e. the logic outside the processor core, e.g., cache memory, system memory, disk or other storage devices, or the like. The nest generally comprises storage devices and circuitry that are slower than the logic and circuitry provided within the processor core. Essentially, the nest comprises the devices, logic, and circuitry to which store data is sent and from which load data is received.
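The dispatch hold described above can be illustrated with a minimal sketch. It is not taken from any actual processor design: the function name `cycles_until_sync_executes` and the per-load remaining latencies are hypothetical, and the LMQ is modeled simply as a list of cycle counts. Under those assumptions, the sketch shows that the sync instruction cannot execute until the slowest older load has drained from the LMQ.

```python
def cycles_until_sync_executes(lmq_remaining_latencies):
    """Return the number of cycles a held sync waits before it may execute.

    Hypothetical model: each entry is the number of cycles an older load
    in the load miss queue (LMQ) still needs. The sync is held until the
    LMQ is empty, so it effectively waits for the slowest older load.
    """
    lmq = list(lmq_remaining_latencies)
    cycle = 0
    while lmq:  # the sync stays held while any older load is outstanding
        cycle += 1
        # Advance every pending load by one cycle and drop the finished ones.
        lmq = [remaining - 1 for remaining in lmq if remaining - 1 > 0]
    return cycle


if __name__ == "__main__":
    # Three older loads with assumed remaining latencies of 3, 1 and 5 cycles.
    print(cycles_until_sync_executes([3, 1, 5]))
```

In this simplified model the wait equals the longest remaining load latency, and an empty LMQ lets the sync execute immediately.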
The younger store instructions wait in the SRQ until the older sync instruction completes. When the nest responds with a sync_ack response, indicating that it has acknowledged the sync instruction, the L/S unit finishes its processing of the sync instruction. The completion logic of the processor then performs its processing to complete the sync instruction. A completion pointer is then updated to point to the store instruction, i.e. the younger store that is now next to complete. Once the store instruction is completed, the SRQ can send the store instruction to the nest. A similar operation is performed for younger load instructions with regard to a load reorder queue (LRQ).
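The sequence above can be sketched as a small timing model. This is an illustration under stated assumptions only: the function `simulate_srq`, the `sync_ack_latency` and `completion_latency` parameters, and the cycle values are all hypothetical, and the model processes SRQ entries strictly in program order, one at a time. It shows how every younger store is delayed by the full round trip of the sync instruction to the nest.

```python
def simulate_srq(entries, sync_ack_latency, completion_latency=1):
    """Return (index, kind, cycle_sent_to_nest) for each SRQ entry.

    Hypothetical model of the known behavior:
      - A sync at next to complete (NTC) is sent to the nest, then the core
        waits for sync_ack before the completion logic completes the sync
        and advances the completion pointer.
      - A store must be completed before the SRQ may send it to the nest.
    """
    cycle = 0
    sent = []
    for index, kind in enumerate(entries):
        if kind == "sync":
            sent.append((index, kind, cycle))  # sync sent to the nest at NTC
            cycle += sync_ack_latency          # wait for sync_ack from the nest
            cycle += completion_latency        # complete sync, advance pointer
        elif kind == "store":
            cycle += completion_latency        # store becomes NTC and completes
            sent.append((index, kind, cycle))  # only now sent to the nest
        else:
            raise ValueError(f"unknown SRQ entry kind: {kind!r}")
    return sent


if __name__ == "__main__":
    # Assumed nest round trip of 10 cycles for the sync instruction.
    for entry in simulate_srq(["sync", "store", "store"], sync_ack_latency=10):
        print(entry)
```

With the assumed 10-cycle sync_ack latency, the first younger store is sent to the nest at cycle 12; the same two stores with no preceding sync would be sent at cycles 1 and 2 in this model.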
It can be seen from the above that when there is a sync instruction in the SRQ (or the LRQ), younger store instructions (or load instructions) are stalled until the sync instruction is completed. This stall reduces processor performance, since the younger store instructions must wait in the SRQ until the sync instruction has been acknowledged by the nest and completed.