As processors run at ever higher speeds, memory latency looms as a large problem. Commercially available microprocessors have addressed this problem by decoupling memory access from manipulation of the data used in that memory reference. For instance, it is common to decouple memory references from execution based on those references and to decouple address computation of a memory reference from the memory reference itself. In addition, scalar processors already decouple their write addresses and data internally. Write addresses are held in a “write buffer” until the data is ready, and in the meantime, read requests are checked against the saved write addresses to ensure ordering.
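The write-buffer behavior described above can be illustrated with a minimal software sketch. It is a toy model under stated assumptions, not a description of any particular processor: the class name `WriteBuffer`, its method names, the dictionary standing in for memory, and the choice to retire writes strictly in program order are all hypothetical and chosen only for illustration.

```python
class WriteBuffer:
    """Toy model: write addresses are saved before their data is available."""

    def __init__(self, memory):
        self.memory = memory   # dict standing in for memory: address -> value
        self.pending = []      # saved writes, oldest first

    def post_address(self, addr):
        """Save a write address whose data is not ready yet; return its entry."""
        entry = {"addr": addr, "data": None, "ready": False}
        self.pending.append(entry)
        return entry

    def supply_data(self, entry, value):
        """Attach the data to a previously saved write address."""
        entry["data"] = value
        entry["ready"] = True
        self._drain()

    def _drain(self):
        """Retire writes from the oldest end while their data is ready.

        In-order retirement is an assumption of this sketch.
        """
        while self.pending and self.pending[0]["ready"]:
            entry = self.pending.pop(0)
            self.memory[entry["addr"]] = entry["data"]

    def read(self, addr):
        """Check a read against the saved write addresses.

        Returns (True, value) if the read may proceed, or (False, None) if it
        must wait because an earlier write to the same address is still pending.
        """
        if any(entry["addr"] == addr for entry in self.pending):
            return (False, None)
        return (True, self.memory.get(addr, 0))
```

In this sketch, a read to an address with no saved write proceeds immediately, while a read that matches a saved write address is held off until the corresponding data arrives and the write has been retired, which is how ordering is preserved even though the address and data arrive separately.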
With the increasing pervasiveness of multiprocessor systems, it would be beneficial to extend the decoupling of write addresses and write data across more than one processor, or across more than one functional unit within a processor. What is needed is a system and method for synchronizing separate write requests and write data across multiple processors, or across multiple functional units within a microprocessor, that maintains memory ordering without collapsing the decoupling of the write address and the write data.