Software operating on multi-processor computing systems often encounters a producer-consumer problem, in which a data-producing source device is unable to guarantee that packets of data and an associated semaphore are written in the expected order to a target buffer memory location accessed by a data-consuming computer processing unit of the system. If the semaphore becomes visible before the data it guards, the consumer may read stale or incomplete data. One solution to this problem is to configure the source device to require strictly ordered writing of the data and semaphore packets to the target memory locations, but this comes at the cost of reduced data transfer rates. Another solution is to configure the source device to cause an interrupt and force all data writes to be visible in memory upon detecting the semaphore write, but this also comes at the cost of reduced data transfer rates and increased latency.