Computer systems include a number of components and elements, which are often coupled via a bus or interconnect. Previously, input/output (I/O) devices were coupled together through a conventional multi-drop parallel bus architecture referred to as Peripheral Component Interconnect (PCI). More recently, a newer generation of I/O interconnect referred to as PCI-Express (PCIe), which uses a serial physical-layer communication protocol, has been adopted to provide faster interconnection between devices.
A PCIe architecture uses a layered protocol for communication between devices. For example, a physical layer, a data link layer, and a transaction layer form a PCIe protocol stack. Each PCIe link is built from lanes, where a lane is a dedicated pair of unidirectional, serial, point-to-point connections, one in each direction. A link between devices includes some number of lanes, such as one, two, sixteen, or thirty-two. The current PCIe specification, Base Specification 2.0, is available at http://w.w.w.pcisig.com/specifications/pciexpress/.
Conventional PCIe ordering rules were created to support a producer-consumer programming model. Under this model, a read from a particular device is required to push the writes that were generated ahead of it; that is, the read may not pass those earlier writes. This guarantees that a read of memory location X always returns the most recent data written to X.
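The "reads push writes" guarantee can be illustrated with a small Python model. This is a minimal sketch under simplifying assumptions: a single in-order channel between one requester and memory, with the hypothetical names `Txn` and `OrderedChannel` invented for illustration. It does not model PCIe flow control, credits, or the full set of ordering rules.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Txn:
    """Hypothetical transaction record for this sketch."""
    kind: str                    # "write" (posted) or "read" (non-posted)
    addr: int
    value: Optional[int] = None  # payload for writes; unused for reads


class OrderedChannel:
    """Hypothetical in-order path from one requester to memory.

    Transactions are serviced strictly in issue order, so a read cannot
    pass (that is, it pushes) every write issued ahead of it.
    """

    def __init__(self) -> None:
        self.queue: deque = deque()
        self.memory: dict = {}

    def issue(self, txn: Txn) -> None:
        self.queue.append(txn)

    def drain(self) -> list:
        """Service queued transactions in order and return the read results."""
        results = []
        while self.queue:
            txn = self.queue.popleft()
            if txn.kind == "write":
                self.memory[txn.addr] = txn.value
            else:
                results.append(self.memory.get(txn.addr))
        return results


X = 0x1000  # hypothetical address of memory location X
ch = OrderedChannel()
ch.issue(Txn("write", X, 1))
ch.issue(Txn("write", X, 2))
ch.issue(Txn("read", X))
print(ch.drain())  # [2]: the read observes the most recent write to X
```

Because the read cannot overtake either write, it always observes the value 2, which is the producer-consumer guarantee in its simplest form.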
However, this ordering requirement causes requests from one request stream (a sequence of read or write transactions that share the same requester and the same destination) to interfere with another, independent request stream. This interference can create a severe performance bottleneck. The problem is especially severe when non-posted requests are blocked behind posted requests, because read requests are latency sensitive.
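The cross-stream interference described above is a form of head-of-line blocking, and a toy Python model can make its cost concrete. This is a sketch only: the names `Req`, `shared_fifo`, `per_stream_fifos`, and `first_read_completion` are hypothetical, service times are arbitrary units, and the per-stream model is an assumed comparison point, not the ordering scheme of this disclosure or of the PCIe specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Req:
    """Hypothetical request record for this sketch."""
    stream: str  # request-stream id (same requester, same destination)
    kind: str    # "posted" (write) or "non-posted" (read)
    cost: int    # hypothetical service time, in arbitrary units


def shared_fifo(reqs):
    """All streams share one strictly ordered queue: each request completes
    only after every request issued ahead of it, from any stream."""
    t, done = 0, []
    for r in reqs:
        t += r.cost
        done.append((r, t))
    return done


def per_stream_fifos(reqs):
    """Hypothetical comparison point: ordering is kept only within a stream,
    and different streams are assumed to be serviced in parallel."""
    clocks, done = {}, []
    for r in reqs:
        clocks[r.stream] = clocks.get(r.stream, 0) + r.cost
        done.append((r, clocks[r.stream]))
    return done


def first_read_completion(done):
    """Completion time of the first non-posted (read) request."""
    return next(t for r, t in done if r.kind == "non-posted")


# Stream A issues eight posted writes; then independent stream B issues one read.
reqs = [Req("A", "posted", 1) for _ in range(8)] + [Req("B", "non-posted", 1)]
print(first_read_completion(shared_fifo(reqs)))       # 9: the read waits behind A's writes
print(first_read_completion(per_stream_fifos(reqs)))  # 1: the read is not blocked
```

In the shared-queue model, stream B's read completes at time 9 instead of 1, even though it targets a different destination and has no data dependency on stream A's writes.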