The present invention is related to the field of computer communications and more particularly to improved PCI Express (PCIe) communications.
PCI (Peripheral Component Interconnect) and PCI Express (Peripheral Component Interconnect Express) are two well-known protocols used for communication between devices within personal computer systems. With multiple devices providing simultaneous data streams, a high degree of parallel processing and/or multithreaded processing is required. Under the current PCIe protocol, a mechanism of sorting operations by DRAM (Dynamic Random Access Memory) page has been proposed that allows re-ordering of some, but not all, operations to support high-bandwidth adapters, including accelerators. Under this proposed mechanism, performance on non-sequential memory accesses may be doubled or more.
However, because PCIe posted writes do not provide a completion signal, a device or functional sub-part of a device has no means of knowing when a collection of writes has been completed. Hence, the standard PCIe mechanism for ensuring that writes are completed (a zero-length Read) reduces the benefit of loose ordering by ordering all streams, potentially eliminating all of the performance that was gained.
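The cost of the zero-length Read described above can be illustrated with a toy model. The following is a minimal sketch only, not a PCIe implementation: the names `PostedWrite` and `writes_blocking_flush` are hypothetical, and the `per_stream` mode is an assumed point of comparison showing what a stream-scoped flush would wait for, not a description of any existing PCIe mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostedWrite:
    """Hypothetical record of a posted write that has not yet reached memory."""
    stream: str  # identifier of the stream that issued the write
    addr: int    # target address (illustrative only)


def writes_blocking_flush(pending, flushing_stream, per_stream=False):
    """Return the pending writes a flush must wait for.

    per_stream=False models the zero-length Read: the read may not pass
    any previously posted write, so every pending write blocks it.
    per_stream=True models an assumed stream-scoped flush, for contrast.
    """
    if per_stream:
        return [w for w in pending if w.stream == flushing_stream]
    return list(pending)


# Four posted writes from three independent streams are outstanding.
pending = [
    PostedWrite("A", 0x1000),
    PostedWrite("B", 0x8000),
    PostedWrite("A", 0x1040),
    PostedWrite("C", 0x4000),
]

# Stream A wants to know that its own writes are complete.
global_wait = writes_blocking_flush(pending, "A")
stream_wait = writes_blocking_flush(pending, "A", per_stream=True)

print(len(global_wait), len(stream_wait))
```

In this sketch, stream A's flush waits on the writes of streams B and C as well as its own, which is the sense in which the zero-length Read orders all streams.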
One proposed solution, a posted write with a completion response, has been considered by the PCI-SIG (Special Interest Group). However, this proposed solution has been rejected on the grounds that it can lead to deadlock or livelock under some circumstances.
Hence, there is a need for a method and apparatus for providing loose ordering of multiple streams in PCIe communications.