Peripheral Component Interconnect (PCI) is a second-generation parallel bus architecture developed in 1992 as a replacement for the Industry Standard Architecture (ISA) bus. In PCI, all devices share the same bidirectional, 32-bit (or 64-bit) parallel signal path. The PCI bus brought a number of advantages over the ISA bus, including processor independence, buffered isolation, bus mastering, and true plug-and-play operation. PCI Express (PCIe) is a third-generation, general-purpose serial I/O interconnect designed to replace the PCI bus. Rather than being a shared bus, PCIe is structured around point-to-point serial links, each made up of one or more lanes.
The point-to-point serial link architecture of PCI Express is well suited to distributed processing using a distributed multiprocessor architecture model. Distributed processors are generally optimized to implement data packet processing functions. Unlike general-purpose CPUs, which rely heavily on caching to improve performance, distributed processors see little locality in packet processing and need high-performance I/O. These two constraints have pushed designers toward innovative architectures that reduce processing latency while still processing packets at high data rates.
Currently, transaction ordering attributes in PCIe and similar interconnects must be set by the requester. Because host CPUs are a general-purpose resource, they typically cannot set ordering attributes according to the specific requirements of the activity being performed and must fall back to the lowest common denominator, that is, the most conservative ordering, which leads to low performance. CPU-to-I/O reads are often the most performance-critical system transactions because a CPU core may stall while waiting for the result. Improving the performance of such reads therefore directly improves overall system performance by freeing CPU resources for other, more useful work.
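
To make the requester-set ordering attributes concrete, the following minimal Python sketch packs and unpacks the attribute bits in the first doubleword (DW0) of a memory read request header. It is an illustration only, not an implementation from this document. The bit positions (Attr[1:0] at bits 13:12, Attr[2] at bit 18, traffic class at bits 22:20, length at bits 9:0) are an assumption based on the TLP header layout of the PCI Express Base Specification and should be checked against the specification revision in use. The names OrderingAttr, build_mrd_dw0, and decode_attrs are hypothetical.

```python
from enum import IntFlag


class OrderingAttr(IntFlag):
    """Attr[2:0] field of a TLP header (assumed encoding, see lead-in)."""
    NO_SNOOP = 0b001           # Attr[0]
    RELAXED_ORDERING = 0b010   # Attr[1]
    ID_BASED_ORDERING = 0b100  # Attr[2]


FMT_3DW_NO_DATA = 0b000  # 3-DW header, no payload (32-bit address read)
TYPE_MEM = 0b00000       # memory request


def build_mrd_dw0(length_dw, attrs=OrderingAttr(0), traffic_class=0):
    """Pack DW0 of a memory read request, with byte 0 as the most significant byte."""
    if not 1 <= length_dw <= 1024:
        raise ValueError("length must be 1..1024 doublewords")
    if not 0 <= traffic_class <= 7:
        raise ValueError("traffic class must be 0..7")
    a = int(attrs)
    return (
        (FMT_3DW_NO_DATA << 29)
        | (TYPE_MEM << 24)
        | (traffic_class << 20)
        | (((a >> 2) & 0b1) << 18)   # Attr[2]: ID-based ordering
        | ((a & 0b11) << 12)         # Attr[1:0]: relaxed ordering, no snoop
        | (length_dw & 0x3FF)        # 1024 DW is encoded as 0
    )


def decode_attrs(dw0):
    """Recover the ordering attributes from a packed DW0."""
    low = (dw0 >> 12) & 0b11
    high = (dw0 >> 18) & 0b1
    return OrderingAttr((high << 2) | low)


# A general-purpose requester that knows nothing about the workload
# leaves every attribute clear, which is the most conservative choice.
conservative = build_mrd_dw0(length_dw=1)

# A requester that knows the workload tolerates reordering can say so.
relaxed = build_mrd_dw0(length_dw=1, attrs=OrderingAttr.RELAXED_ORDERING)
```

In this sketch the attributes are fixed when the requester builds the header, so a requester that cannot tell what the workload tolerates has no option but to leave them clear.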