Computer systems have many protocols for transferring data between components. For example, a motherboard of a computer system transfers data between the processor and peripheral components such as modems, memory, and disk drives. A common protocol used in computer systems is peripheral component interconnect (“PCI”), which is a data transfer technique using a parallel bus and common clock and control signals.
Another protocol is PCI-express (“PCIe”), which is a serial data transfer technique using point-to-point gigabit serial input/output (“I/O”). PCIe has been shown to provide fast, bidirectional data transfer over a reduced number of lines, without the need for a common clock. A PCIe system defines a root complex (“RC”), switches, and endpoints (“EPs”). The RC connects to the central processing unit (“CPU”) complex, which can be a single-core or a multi-core CPU. The CPU is at the root of a tree interconnect structure. Intermediate nodes in the tree structure are implemented using switches, and the leaf nodes are EPs. EPs enable data to be transferred in and out of the system memory connected to the RC by interfacing to external devices, such as a charge-coupled device (“CCD”) camera in an IMAGE GRABBER™ card or a physical component in a network card.
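The tree interconnect described above can be sketched as a small data structure. This is only an illustrative model, not part of any PCIe software interface: the `Node` class, the `switches_to_root` helper, and the device names (`rc0`, `sw0`, `camera_ep`, `nic_ep`) are hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One node in a PCIe-style tree (hypothetical model for illustration)."""
    name: str
    kind: str  # "RC" (root), "switch" (intermediate node), or "EP" (leaf)
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def attach(self, child: "Node") -> "Node":
        # Endpoints are leaf nodes, so nothing may be attached below them.
        if self.kind == "EP":
            raise ValueError("endpoints are leaf nodes and cannot have children")
        child.parent = self
        self.children.append(child)
        return child


def switches_to_root(node: Node) -> int:
    """Count the switches on the path from a node up to the root complex."""
    count = 0
    current = node.parent
    while current is not None:
        if current.kind == "switch":
            count += 1
        current = current.parent
    return count


# Example topology: one EP behind a switch, one EP attached directly to the RC.
rc = Node("rc0", "RC")
sw = rc.attach(Node("sw0", "switch"))
camera = sw.attach(Node("camera_ep", "EP"))
nic = rc.attach(Node("nic_ep", "EP"))
```

In this sketch, `switches_to_root(camera)` counts one switch and `switches_to_root(nic)` counts none, which shows how the number of intermediate nodes between an EP and the RC depends on where the EP sits in the tree.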
PCIe applications using gigabit serial interconnects, such as those occurring in field-programmable gate arrays (“FPGAs”), incur higher latency of operation than PCIe applications using lower-payload interconnects. This is due to the additional processing required to convert data available inside the FPGA into a serial stream for transmission, and the reverse process of converting a received serial stream into data that can be processed inside the FPGA. When additional components are connected to the FPGA PCIe system, switches added between the RC and EPs can further increase latency on the path between the EP and the RC (e.g., system memory).
Higher latency between the RC and EPs degrades the performance of applications, especially those utilizing direct memory access (“DMA”). This is because conventional DMA techniques depend heavily upon programmed input/output (“PIO”) READ and WRITE operations to manage DMA operations. Another side effect of higher PIO latency is an increase in CPU utilization, since the CPU has to wait longer for a response to each PIO READ operation.
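The link between PIO READ latency and CPU utilization can be illustrated with a simple back-of-the-envelope model. It assumes that the CPU stalls for the full round-trip time of every PIO READ and that nothing else overlaps with the wait. The function name, its parameters, and the numbers in the example are hypothetical and are not measurements from any system.

```python
def pio_read_stall_fraction(reads_per_transfer: int,
                            read_latency_s: float,
                            transfers_per_s: float) -> float:
    """Fraction of each second the CPU spends waiting on PIO READ responses.

    Simplified model (assumption): every PIO READ blocks the CPU for the full
    round-trip latency, and the waits do not overlap.
    """
    stall_seconds_per_second = reads_per_transfer * read_latency_s * transfers_per_s
    # A CPU cannot be stalled for more than 100% of the time.
    return min(stall_seconds_per_second, 1.0)


# Illustrative values only: 2 PIO READs per DMA transfer, 100,000 transfers/s.
baseline = pio_read_stall_fraction(2, 1e-6, 100_000)   # 1 microsecond per READ
with_switch = pio_read_stall_fraction(2, 2e-6, 100_000)  # latency doubled
```

Under these assumptions the stall fraction grows linearly with the read latency, so doubling the round-trip time doubles the share of CPU time spent waiting, until the CPU is fully occupied.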
Reducing latency, increasing data transfer rate, and reducing CPU utilization in a PCIe system during DMA operations are therefore desirable.