PCI Express is the next generation of PCI (Peripheral Component Interconnect), a standard interconnection system that enables the transfer of data between a host device 112 and an attached application layer device 114 of a data transfer system 100, as shown in FIG. 1. The PCI Express protocol is implemented using PCI Express core 116. PCI Express core 116 is a hardware controller used to identify and resolve the PCI Express protocol layers: the physical/MAC layer 118, the link layer 120 and the transaction layer 122. The data is delivered through an application layer interface 124 to the attached application layer device 114.
The PCI Express protocol is a very fast, bandwidth-rich protocol, enabling a variety of applications to be implemented through a PCI Express link. Application layer devices 114 can include bandwidth-consuming applications, such as file transfers and multimedia files, latency-sensitive applications, such as real-time video and voice streaming applications, and applications requiring both high bandwidth and low latency, such as video conferencing.
The application layer interface 124 connects the PCI Express core 116 to the application layer device 114. The application layer device 114 may be a single, common address/data bus having a few control signals to ensure errorless handshakes between the host 112 and any type of application. For example, the application layer device may be a switch or router connected between the PCI Express core 116 and a number of clients that communicate with the host 112. The application layer device in such a case routes incoming packets to the appropriate client. The application layer interface 124 is driven by the transaction layer architecture 122 of the PCI Express core 116. The transaction layer architecture 122 of the PCI Express core 116 typically consists of six FIFO buffers: a non-posted header buffer “NP H” 126, a non-posted data buffer “NP D” 128, a posted header buffer “P H” 130, a posted data buffer “P D” 132, a completion header buffer “C H” 134 and a completion data buffer “C D” 136. The six buffers 126-136 are needed to implement the PCI Express reordering rules for three different types of transfers: 1) posted transfers (typically memory write transfers); 2) non-posted transfers (typically memory read transfers); and 3) completion transfers (also called “read response” transfers). The PCI Express reordering rules are set by the PCI Express Standard and described in the PCI Express Base Specification.
However, the reordering of transfers is the area where the most bandwidth is lost and the most latency is added during the processing of transfers. Typically, data is sent through a single output queue 138 of the host device 112 to a single input queue 140 of the receiving PCI Express core 116 over PCI Express link 142. From the single input queue 140, the data is then distributed to the six transaction layer FIFOs 126-136, depending upon the transfer type of the packet (posted, non-posted, or completion). An arbiter 137 then offloads the transfers from the FIFOs 126-136 in an order dictated by the PCI Express reordering rules and transmits the transfers through the application layer interface 124 to the application layer 114.
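The distribution step described above can be sketched in Python as follows. This is a minimal illustration only, not the core's actual hardware logic: the `Packet` class, the `distribute` function, and the header and payload sizes are hypothetical, and only the six buffer labels are taken from the description.

```python
from collections import deque
from dataclasses import dataclass

# Labels mirror the six FIFO buffers 126-136 named in the description.
FIFO_LABELS = ("NP H", "NP D", "P H", "P D", "C H", "C D")

# Hypothetical mapping from transfer type to buffer-label prefix.
PREFIX = {"posted": "P", "non_posted": "NP", "completion": "C"}


@dataclass
class Packet:
    """Hypothetical packet model: a header plus an optional data payload."""
    name: str
    kind: str            # "posted", "non_posted", or "completion"
    header: bytes
    payload: bytes = b""


def distribute(input_queue, fifos):
    """Drain the single input queue into the per-type header and data FIFOs."""
    while input_queue:
        pkt = input_queue.popleft()
        prefix = PREFIX[pkt.kind]
        fifos[prefix + " H"].append((pkt.name, pkt.header))
        if pkt.payload:  # e.g. a memory read request carries no data
            fifos[prefix + " D"].append((pkt.name, pkt.payload))


fifos = {label: deque() for label in FIFO_LABELS}
input_queue = deque([
    # Header and payload sizes below are illustrative assumptions.
    Packet("C1", "completion", b"\x00" * 12, b"\xaa" * 64),
    Packet("P1", "posted", b"\x00" * 16, b"\xbb" * 32),
    Packet("NP1", "non_posted", b"\x00" * 16),
])
distribute(input_queue, fifos)

for label in FIFO_LABELS:
    print(label, [name for name, _ in fifos[label]])
```

In this sketch, each packet's header and payload land in separate FIFOs selected by transfer type, which is the state from which an arbiter would then offload the transfers.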
The reordering rules are designed to favor the transmission of completion packets before posted and non-posted packets, because completion packets carry data earlier requested by the application layer 114, through the application layer interface 144 of the application layer 114. This means that the transmission of packets to the application layer 114 when it acts as the master in the system is preferred over the transmission of packets for which the application layer 114 is a slave. This is illustrated in the following example.
As shown in FIG. 2, host device 112 transmits the packet sequence 150: C1 (completion packet 1), P1 (posted packet 1), C2 (completion packet 2), . . . , Cn (completion packet n) from the output queue 138 to the input queue 140 through the PCI Express link 142. The PCI Express link 142 delivers the packets to the input queue 140 in the same order in which they were stored in the output queue 138. The input queue 140 propagates the packets into FIFO reordering buffers 126-136 at the transaction layer 122 of the PCI Express core 116. The packets are reordered according to the reordering rules specific to the PCI Express standard, and the sequence arrives at the application layer interface 124 in the order: C1, C2, . . . , Cn, P1. As shown, because of the reordering rules implemented by the arbiter 137, the posted request packet P1 is transmitted only after all the completion packets C1, C2, . . . , Cn have been transmitted.
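The example sequence can be reproduced with a short Python sketch. It models only the simplified behavior described here, in which completions are offloaded before other packets. It is not an implementation of the full ordering rules of the PCI Express Base Specification. The `arbitrate` function is hypothetical, and serving posted packets before non-posted packets is an assumption made for illustration.

```python
from collections import deque


def arbitrate(arrival_order):
    """Simplified arbiter: offload all completions first, then the rest.

    Packet names encode the type: "C..." completion, "P..." posted,
    "NP..." non-posted. The P-before-NP order is an assumption.
    """
    queues = {"C": deque(), "P": deque(), "NP": deque()}
    for name in arrival_order:
        kind = "NP" if name.startswith("NP") else name[0]
        queues[kind].append(name)

    delivered = []
    for kind in ("C", "P", "NP"):
        while queues[kind]:
            delivered.append(queues[kind].popleft())
    return delivered


n = 4  # illustrative value of n
sequence = ["C1", "P1"] + [f"C{i}" for i in range(2, n + 1)]
print(arbitrate(sequence))  # ['C1', 'C2', 'C3', 'C4', 'P1']
```

With n = 4, the arrival order C1, P1, C2, C3, C4 is delivered as C1, C2, C3, C4, P1, matching the pattern C1, C2, . . . , Cn, P1 of the example.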
If the data carried by the posted packet P1 is latency-sensitive, as it is in most applications because posted and non-posted requests are usually data write requests and data read requests to the application, the PCI Express core 116 has failed to provide the “Quality of Service,” in terms of latency, required for the posted packet P1. Even worse, if the completion packets (C1, C2, . . . , Cn) have large data payloads, which is usually the case in order to increase the efficiency of the PCI Express link 142, the latency of the posted packet P1 might be unacceptable for most applications.
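The effect of payload size on the delay of P1 can be estimated with a rough Python calculation. The sketch assumes that P1 waits while the completions C2, . . . , Cn that arrived after it are transferred across the application layer interface at a constant rate. The function name, payload sizes, and transfer rate are illustrative assumptions and do not come from the PCI Express standard or from any measurement.

```python
def added_latency_us(completion_payloads_bytes, interface_bytes_per_us):
    """Extra wait for P1, in microseconds, caused by completions sent ahead of it.

    Headers and arbitration overhead are ignored in this estimate.
    """
    return sum(completion_payloads_bytes) / interface_bytes_per_us


# Assumed figures: 15 completions (C2..C16) and a 1000 bytes/us interface.
small = added_latency_us([64] * 15, 1000.0)
large = added_latency_us([256] * 15, 1000.0)
print(f"64-byte payloads:  {small:.2f} us")
print(f"256-byte payloads: {large:.2f} us")
```

Under these assumptions, the added delay grows linearly with both the number of completions and their payload size, so the larger payloads that improve link efficiency also lengthen the wait for P1.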