The evolution of computer systems has resulted in the development of various communication protocols between a host controller, such as a personal computer, and the peripheral devices attached to it. For example, serial and parallel ports, SCSI, Universal Serial Bus (USB), Peripheral Component Interconnect (PCI/PCI-X), PCI Express (PCIe), PMBus, EIDE, SATA, IEEE 1394, and I2C are typical communication protocols between host controllers and peripheral devices. As communication protocols grow in bandwidth, the physical limitations of high-bit-rate signaling over long distances necessitate sophisticated encoding and the bundling of multiple independent signaling lanes to achieve the high design bandwidths.
For example, the first-generation PCI Express (PCIe) bus specification provides a 2.5 GT/s (giga-transfers per second) rate and an “8b/10b” encoding. Accordingly, the symbol size is ten bits, and the transfer of eight bits on a single lane incurs a latency of 4 ns (10 bits at 2.5 GT/s), neglecting the 24 bytes of packet header overhead. Without error correction, making these bits available to a computation engine on the peripheral device adds only minimal overhead, namely deserialization and latching of the data.
The third-generation PCI Express (PCIe) bus specification provides an 8 GT/s rate. Accordingly, based solely on the higher bit rate, the latency would have been expected to drop to about 31% (2.5/8) of the first-generation PCIe latency. However, the third-generation PCIe specification provides a “128b/130b” encoding to better utilize the physical bandwidth of the communication medium. For example, a “128b/130b” encoding wastes about 1.5% (2/130) of bandwidth, while an “8b/10b” encoding wastes 20% (2/10). However, because of the increased symbol size (130 bits), a minimal chunk of communication (130 bits) incurs a latency of 16.25 ns (130 bits at 8 GT/s), which is more than four times the latency of the minimal chunk of communication in the first-generation PCIe (10 bits). Moreover, to use 4 or 8 independent lanes in parallel, the receiver must time-align the independent lanes and rearrange the bits to present to the device, which in a typical FPGA implementation can take on the order of 400 ns. Therefore, communicating a few bits of information across a high-bandwidth interface can be costly in terms of latency and throughput.
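The arithmetic in the two preceding paragraphs can be checked with a short sketch. It is illustrative only: the function names are hypothetical, and the model assumes a single lane and ignores packet headers, lane alignment, and any other protocol overhead.

```python
def symbol_latency_ns(symbol_bits: int, rate_gt_per_s: float) -> float:
    """Time to serialize one encoded symbol on a single lane, in nanoseconds.

    Bits divided by giga-transfers per second yields nanoseconds directly.
    """
    return symbol_bits / rate_gt_per_s


def encoding_overhead(payload_bits: int, symbol_bits: int) -> float:
    """Fraction of the raw line rate consumed by the encoding itself."""
    return (symbol_bits - payload_bits) / symbol_bits


# First-generation PCIe: 8b/10b encoding at 2.5 GT/s.
gen1_latency = symbol_latency_ns(10, 2.5)    # 4.0 ns
gen1_overhead = encoding_overhead(8, 10)     # 20% of bandwidth

# Third-generation PCIe: 128b/130b encoding at 8 GT/s.
gen3_latency = symbol_latency_ns(130, 8.0)   # 16.25 ns
gen3_overhead = encoding_overhead(128, 130)  # about 1.5% of bandwidth

# Ratio of minimal-chunk latencies: slightly more than 4x.
latency_ratio = gen3_latency / gen1_latency

print(f"Gen1: {gen1_latency} ns per symbol, {gen1_overhead:.1%} overhead")
print(f"Gen3: {gen3_latency} ns per symbol, {gen3_overhead:.1%} overhead")
print(f"Gen3 minimal chunk takes {latency_ratio:.2f}x as long as Gen1")
```

The sketch makes the trade-off explicit: the larger symbol recovers most of the bandwidth lost to encoding, but the smallest unit that can cross the link takes roughly four times as long to arrive.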
This can present a problem for storage protocols because, for small data packet operations (for example, 8-bit data packet operations), a large fraction of all communication across the bus can consist of exchanging a few bits at a time to synchronize protocol queues between the host and the device.
Accordingly, flexible and dynamic high-speed bus architectures are desirable that can be optimized to enable short communication latency for small data transfers as well as high-bandwidth communication for bulk data transfers.