In serial links, each packet carries a fixed overhead. Completion transactions in peripheral component interconnect express (PCIe), a high-speed serial bus standard used by many computer systems, have an overhead of five double words (DWs): one DW for framing, three DWs for the header, and one DW dedicated to the cyclic redundancy check (CRC). This overhead is present irrespective of the size of the data payload the transaction contains.
For example, if the read requests are for 32 bytes (32B), then the corresponding completions each contain eight DWs (32 bytes) of data, resulting in an efficiency of 61.5% (8 payload DWs out of 13 total DWs). Thus, with a 32B request size, an x16 third generation PCIe link achieves a data bandwidth of only 9.84 gigabytes per second (GB/s) out of the 16 GB/s available. If the request size is 64B, the bandwidth efficiency increases to 76.2% (16 DWs out of 21 DWs). If the request size increases to 256B, the efficiency increases to 93% (64 DWs out of 69 DWs).
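The arithmetic above can be reproduced with a short calculation. The following is a minimal sketch, not part of any PCIe tooling: the function names are hypothetical, the five-DW overhead and the 16 GB/s raw figure are taken from the text, and the payload is assumed to be a whole number of four-byte DWs.

```python
# Per-completion overhead described in the text:
# 1 DW framing + 3 DW header + 1 DW CRC.
OVERHEAD_DW = 5
BYTES_PER_DW = 4


def completion_efficiency(payload_bytes: int) -> float:
    """Fraction of a completion's DWs that carry payload data."""
    if payload_bytes <= 0 or payload_bytes % BYTES_PER_DW != 0:
        # Assumption of this sketch: payloads are whole DWs.
        raise ValueError("payload must be a positive multiple of 4 bytes")
    payload_dw = payload_bytes // BYTES_PER_DW
    return payload_dw / (payload_dw + OVERHEAD_DW)


def effective_bandwidth_gb_per_s(payload_bytes: int, raw_gb_per_s: float = 16.0) -> float:
    """Data bandwidth left after overhead; 16 GB/s is the x16 figure from the text."""
    return raw_gb_per_s * completion_efficiency(payload_bytes)


if __name__ == "__main__":
    for size in (32, 64, 256):
        eff = completion_efficiency(size)
        bw = effective_bandwidth_gb_per_s(size)
        print(f"{size:>3}B request: {eff:.1%} efficient, {bw:.2f} GB/s")
```

For a 32B request the unrounded result is 16 × 8/13, or roughly 9.85 GB/s; the 9.84 GB/s figure corresponds to rounding the efficiency to 61.5% before multiplying.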
Many bandwidth-sensitive applications, such as graphics and high-performance computing (HPC) networking, use small request sizes. This inherently limits the achievable bandwidth because of the protocol overheads discussed above. One solution is to overprovision the link by increasing its width and/or its frequency, which is expensive in terms of both cost and power.
Thus, there is a continuing need for a solution that addresses the shortcomings of the prior art.