Peripheral Component Interconnect Express (PCIe) 2.0 specifies a 5.0 Gigabit/s symbol rate per lane. Multiple lanes can be combined to build ports with larger bandwidth. For example, an x4 port has an aggregate symbol rate of 20 Gbit/s and, because 8b/10b coding is used, a bit rate of 16 Gbit/s. An x8 port has an aggregate symbol rate of 40 Gbit/s and a bit rate of 32 Gbit/s. Other serial interconnect protocols, for example Serial RapidIO and Ethernet, have similar properties. This disclosure focuses on PCIe, but is not limited to that protocol.
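By way of illustration only, the rate arithmetic above can be sketched as follows. The function name and its parameters are hypothetical and are not taken from the PCIe specification; the sketch assumes only the 5.0 Gbit/s per-lane rate and the 8b/10b coding overhead stated above.

```python
def aggregate_rates(lanes, lane_rate_gbit_s=5.0):
    """Return (aggregate symbol rate, usable bit rate) in Gbit/s.

    Hypothetical helper for illustration. 8b/10b coding carries
    8 data bits in every 10 bits sent on the link.
    """
    symbol_rate = lanes * lane_rate_gbit_s
    bit_rate = symbol_rate * 8 / 10
    return symbol_rate, bit_rate


if __name__ == "__main__":
    for width in (4, 8):
        symbol_rate, bit_rate = aggregate_rates(width)
        print(f"x{width}: {symbol_rate} Gbit/s on the link, {bit_rate} Gbit/s of data")
```

For an x4 port this yields 20 and 16 Gbit/s, and for an x8 port 40 and 32 Gbit/s, matching the figures given above.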
Referring to FIG. 1, there is illustrated a PCIe packet. The diagram is copied from the PCIe specification. The PCIe packet 10 includes a framing byte 12, a two-byte sequence number 14, a header 16, data 18, a 4-byte ECRC 20, a 4-byte LCRC 22 and a final framing byte 24, all of which form a physical layer 26. The two-byte sequence number 14, the header 16, the data 18, the 4-byte ECRC 20, and the 4-byte LCRC 22 form a data link layer 27. The header 16, the data 18 and the 4-byte ECRC 20 form a transaction layer 28. The data 18 and the 4-byte ECRC 20 are optional, and hence are shown in dashed lines.
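The nesting of the three layers can be sketched, for illustration only, as a byte count per layer. The function and constant names are hypothetical. The header length is left as a parameter because the text above does not state it; the 12-byte value in the usage example is an assumption corresponding to a three-doubleword header.

```python
FRAMING_BYTES = 1   # one start byte 12 and one stop byte 24
SEQUENCE_BYTES = 2  # sequence number 14
ECRC_BYTES = 4      # optional ECRC 20
LCRC_BYTES = 4      # LCRC 22


def layer_sizes(header_bytes, data_bytes=0, has_ecrc=False):
    """Return the byte count of each layer of one packet (illustrative)."""
    # Transaction layer 28: header, optional data, optional ECRC.
    transaction = header_bytes + data_bytes + (ECRC_BYTES if has_ecrc else 0)
    # Data link layer 27 wraps it with the sequence number and the LCRC.
    data_link = SEQUENCE_BYTES + transaction + LCRC_BYTES
    # Physical layer 26 adds the start and stop framing bytes.
    physical = FRAMING_BYTES + data_link + FRAMING_BYTES
    return {"transaction": transaction, "data_link": data_link, "physical": physical}


if __name__ == "__main__":
    # Assumed 12-byte header, no data, no ECRC.
    print(layer_sizes(12))
```

Under the assumed 12-byte header with no data and no ECRC, the packet occupies 20 bytes on the link, of which 18 belong to the data link layer and 12 to the transaction layer.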
The number of bytes (each actually a 10-bit symbol on the serial link) is shown in FIG. 1. The framing bytes, start 12 and stop 24, can be discarded by the internal logic, as they are only useful for synchronizing the link to the symbol time at the receiver. The sequence number 14 exists only on the link. It is useful only to the data link layer 27, to ensure that all packets are received, and in order. Although the LCRC 22 (link CRC) is valid for the link, it can be useful for monitoring data integrity through a switch or other such device.
The simplest way to convert this serial packet to a parallel bus for on-chip processing is shown in FIG. 2. The 10-bit symbols at 5 Gbit/s are converted to 8-bit data at 500 Mbytes/s by a SERDES (serialize/de-serialize) 30. Note that the start of packet (SOP) must always occur on lane 0. The parallel data is written 32 into a data buffer running at a clock rate that matches the 500 Mbyte/s rate. It may be feasible to implement the MAC at a clock rate of 500 MHz in 90 nm. The read side of the buffer, which connects to a large internal switch fabric (ISF), will not be feasible to implement at a 500 MHz clock rate. Two minimum-size packets are shown 32 to consume six clock ticks, and take only four ticks to write into the data buffer 36, 38.
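The tick counts cited above can be reproduced, for illustration only, under assumptions that the text does not state explicitly: an x8 port delivering one byte per lane per tick, a minimum-size packet with a 12-byte header and no data or ECRC, and internal logic that drops the framing bytes and the sequence number while retaining the LCRC. All names below are hypothetical.

```python
LINE_RATE_BITS_PER_S = 5_000_000_000  # per lane, as stated in the text
BITS_PER_SYMBOL = 10                  # 8b/10b: one 10-bit symbol per data byte


def bytes_per_second_per_lane():
    """Byte rate out of the SERDES for one lane."""
    return LINE_RATE_BITS_PER_S // BITS_PER_SYMBOL


def ticks_needed(packet_bytes, lanes=8):
    """Clock ticks for one packet when SOP must fall on lane 0.

    Each packet starts on a fresh tick, so a partly filled last tick
    still costs a whole tick (integer ceiling division).
    """
    return -(-packet_bytes // lanes)


# Assumed minimum-size packet: 12-byte header, no data, no ECRC.
MIN_PACKET_ON_LINK = 1 + 2 + 12 + 4 + 1  # framing, sequence, header, LCRC, framing
MIN_PACKET_IN_BUFFER = 12 + 4            # header and LCRC only

if __name__ == "__main__":
    print(bytes_per_second_per_lane())             # byte rate per lane
    print(2 * ticks_needed(MIN_PACKET_ON_LINK))    # ticks for two packets on the link
    print(2 * ticks_needed(MIN_PACKET_IN_BUFFER))  # ticks to write two packets
```

Under these assumptions each 20-byte packet takes three ticks on the eight-lane link and each 16-byte stripped packet takes two ticks to write, which is consistent with six ticks consumed and four ticks written for two packets.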
It is possible to have a serialize/de-serialize (SERDES) 30 that creates 16-bit wide data lanes running at half the speed. The issue then is that two packets may exist at the same time on the same clock tick. Memory management would require that different packets occupy different memory locations.
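The overlap described above can be sketched, for illustration only, by mapping back-to-back packets onto the wider, slower clock. The sketch assumes an x8 port, 20-byte packets, and that each packet still begins on lane 0 of a symbol time; these values and the function name are hypothetical.

```python
def packets_per_wide_tick(packet_sizes, lanes=8, symbol_times_per_tick=2):
    """List, for each wide clock tick, the indices of the packets present.

    With 16-bit lanes, one tick carries two consecutive symbol times,
    so the end of one packet and the start of the next can share a tick.
    """
    ticks = {}
    symbol_time = 0
    for index, size in enumerate(packet_sizes):
        # Symbol times used by this packet (ceiling division, SOP on lane 0).
        used = -(-size // lanes)
        for t in range(symbol_time, symbol_time + used):
            ticks.setdefault(t // symbol_times_per_tick, set()).add(index)
        symbol_time += used
    return [sorted(ticks[t]) for t in sorted(ticks)]


if __name__ == "__main__":
    # Two back-to-back 20-byte packets on an assumed x8 port.
    for tick, present in enumerate(packets_per_wide_tick([20, 20])):
        print(tick, present)
```

In this example the middle tick contains bytes of both packets, so logic writing that tick to the buffer would have to direct the two parts to different memory locations.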
When a port bifurcates, prior art methods typically instantiate another buffer for that port. This buffer is wasted when a single 1×8 port is used.