With increasing telecommunications line rates, it is necessary to use increasingly wide hardware data buses in order to maintain throughput. For example, in FPGA implementations, a 512-bit data bus is typically used for 100 Gb/s packet processing, and a 2048-bit data bus for 400 Gb/s packet processing. One consequence is that it is increasingly likely that more than one packet can be contained in a set of bits traversing the data bus in parallel. As used herein, each set of bits transmitted over the full width of the data bus in parallel is referred to as a word.
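The relationship between line rate and bus width can be checked with simple arithmetic. The following is a minimal sketch, assuming one word is transferred per clock cycle and ignoring framing overhead; the function name `min_clock_hz` is hypothetical and used only for illustration.

```python
def min_clock_hz(line_rate_bps: float, bus_width_bits: int) -> float:
    """Lowest clock frequency that sustains the line rate.

    Assumes one full-width word is transferred per clock cycle and
    ignores any framing or inter-packet overhead.
    """
    if bus_width_bits <= 0:
        raise ValueError("bus width must be positive")
    return line_rate_bps / bus_width_bits


# The two bus widths named in the text imply the same clock rate.
for rate_bps, width_bits in ((100e9, 512), (400e9, 2048)):
    mhz = min_clock_hz(rate_bps, width_bits) / 1e6
    print(f"{rate_bps / 1e9:.0f} Gb/s over {width_bits} bits: {mhz:.4f} MHz")
```

Under these assumptions both configurations need a clock of roughly 195 MHz, which suggests why the bus is widened rather than clocked faster as line rates rise.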
As one example, given a minimum packet size of 64 bytes (512 bits), portions of two packets may be included in a 512-bit word. A portion of a first data packet may end in the word and a portion of a second data packet may begin in the same 512-bit word. As another example, a single 2048-bit word may include portions of five data packets (one ending portion, three complete packets, and one beginning portion). As a result, to maintain throughput, parallel hardware may be needed to process the multiple packets in a single cycle. Parallel hardware is expensive in terms of required logic resources and power consumption.
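The packet counts in these examples can be reproduced with a short calculation. The sketch below is illustrative only: it assumes a 64-byte (512-bit) minimum packet, which is the minimum Ethernet frame size, assumes packets are packed back to back with no gap, and assumes packet boundaries fall on byte boundaries. The name `max_packets_per_word` and the `granule_bits` parameter are hypothetical.

```python
def max_packets_per_word(word_bits: int, min_packet_bits: int,
                         granule_bits: int = 8) -> int:
    """Upper bound on the packets that have some portion in one word.

    Worst case under the stated assumptions: one packet ends in the
    first granule of the word, one packet begins in the last granule,
    and minimum-size packets fill the space in between.
    """
    if granule_bits <= 0 or min_packet_bits < 2 * granule_bits:
        raise ValueError("minimum packet must span at least two granules")
    if word_bits < 2 * granule_bits:
        raise ValueError("word must hold at least two granules")
    complete = (word_bits - 2 * granule_bits) // min_packet_bits
    return 2 + complete  # ending portion + complete packets + beginning portion


MIN_PACKET_BITS = 64 * 8  # assumed 64-byte minimum packet

for width in (512, 2048):
    print(width, max_packets_per_word(width, MIN_PACKET_BITS))
```

With these assumptions the calculation yields two packets for a 512-bit word and five for a 2048-bit word, matching the examples above.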
Packet processing hardware is often organized as a pipeline to maintain throughput. Simple solutions employ multiple identical instances of packet processing hardware. If a maximum of k packets may be presented at once, then the packet processing hardware is replicated k times. In one parallelization approach, the entire data path is fanned out into k independent hardware pipelines. Each pipeline is configured to extract, from the data path, the data relevant to the packet it is handling. This approach is wasteful in terms of routing resources and power consumption because much redundant data is sent to each pipeline. Another solution employs a single pipeline, with k parallel units at each stage. Although all data is still potentially made available to all units, there is just a single data path, and the parallel units can be selective in tapping into the data path. The basic inefficiency of both approaches is that each of the k packet processing units is configured to handle a maximum-size data packet, because each unit must be able to handle the worst case.
One or more embodiments may address one or more of the above issues.