With increasing telecommunications line rates, it is necessary to use increasingly wide hardware data buses in order to maintain throughput. For example, in FPGA implementations, a 512-bit data bus is typically used for 100 Gb/s packet processing, and a 2048-bit data bus for 400 Gb/s packet processing. One consequence is that it is increasingly likely that more than one packet can be contained in a set of bits traversing the data bus in parallel. As used herein, each set of bits transmitted over the full width of the data bus in parallel is referred to as a word.
As one example, given a minimum packet size of 64 bytes (512 bits), a packet that does not begin on a word boundary cannot be entirely contained in a single 512-bit word. A first data packet may begin in a previous word and end in the current word, and a second data packet may begin in the current word and end in a subsequent word. As another example, a single 2048-bit word may include the ending portion of one packet, three complete packets, and the beginning portion of another packet. To maintain a desired level of throughput, parallel hardware may be needed to process the multiple packets in a single cycle. However, parallel hardware is expensive in terms of required logic resources and power consumption.
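The packet counts in the two examples above can be checked with a short calculation. The following is a minimal sketch, not part of any described implementation. It assumes that packets are byte-aligned and packed back to back with no inter-packet gap or preamble on the bus, which is a simplification, and the function name `max_packets_per_word` is hypothetical.

```python
def max_packets_per_word(word_bits: int, min_packet_bytes: int = 64) -> int:
    """Upper bound on the number of distinct packets that can touch one bus word.

    Assumptions (illustrative only): byte-aligned packets, packed back to back,
    with no inter-packet gap or preamble carried on the bus.
    """
    word_bytes = word_bits // 8
    if word_bytes < 2:
        # A single byte can belong to only one packet.
        return 1
    # Worst case: a one-byte tail of one packet, as many complete
    # minimum-size packets as fit, and a one-byte head of the next packet.
    complete_packets = (word_bytes - 2) // min_packet_bytes
    return complete_packets + 2


if __name__ == "__main__":
    for width in (512, 2048):
        print(width, max_packets_per_word(width))
```

Under these assumptions, a 512-bit word can contain portions of at most two packets, and a 2048-bit word can contain portions of at most five, which agrees with the two examples given above.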
Packet processing hardware is often organized as a pipeline. Simple solutions employ multiple identical instances of packet processing hardware. If a maximum of K packets may be presented at once, then the packet processing hardware for extracting header information and data is replicated K times.
Some previous solutions implement a plurality of pipelines, each configured to receive and extract data from any offset of a word received on the data bus. For example, in one parallelization approach, the entire data path is fanned out into K independent hardware pipelines. Another approach employs a single pipeline, with K parallel units at each stage. Although all data is still potentially made available to all units, there is just a single data path and the parallel units can be selective in tapping into the data path. In either approach, each pipeline or parallel unit is configured to extract header and data fields of a packet from any offset of the received word. Such solutions provide flexibility to allow any one of the parallel circuits to be scheduled to process any one of a received set of packets. However, these solutions may be expensive in terms of hardware requirements.
Data and/or header fields of packets are separated from a received word through a process referred to as extraction. Extraction involves shifting the relevant field of data out of the received word. If a packet field can begin at any offset within a received word, a generic shifter capable of shifting through the entire received word is required. The above approaches require a large amount of circuitry for extraction of header and data fields of the packets because each pipeline must be capable of extracting relevant bits from any offset in the entire word. These approaches are also expensive in terms of routing resources and power consumption because much of the data sent to the parallel pipelines is redundant.
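The extraction operation described above can be modeled in software as a shift followed by a mask. The following is a minimal sketch for illustration only. The function name `extract_field`, its parameters, and the convention that offsets are counted in bits from the most significant end of the word are all assumptions, and the sketch is not a description of any particular hardware shifter.

```python
def extract_field(word: int, word_bits: int, offset_bits: int, field_bits: int) -> int:
    """Return the field_bits-wide field that starts offset_bits from the
    most significant end of a word_bits-wide word (assumed bit ordering)."""
    if offset_bits < 0 or field_bits <= 0 or offset_bits + field_bits > word_bits:
        raise ValueError("field does not fit within the word")
    # Shift the field down to the least significant position, then mask
    # off everything above it. Because offset_bits may take any value,
    # the shift amount ranges over the entire word width.
    shift = word_bits - offset_bits - field_bits
    return (word >> shift) & ((1 << field_bits) - 1)


if __name__ == "__main__":
    # Hypothetical 512-bit word whose bytes are 0x00, 0x01, ..., 0x3F.
    word = int.from_bytes(bytes(range(64)), "big")
    # Extract a 16-bit field that begins at byte offset 12.
    print(hex(extract_field(word, 512, 12 * 8, 16)))
```

In this model the shift amount depends on an offset that may fall anywhere in the word, which corresponds to the generic full-width shifter noted above. Replicating such a shifter in each of K pipelines multiplies that cost by K.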
One or more embodiments may address one or more of the above issues.