In network processing systems, packets traversing a switching network are generally analyzed by network processors that execute functions on the packets, including routing, segmentation and re-assembly, filtering, and virus scanning, to increase performance, security and service quality. However, the types of operations that network processors may be required to execute on packets are becoming increasingly complex, and bandwidth and packet transmission rates are increasing faster than the processing power of network processors. Devices and methods that increase the overall processing performance of network processors accordingly are therefore essential.
A common method for achieving higher processing performance than a single processor or network processor can provide is parallel processing, in which multiple processors operate in parallel. Such multiple processors may be considered as a single network processor of higher speed.
In the context of network processing, parallel processing has, in the prior art, been implemented as load balancing, or channel striping. Prior art channel striping (also known as load sharing or inverse multiplexing) is frequently used in networking because of processing bottlenecks or simply because of price/performance ratio. In that scheme, a Round-Robin Algorithm or a Load Sharing Algorithm is used that stripes the packets belonging to a stream across multiple channels. A major problem with striping is that packets may be mis-ordered due to different delays on different channels and due to different packet sizes. Three types of solutions for this mis-ordering problem are known in the prior art:
i) keeping each flow on only one channel and accepting that a single flow cannot use more bandwidth than a single channel can support,
ii) reordering the received packets after mis-ordering and accepting the resulting waste of processing bandwidth, and
iii) splitting packets up into fixed transfer units which the network processing means can process in a predictable period of time.
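The contrast between round-robin striping and solution i) can be illustrated with a minimal sketch. The function names, the dictionary packet layout, and the use of a CRC-32 hash to pick a channel are assumptions made for illustration only; they are not taken from any particular prior art system.

```python
import zlib
from itertools import cycle

def stripe_round_robin(packets, num_channels):
    """Assign packets to channels in strict rotation, ignoring flow identity."""
    channels = cycle(range(num_channels))
    return [(next(channels), pkt) for pkt in packets]

def pin_flow_to_channel(packets, num_channels):
    """Assign every packet of a flow to one channel via a hash of its flow id.

    The CRC-32 hash is an illustrative choice, not a prescribed one.
    """
    return [(zlib.crc32(pkt["flow"].encode()) % num_channels, pkt)
            for pkt in packets]

# Hypothetical traffic: two flows, four packets each.
packets = ([{"flow": "A", "seq": i} for i in range(4)]
           + [{"flow": "B", "seq": i} for i in range(4)])

striped = stripe_round_robin(packets, 2)
pinned = pin_flow_to_channel(packets, 2)

# Channels used by flow A under each scheme.
striped_a = {ch for ch, pkt in striped if pkt["flow"] == "A"}
pinned_a = {ch for ch, pkt in pinned if pkt["flow"] == "A"}
```

Under round-robin striping, flow A is spread over both channels and may therefore be mis-ordered if the channels have different delays. Under flow pinning, flow A stays on one channel, so its order is preserved, but it cannot use more bandwidth than that channel supports.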
Dynamic load balancing, on the other hand, is commonly used in the field of computational parallel processing, dealing with three general computing entities: computations, tasks and data. In these cases, dynamic load balancing tries to find the mapping of computations, tasks or data, to computers that results in each computer having an approximately equal amount of work in order to reduce run time and increase the overall efficiency of a computation.
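As a rough illustration of this mapping, the following sketch assigns each arriving task to whichever computer currently has the least accumulated work. The greedy least-loaded rule, the function name, and the numeric task costs are assumptions chosen for brevity; practical dynamic load balancers use a variety of strategies.

```python
import heapq

def assign_least_loaded(task_costs, num_computers):
    """Assign tasks, in arrival order, to the currently least-loaded computer.

    Returns the final load per computer and the task indices assigned to each.
    """
    # Min-heap of (accumulated load, computer index).
    heap = [(0, i) for i in range(num_computers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_computers)]
    for task_index, cost in enumerate(task_costs):
        load, computer = heapq.heappop(heap)
        assignment[computer].append(task_index)
        heapq.heappush(heap, (load + cost, computer))
    loads = [0] * num_computers
    for load, computer in heap:
        loads[computer] = load
    return loads, assignment

# Hypothetical task costs in arrival order, balanced over two computers.
loads, assignment = assign_least_loaded([5, 4, 3, 2], 2)
```

For these example costs, both computers end with the same total load, which is the approximately equal distribution of work that dynamic load balancing aims for.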
U.S. patent application Ser. No. 09/551,049 assigned to IBM Corporation and filed before the United States Patent and Trademark Office on Apr. 18, 2000, describes a real-time load-balancing system for distributing a sequence of incoming data packets emanating from a high speed communication line to a plurality of processing means, each operating at a capacity that is lower than the capacity of the high speed communication line. The system comprises parser means capable of extracting a configurable set of classifier bits from the incoming packets for feeding into compression means. The compression means are capable of reducing a bit pattern of length K to a bit pattern having a length L which is a fraction of K. This system further comprises a pipeline block for delaying incoming packets until a load balancing decision is found, and an inverse demultiplexer for receiving a port identifier output from said compression means as selector and for directing pipelined packets to the appropriate output port.
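The parser, compression and selection stages described above can be pictured with the following minimal sketch. The byte-offset configuration, the XOR-folding used to reduce K bits to L bits, and all function names are assumptions made for illustration; the cited application may use different extraction and compression functions.

```python
def extract_classifier_bits(packet, byte_offsets):
    """Parser stage: concatenate the bytes at the configured offsets.

    Returns the classifier value and its length K in bits.
    """
    value = 0
    for offset in byte_offsets:
        value = (value << 8) | packet[offset]
    return value, 8 * len(byte_offsets)

def compress(value, k_bits, l_bits):
    """Compression stage: XOR-fold a K-bit pattern into L bits.

    XOR-folding is an illustrative choice, not the method of the application.
    """
    mask = (1 << l_bits) - 1
    folded = 0
    while k_bits > 0:
        folded ^= value & mask
        value >>= l_bits
        k_bits -= l_bits
    return folded

def select_port(packet, byte_offsets, l_bits):
    """Return the port identifier used as selector for the output stage."""
    value, k_bits = extract_classifier_bits(packet, byte_offsets)
    return compress(value, k_bits, l_bits)

# Hypothetical packet: classifier bits taken from bytes 0 and 1 (K = 16),
# compressed to L = 2 bits, i.e. one of four output ports.
port = select_port(bytes([0x12, 0x34, 0x56, 0x78]), [0, 1], 2)
```

Because the port identifier depends only on the classifier bits, packets that share those bits are always directed to the same output port.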
However, there is still a need for preserving the correct sequencing of flows, particularly for traffic wherein an individual flow exceeds the performance capability of a single network processor. Ordered recombination of packet flows is straightforward if the packets can be modified. An obvious method would be to label each incoming packet with a sequence number and to prevent output packets from exiting out of sequence. However, the disadvantage of packet modification is that, to correctly process the modified packets, the individual network processors must be configured differently in an aggregated configuration than in a single network processor configuration.
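The sequence-number approach mentioned above can be sketched as a small resequencing buffer that holds back packets until all earlier ones have been released. The class name, the heap-based buffer, and the numbering starting at zero are assumptions made for illustration.

```python
import heapq

class Resequencer:
    """Release packets strictly in sequence-number order."""

    def __init__(self):
        self.next_seq = 0   # next sequence number allowed to exit
        self.pending = []   # min-heap of (sequence number, packet)

    def push(self, seq, packet):
        """Accept a processed packet; return the packets that may now exit."""
        heapq.heappush(self.pending, (seq, packet))
        released = []
        # Drain the heap while its smallest entry is the next one expected.
        while self.pending and self.pending[0][0] == self.next_seq:
            released.append(heapq.heappop(self.pending)[1])
            self.next_seq += 1
        return released

# Hypothetical arrival order after parallel processing: 1, 0, 2.
reseq = Resequencer()
first = reseq.push(1, "b")   # held back, packet 0 has not arrived yet
second = reseq.push(0, "a")  # releases packets 0 and 1 together
third = reseq.push(2, "c")   # released immediately
```

The sketch makes the drawback visible: each packet must carry a label that the network processors preserve, which is the packet modification the preceding paragraph identifies as a disadvantage.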
While this need arises from the performance limits of current network processing means, meeting it also allows previous generations of network processing means to be reused by combining their performance to reach the desired level, thereby optimizing the cost of such network processing means.