A frequent problem in digital data processing is the parallel processing of multiple digital data streams, typically of different speeds, by an automatic multi-channel device that processes each incoming stream and transfers it, in processed form, to a respective output stream. The characteristic processing speed of each channel may be substantially less than the speed of the incoming stream, so the required non-delayed processing of each incoming stream is achieved by providing multiple processing channels.
An important precondition for successful and highly efficient operation of such devices is to precisely maintain the sequence of processed data in each output stream so that it matches the corresponding sequence in the incoming stream.
Data can be processed in different ways, for example, by converting input ATM protocol packets into output IP packets, converting incoming encrypted/unencrypted IP packets into decrypted/encrypted IP packets, respectively, etc.
U.S. Pat. No. 6,434,145 discloses a method for transferring data between one or more first network ports receiving one or more first data flows and one or more second network ports transmitting one or more second data flows.
The method comprises the following steps:
sending data from one or more first data flows to multiple processing channels;
processing the data in parallel by two or more processing channels;
receiving the data processed by the processing channels, and
sending the processed data to one or more second flows in one or more second ports,
wherein in at least one flow of the first and second flows, data is transferred in frames, and each frame in said one flow is processed by a single one of the processing channels, but at least two frames in said one flow are processed by two different processing channels.
Data received in each first data flow is transmitted to a respective second data flow in the same order in which the data was received in the first data flow.
Each frame received from the first flow is provided, before being sent to a processing channel for processing, with additional attributes (data) including at least:
a frame number in the first flow, and
an identification of the channel to which the frame is sent.
To ensure correct ordering of processed frames, a memory stack is organized according to the first-in-first-out (FIFO) principle in a corresponding second flow so that channel identifications for the processing channels to which the frames are sent from the first flow are stored in the stack.
Therefore, when a processed frame received from a processing channel is sent to a second flow, this is done in the same order as the order of channel identifications in the FIFO stack.
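The FIFO-based reordering described above can be sketched as follows (a minimal single-threaded Python illustration; the class and method names are ours, not the patent's, and it assumes each channel holds at most one outstanding frame at a time):

```python
from collections import deque

class FifoReorderer:
    """Restores input order using a FIFO of channel identifications."""

    def __init__(self):
        self.order = deque()   # channel IDs in the frames' arrival order
        self.done = {}         # channel ID -> processed frame, once finished

    def dispatch(self, channel_id):
        # First flow side: record which channel the next frame was sent to.
        self.order.append(channel_id)

    def complete(self, channel_id, frame):
        # A processing channel reports a finished frame.
        self.done[channel_id] = frame

    def drain(self):
        # Emit frames in original order, stopping at the first channel
        # whose frame has not been processed yet.
        out = []
        while self.order and self.order[0] in self.done:
            out.append(self.done.pop(self.order.popleft()))
        return out
```

Emission stops at the first channel whose frame is not yet finished, so the output order always matches the input order even when a later frame completes earlier.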
It should be noted that the description of the above method uses its own terminology, according to which, in particular, the term “frame” refers to a discrete set of digital data of a particular format that corresponds to a common protocol (ATM, IP, etc.).
This method is implemented in a system comprising:
a first section for sending data from one or more first data flows to multiple processing channels, wherein, in at least one flow of the first and second flows, data is transferred in frames, wherein the first section is configured to send each frame only to one of the processing channels, and to send at least two different frames to two different processing channels;
multiple processing channels, each comprising an individual processor;
a second section for receiving data processed by the processing channels, and sending the processed data to one or more second flows into one or more second ports;
an ordering section for providing the second section with channel identifications for each processing channel to which a frame is sent by the first section,
wherein the second section is adapted to receive the channel identifications from the ordering section in the same order in which the respective frames are arranged in at least one first flow;
wherein, when the second section receives a channel identification, the second section sends a frame from a corresponding processing channel into the second flow, so that the frames are sent into at least one second flow from the processing channels in the order defined by the channel identifications.
The method provides for processing frames of both fixed and variable size. Generally, even when a fixed-size frame is processed by a predetermined algorithm, the processing time of individual frames may vary due to various factors (varying operation speed of individual channels, different memory access times, etc.). Therefore, a situation may occur where the current frame of the first flow has already been processed in a processing channel, but cannot be transmitted to the system output, because the previous frame, which the current frame follows, has not yet been processed and passed to the output in the second flow. In this situation, the system waits until that processing ends, outputs the previous frame first, and then outputs the current frame, to ensure correct ordering of the frames.
The delays may be even more significant in processing of variable size frames. Such delays impair the system performance, which is a disadvantage of the known method.
US 2002/0107903 discloses another method of providing operation of a network system for parallel processing of data streams, wherein the system comprises:
a first section adapted for:
    receiving incoming data streams from external network connections;
    dividing the incoming data streams into portions;
    providing attributes to each portion of each incoming data stream;
    sending portions of each incoming data stream to processor units for processing;
a plurality of processor units, each of the processor units including a processor and a buffer memory for storing processed portions of incoming data streams, and providing:
    processing portions of incoming data streams by a predetermined algorithm;
    sending the processed portions of incoming data streams to corresponding output data streams;
    storing the processed portions of the incoming data streams in the buffer memory until conditions occur for sending these portions to a corresponding output data stream;
a second section adapted for:
    receiving the processed portions of the incoming data streams;
    forming and modifying output queues containing output processing tokens, the number of output queues matching the number of output data streams;
    transferring the processed portions of the incoming data streams in the form of the corresponding output data streams to an external network;
wherein the first section is associated with a plurality of processor units and the second section, and the processor units are further associated with the second section.
An embodiment of the method comprises:
receiving incoming data streams from network connections in the first section;
specifying a required match between the incoming data streams and output data streams;
generating output stream queues in a second section, the number of the queues matching the number of the output data streams;
generating, in each processor unit, output queues of the processor units, the number of said queues matching the number of the output data streams;
sending portions of the incoming data streams for processing to the processor units, wherein each portion of each input data stream is provided with attributes including:
    an identifier of the processor unit to which the portion of the input stream is sent;
    an identifier of the incoming stream;
placing the identifier of the processor unit, to which the next portion of the incoming data stream has been sent for processing, to the output queue of the second section that corresponds to the specified output stream and includes an output processing token;
processing the portions of the incoming data streams in the processor units to obtain respective portions of output data streams;
writing the identifier of the processor unit, in which processing of a portion of a specified input data stream has been completed, to the output queue of said processor unit that corresponds to the specified output stream;
providing a sequence of portions of the output data streams from the processor units, said sequence corresponding to the sequence of portions of the input data streams, said providing of the correct sequence including:
comparing the identifier of the processor unit, in which processing of a portion of the first stream has been completed, with a correct next identifier of the processor unit in the output processing token, and
when the compared identifiers do not match:
    storing the processed portion of the first stream in the buffer memory of said processor unit;
    writing the processor unit identifier into the output queue of said processor unit;
    processing the next portion of the incoming data stream in the processor unit;
when the compared identifiers match:
    sending portions of the output data streams from the processor units to the second section for generating the output data streams in which the sequence of the portions matches the sequence of portions of the respective incoming streams, and
    after sending the next processed portion of the first stream, modifying in each processor unit the identifier of said processor unit in the processor unit output queue for the respective output stream and in the output processing token of the respective output stream.
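A single comparison step of this procedure might look as follows (a Python sketch; all names are illustrative, and draining previously buffered portions when the token advances is omitted for brevity):

```python
from collections import deque

def handle_completed_portion(unit_id, portion, token_queue,
                             unit_buffer, unit_output_queue, second_section):
    """One decision step of the method: compare the unit's identifier
    with the head of the token queue (the output processing token)."""
    if token_queue and token_queue[0] == unit_id:
        # Match: the processed portion enters the output stream immediately.
        second_section.append(portion)
        token_queue.popleft()            # modify the output processing token
    else:
        # Mismatch: buffer the portion in the unit and record the unit's
        # identifier in its own output queue; the unit is then free to
        # process the next portion of the incoming data stream.
        unit_buffer.append(portion)
        unit_output_queue.append(unit_id)
```

The token queue is filled in dispatch order, so a portion is emitted only when its unit's identifier is the next one expected by the output stream.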
The known method provides for processing both fixed- and variable-size portions of incoming data streams (network packets).
Here, when portions of an incoming data stream are processed by a predetermined algorithm, the processing time of individual portions may differ due to various factors (varying operation speed of individual processor units, different memory access times, etc.). Therefore, a situation may arise where a portion of the incoming stream has already been processed in some processor unit, but cannot be immediately delivered to the output of the system, since the previous portion of the incoming stream has not been processed yet.
In order to provide a sequence of portions of the output data stream that precisely matches the sequence of portions of the respective incoming data stream, a specially generated queue is used, the first element of which (the output processing token) is the identifier of the processor unit from which the next processed portion of the incoming data stream is to enter the output data stream.
The identifier can be an integer, a memory address, an array index, etc.
After sending a portion of an incoming data stream for processing, the identifier of the processor unit to which the next portion of the incoming data stream has been sent is placed into the output queue of the second section that corresponds to the specified output stream and contains the output processing token, wherein:
    before writing the processor unit identifier, access to the output queue is locked, thereby providing exclusive write access from said processor unit (and disabling writing by any other processor unit);
    the identifier is written by performing atomic operations; and then
    the access is unlocked.
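The lock-protected write could be sketched as follows (using Python's `threading.Lock` as a stand-in; the patent does not prescribe a particular locking or atomic primitive, and the class name is ours):

```python
import threading
from collections import deque

class OutputQueue:
    """Per-output-stream queue of processor-unit identifiers; the head
    element serves as the output processing token."""

    def __init__(self):
        self._ids = deque()
        self._lock = threading.Lock()

    def enqueue(self, unit_id):
        # Lock the queue so that only the writing processor unit has
        # access, append the identifier, then unlock (the `with` block
        # releases the lock even if the write raises).
        with self._lock:
            self._ids.append(unit_id)

    def token(self):
        # Read the current output processing token under the same lock.
        with self._lock:
            return self._ids[0] if self._ids else None
```

Serializing writers this way is what keeps the token queue's order identical to the dispatch order of the portions.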
After the end of processing a portion of the incoming data stream in some processor unit, the processor unit identifier is checked for match to the correct identifier from the output processing token.
Where the compared identifiers match, the processed portion of the incoming data stream is transferred to the system output in the second section, and the output processing token is updated by removing the number of said processor unit from its queue.
Where said numbers do not match, the processed portion of the first stream is stored in the buffer memory of said processor unit, and the processor unit identifier is stored in the processor unit output queue organized in the FIFO memory stack format.
Then, the processor unit stalls and continuously compares its identifier with the correct identifier from the output processing token until they match.
According to a preferred embodiment of the method, if the numbers do not match, the processor unit receives from the first section a new portion of the incoming data stream and processes it. After the end of processing of the new portion of the incoming data stream, the identifier from the output queue stack of the processor unit is again checked for match to the correct number from the output processing token.
If said numbers match, the processed portion of the incoming data stream is transferred from the buffer memory of said processor unit to the system output in the second section, and the output processing token is updated by removing the number of said processor unit from its queue.
The output queue stack of said processor unit is also updated by removing from it the identifier of the processor unit that has transferred the processed portion of the incoming data stream to the output.
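The preferred embodiment's processing loop might be sketched as follows (a single-unit Python illustration; `process` is a stand-in for the predetermined processing algorithm, and coordination among several concurrent units is omitted):

```python
from collections import deque

def process(portion):
    # Stand-in for the predetermined processing algorithm (assumption).
    return portion.upper()

def unit_loop(unit_id, first_section, token_queue, second_section):
    """Instead of spinning on the token, a unit that cannot yet emit its
    result fetches and processes a new portion, then re-checks the token."""
    buffer = deque()    # buffer memory: processed portions awaiting their turn
    pending = deque()   # this unit's output queue of identifiers (FIFO)
    while first_section or buffer:
        if first_section:
            buffer.append(process(first_section.popleft()))
            pending.append(unit_id)
        # Re-check the output processing token after each processed portion.
        while buffer and token_queue and token_queue[0] == unit_id:
            second_section.append(buffer.popleft())
            token_queue.popleft()   # update the output processing token
            pending.popleft()       # update this unit's output queue
        if not first_section and buffer and (
                not token_queue or token_queue[0] != unit_id):
            break   # nothing left to process; another unit must act first
```

The point of the embodiment is visible in the loop structure: useful work (processing a new portion) happens between token checks instead of idle spinning.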
A disadvantage of the known method is that even its preferred embodiment has low efficiency, due to the delay caused by repeatedly comparing the identifier of a particular processor unit with the correct identifier from the output processing token until they match: if no match occurs, the next check is performed only after a new portion of the incoming data stream has been processed.