1. References
The following U.S. patents and papers provide useful background information, for which they are incorporated herein by reference in their entirety.
a) Patents
6,700,889 March 2004 Ben-Nun
6,633,920 October 2003 Bass et al.
6,460,120 October 2002 Bass et al.
6,404,752 June 2002 Allen et al.
b) Published Patent Applications
20020165947 November 2002 Akerman et al.
20020122386 September 2002 Calvignac et al.
20010016899 August 2001 Nei
c) Other References
“7-Layer Packet Processing: A Performance Analysis”, a white paper by EZchip Technologies, July 2000
2. Introduction
Network processors are commonly used in various nodes throughout a network. These processors process packets and determine the type of handling each packet may require. A commonly used network model is the seven-layer communication model, which starts with the physical layer (layer one, L1), ends with the application layer (layer seven, L7), and includes all the layers in between.
In some conventional applications, packets were handled at relatively low layers of the seven-layer communication model. A more modern approach, by contrast, attempts to handle packets at higher layers of the model, up to and including the application layer. This allows for more efficient and effective handling of a packet as it moves through the network. For example, it enables a network to accelerate the transfer of packets of an application carrying time-critical data, such as a video conference call, in preference to packets carrying non-critical data. In essence, the more sophisticated the capabilities of the network processor and its related firmware, the more efficient and effective the handling and routing of packets.
The ability of network processors to handle packets in a sophisticated manner allows for an effective increase of the network bandwidth; in such a case, utilization of the network can increase dramatically. This is important given the ever-increasing demand for additional network bandwidth and for the avoidance of network congestion. To enable this, packets are classified before processing by the network processor, so that each packet is identified as belonging to a specific process flow. An example of such a classifier is discussed in U.S. patent application Ser. No. 09/541,598 titled “An Apparatus for Wire-Speed Classification and Pre-Processing of Data Packets in a Full Duplex Network” by Ben-Nun et al., assigned to common assignee, and which is hereby incorporated by reference for all that it contains.
It has been noted that, after classification, packets belonging to different process flows may be processed on multiple network processors to increase system performance. In several related-art network applications, a network processor contains multiple pipelines to provide further acceleration within the network processor itself.
While such increased parallel processing of packets by the network processor is of significant importance, it is subject to a limitation inherent in packet-based processing: the chronological order among packets arriving at a destination must be maintained. However, when massively parallel network processors, as well as sub-systems within a network processor, attempt to transmit packets at high speed, there is a risk that a later packet will move ahead of an earlier one. This results in an error message from the destination node, often accompanied by a request to re-transmit packets, which effectively reduces the network bandwidth. Moreover, as network processors become more sophisticated, a packet may be required to move through multiple independent pipelines of the network processor, with or without a predetermined order. Furthermore, the deeper the pipeline of a network processor, the greater the latency before the processing of a packet is complete; hence the tendency to use shallow pipelines, if pipelines are used at all.
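The ordering risk described above can be illustrated with a toy model. The sketch below assumes, hypothetically, that packets of one flow are dispatched round-robin to parallel processors with different per-packet processing times, and that each packet is transmitted as soon as it finishes. The function names `departure_order` and `late_packets` and the latency values are illustrative assumptions, not part of any referenced design.

```python
from typing import List, Tuple


def departure_order(arrivals: List[int], proc_latency: List[int]) -> List[int]:
    """Return packet sequence numbers in the order they finish processing.

    Toy model (an assumption, not a real network-processor design):
    packet `seq` of a single flow arrives at cycle arrivals[seq] and is
    dispatched round-robin to processor seq % len(proc_latency). Each
    processor handles one packet at a time and needs proc_latency[p] cycles.
    """
    n_proc = len(proc_latency)
    free_at = [0] * n_proc            # cycle at which each processor becomes idle
    finished: List[Tuple[int, int]] = []
    for seq, t in enumerate(arrivals):
        p = seq % n_proc
        start = max(t, free_at[p])    # wait if the processor is still busy
        done = start + proc_latency[p]
        free_at[p] = done
        finished.append((done, seq))
    # Packets are transmitted as soon as they finish; ties go to the lower seq.
    return [seq for _, seq in sorted(finished)]


def late_packets(order: List[int]) -> List[int]:
    """Sequence numbers that reach the destination after a later packet did."""
    late: List[int] = []
    highest = -1
    for seq in order:
        if seq < highest:
            late.append(seq)          # an earlier packet was overtaken
        else:
            highest = seq
    return late


# Two processors: one slow (5 cycles per packet), one fast (1 cycle per packet).
order = departure_order([0, 1, 2, 3], [5, 1])
print(order)                # [1, 3, 0, 2]
print(late_packets(order))  # [0, 2]
```

With equal latencies, such as `[2, 2]`, the same four packets leave as `[0, 1, 2, 3]` in this model, so the overtaking here comes entirely from the unequal processing times of the parallel units.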
The concept of latency is illustrated here using a pipeline having four stages. A packet being processed moves from one stage to the next, for example on every clock cycle. If a second packet belonging to the same process flow cannot enter the pipeline until the first packet has completed processing, it must wait four clock cycles before it can enter. With only two stages the wait would be two cycles, and with eight stages it would be eight cycles. To perform processing at very high speeds, it is advantageous to provide a very deep pipeline. This is because, apart from the start and the end of processing, when the pipeline is filling and draining, processing is very fast, assuming there is no contention. The latency is the time it takes for a packet to pass through the entire pipeline; for a four-stage pipeline the latency is four cycles. This contrasts with the throughput of the pipeline, which is one packet per cycle: once the pipeline is full, a packet can complete on every cycle. The goal is always to bring the interval between successive packets down to one cycle, because this is the fastest rate possible. Depending on the task at hand, it may be necessary to adjust the depth of the pipeline to achieve that goal.
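The latency and throughput arithmetic of the four-stage example can be checked with a minimal cycle-level sketch. It assumes one stage per clock cycle and no contention other than the same-flow dependency; `simulate` and its parameters are hypothetical names chosen for illustration.

```python
from typing import List, Optional


def simulate(stages: int, packets: int, wait_for_previous: bool) -> int:
    """Return the number of clock cycles until all packets leave the pipeline.

    Hypothetical model: each packet advances one stage per cycle. If
    wait_for_previous is True (same-flow dependency), a new packet may enter
    only when the pipeline is empty; otherwise one packet may enter per cycle.
    """
    pipe: List[Optional[int]] = [None] * stages  # pipe[i]: packet in stage i
    next_pkt, finished, cycle = 0, 0, 0
    while finished < packets:
        empty = all(slot is None for slot in pipe)
        if next_pkt < packets and (empty or not wait_for_previous):
            pipe[0] = next_pkt                   # admit a packet into stage 0
            next_pkt += 1
        cycle += 1                               # one clock cycle elapses
        if pipe[-1] is not None:                 # packet in the last stage exits
            finished += 1
        pipe = [None] + pipe[:-1]                # every packet moves one stage on
    return cycle


# Latency of a single packet through four stages.
print(simulate(4, 1, wait_for_previous=False))    # 4
# 100 packets, one entering per cycle: latency 4, then one packet per cycle.
print(simulate(4, 100, wait_for_previous=False))  # 103
# 100 packets of one flow, each waiting for the previous one to finish.
print(simulate(4, 100, wait_for_previous=True))   # 400
```

In this model the fully pipelined case takes `stages + packets - 1` cycles and the serialized case takes `stages * packets` cycles, which matches the two-, four-, and eight-cycle waits stated above for the serialized case.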
It would be advantageous to provide techniques to overcome the above-noted problems in the network transmission of packets.