1. Field of the Invention
The present invention relates generally to data processing and, more particularly, to scalable packet processing systems and methods.
2. Description of Related Art
Programmable processors can be used to create highly flexible packet processing applications. Often the performance requirements of these applications exceed the capacity of a single programmable processor. Using multiple programmable processors to achieve higher performance is challenging, however. Various problems must be solved to achieve an architecture that is both highly flexible and high performing. These problems include physically connecting multiple processors to a single stream of packets; classifying, managing, balancing, and distributing the flow of packet processing work through the available processing resources; maintaining packet ordering when packets fan out and flow through different resources; and ensuring that individual processing elements or engines within processors have enough program space to run the entire processing application.
There are many types of packet processing architectures. One type of processing architecture attempts to achieve high performance by creating a pipeline of processing stages. FIG. 1 is a diagram of a pipelined packet processing architecture. The pipelined architecture includes multiple processing stages 110-1 through 110-4 (collectively referred to as processing stages 110), connected in series, that act together to perform a packet processing application. Each of stages 110 performs part of the application.
When a packet arrives at a processing stage, such as processing stage 110-2, processing stage 110-2 performs a portion of the application to generate intermediate results. Processing stage 110-2 then outputs the packet and the intermediate results to the next stage (i.e., processing stage 110-3) where processing continues. Because intermediate results are transmitted in addition to the packet, the bandwidth required into and out of processing stages 110 must be greater than the bandwidth of the packet itself.
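The bandwidth overhead described above can be illustrated with a short calculation. This is a minimal sketch only: the function name, the packet rate, and the byte counts are hypothetical values chosen for illustration and are not taken from any particular system.

```python
def stage_link_bandwidth_bps(packets_per_second, packet_bytes, intermediate_bytes):
    """Return the bandwidth, in bits per second, of a link between two stages.

    Each transfer carries the packet plus the intermediate results that the
    upstream stage produced, so the link must carry both.
    """
    return packets_per_second * (packet_bytes + intermediate_bytes) * 8


# Hypothetical figures: 1,000,000 packets per second, 64-byte packets,
# and 16 bytes of intermediate results per packet.
packet_only_bps = stage_link_bandwidth_bps(1_000_000, 64, 0)
with_results_bps = stage_link_bandwidth_bps(1_000_000, 64, 16)
```

Under these assumed figures, the link between stages must carry 640 Mb/s even though the packet stream itself is only 512 Mb/s.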
With such an architecture, higher performance can be achieved by adding processing stages 110. When this happens, however, the application functions must be redistributed over processing stages 110. It is important to balance the application functions performed by each of processing stages 110. If one stage is given much more work than the other stages, then that stage may become overloaded while other stages have unused capacity. In this case, the overloaded stage may become a bottleneck that limits the performance of the entire pipeline.
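The bottleneck effect can be illustrated with a simple model in which a full pipeline completes one packet each time its slowest stage finishes. The following is a sketch under that assumption. The function name and the per-stage times are hypothetical.

```python
def pipeline_throughput(stage_times):
    """Return steady-state throughput in packets per unit time.

    Assumes every stage works on a different packet at the same time, so the
    slowest stage sets the rate at which packets leave the pipeline.
    """
    return 1.0 / max(stage_times)


# Two hypothetical four-stage pipelines with the same total work (4.0 units).
balanced = [1.0, 1.0, 1.0, 1.0]
unbalanced = [0.5, 2.5, 0.5, 0.5]

balanced_rate = pipeline_throughput(balanced)      # 1.0 packet per unit time
unbalanced_rate = pipeline_throughput(unbalanced)  # 0.4 packet per unit time
```

Both pipelines perform the same total work per packet, but in this model the unbalanced pipeline delivers only 40 percent of the throughput of the balanced one because its second stage is overloaded.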
Distributing application functions across a pipeline is a difficult task, and it becomes even more difficult as the number of processing stages 110 increases. Also, if the application needs to be changed to add new features, then the entire application may need to be redistributed across the pipeline. As a result, the pipeline architecture is not flexible.
Another type of processing architecture attempts to achieve high performance by connecting packet processors in parallel. FIG. 2 is a diagram of a parallel packet processing architecture. The parallel packet processing architecture includes packet processors 210-1 through 210-4 (collectively referred to as processors 210) connected between a sprayer 220 and a desprayer 230. Unlike the pipelined packet processing architecture, each of processors 210 in the parallel processing architecture includes the entire packet processing application. In other words, each of processors 210 performs the same application functions.
Sprayer 220 receives packets and load balances them across processors 210. Processors 210 receive the packets, process them, and send them to desprayer 230. Because the time that processors 210 take to process packets may vary, the packets may become out of order relative to the order in which they were received by sprayer 220. As a result, desprayer 230 reorders the packets into the order in which they were received by sprayer 220.
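The spraying and reordering behavior described above can be sketched as follows. This is an illustrative model only, not a description of how sprayer 220 or desprayer 230 is actually built. The round-robin distribution, the sequence-number tagging, and the names spray, Desprayer, and demo are assumptions made for the example.

```python
import heapq


def spray(packets, num_processors):
    """Tag each packet with a sequence number and distribute round-robin.

    Round-robin is an assumed load-balancing policy used only for this sketch.
    """
    queues = [[] for _ in range(num_processors)]
    for seq, payload in enumerate(packets):
        queues[seq % num_processors].append((seq, payload))
    return queues


class Desprayer:
    """Holds processed packets until they can be released in arrival order."""

    def __init__(self):
        self._next_seq = 0   # sequence number of the next packet to release
        self._pending = []   # min-heap of (seq, payload) waiting for release

    def receive(self, seq, payload):
        """Accept one processed packet; return any packets now in order."""
        heapq.heappush(self._pending, (seq, payload))
        released = []
        while self._pending and self._pending[0][0] == self._next_seq:
            released.append(heapq.heappop(self._pending)[1])
            self._next_seq += 1
        return released


def demo():
    packets = ["p0", "p1", "p2", "p3"]
    queues = spray(packets, 2)
    # Assume the second processor finishes both of its packets before the
    # first processor finishes any, so packets arrive as p1, p3, p0, p2.
    arrival_order = queues[1] + queues[0]
    desprayer = Desprayer()
    output = []
    for seq, payload in arrival_order:
        output.extend(desprayer.receive(seq, payload))
    return output
```

In this sketch the desprayer must buffer packets p1 and p3 until p0 arrives. The buffering and bookkeeping grow with the number of processors and the variation in their processing times.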
This parallel processing architecture has limited scalability because the functions of sprayer 220 and desprayer 230 become increasingly difficult to build as the number of processors 210 increases. Also, many physical connections are required to connect sprayer 220 and desprayer 230 to processors 210, which makes the architecture difficult to design and build. As a result, the parallel architecture has limited performance.
Accordingly, there is a need for a scalable packet processing architecture that can flexibly connect multiple processors while supporting a dynamic set of applications and features.