Processing engines that include arrays of processing elements are used in network devices such as routers and switches to increase the speed of packet processing. Operations performed by an array of processing elements (often referred to as a “systolic” array) include processing packet header information and using the processed information to find related information in a lookup table that is stored in memory. Examples of information obtained through the lookups include destination address, access control, policy, rate control, and traffic classification information. The rate at which packet information can be processed through an array of processing elements sets the throughput of a processing engine and, in turn, the throughput of the network device.
Arrays of processing elements usually are divided into stages of processing elements, where the processing elements within each stage perform similar operations. Each stage of the array has a corresponding memory unit that stores a lookup table that is specific to the stage. Operations performed at each stage of the array include: processing packet header information to produce search information, sending the search information to the corresponding memory unit, performing a search, returning the results of the search back to the corresponding processing element, and then forwarding the packet header information and the search results to a next stage processing element in the systolic array. These operations are performed in a serial manner because the next stage processing is usually dependent on the results from the previous stage search.
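The serial, stage-by-stage flow described above can be illustrated with a minimal sketch. All names here (`Stage`, `run_array`, the table contents, and the key formats) are hypothetical and chosen only for illustration. Each stage's memory unit is modeled as a plain dictionary, which is an assumption and not a description of any particular hardware.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Stage:
    """One stage of the array plus its stage-specific lookup table (hypothetical model)."""
    make_key: Callable[[dict, Optional[str]], str]  # builds the search information
    table: Dict[str, str]                           # stands in for the stage's memory unit

    def process(self, header: dict, prev_result: Optional[str]) -> Optional[str]:
        # Process header information (and the previous result) into search information.
        key = self.make_key(header, prev_result)
        # Send the search information to the memory unit and get the result back.
        return self.table.get(key)


def run_array(stages: List[Stage], header: dict) -> List[Optional[str]]:
    """Pass one packet header through every stage in order."""
    results: List[Optional[str]] = []
    prev: Optional[str] = None
    for stage in stages:
        # Each stage needs the previous stage's search result, so the
        # stages cannot be reordered or run independently for one packet.
        prev = stage.process(header, prev)
        results.append(prev)
    return results


# Illustrative two-stage array: a destination lookup, then a policy
# lookup whose key depends on the first stage's result.
EXAMPLE_STAGES = [
    Stage(make_key=lambda h, _prev: h["dst"],
          table={"10.0.0.1": "port3"}),
    Stage(make_key=lambda h, prev: f"{prev}:{h['proto']}",
          table={"port3:tcp": "permit"}),
]

EXAMPLE_RESULTS = run_array(EXAMPLE_STAGES, {"dst": "10.0.0.1", "proto": "tcp"})
```

In this sketch the second stage cannot form its search key until the first stage's search has returned, which is the dependency that forces the operations to run serially.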
Although advances in memory speed and search techniques have been made, search operations are still slow in comparison to the processing speed of the processing elements. Because of this difference in speed, processing elements can sit idle while search operations are performed. As such, the search operations are often the limiting factor in overall performance when using an array of processing elements.
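The effect of slow searches on a processing element can be made concrete with a rough calculation. The sketch below assumes a simplified model in which a processing element handles one packet header at a time and waits for its search to return before accepting the next header. The function names and the timing values in the usage note are hypothetical and are not taken from any measured system.

```python
def idle_fraction(process_ns: float, search_ns: float) -> float:
    """Fraction of each per-header cycle a processing element spends waiting.

    Assumes the element is busy for process_ns and then idle for
    search_ns while its search completes (simplified model).
    """
    if process_ns <= 0 or search_ns < 0:
        raise ValueError("process_ns must be positive and search_ns non-negative")
    return search_ns / (process_ns + search_ns)


def headers_per_second(process_ns: float, search_ns: float) -> float:
    """Per-element header rate when processing and search do not overlap."""
    if process_ns <= 0 or search_ns < 0:
        raise ValueError("process_ns must be positive and search_ns non-negative")
    return 1e9 / (process_ns + search_ns)
```

Under these assumptions, a hypothetical element that needs 10 ns of processing and 40 ns of search time per header would sit idle for 80 percent of each cycle, and its header rate would be set mostly by the search time rather than by its own processing speed.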
In view of this, what is needed is a technique for more efficiently processing packet information using an array of processing elements.