As the network protocol stacks in current data communication networks have become increasingly complicated, the design and architecture of network processors/systems on chip have advanced dramatically to support processing and transferring multiple flows of packets at line rates of up to hundreds of Gbps.
With the emergence of software-defined networks (SDN), more new protocols and standards are expected to be added to network devices in the near future. As a result, more processing engines are added to network processors/systems on chip so that they can process multiple network tasks in parallel. Parallelism can be achieved not only through multiple parallel packet flows in a network-processing system on chip but also through task parallelism within each flow. One processing engine can handle one or multiple tasks, and one task can also be mapped to one or a few processing engines.
As the number of processing engines becomes large (tens or hundreds), an on-chip network connecting these processing engines is needed. To avoid wiring complexity and congestion, the on-chip network consists of multiple interconnect elements connected in a regular topology such as a mesh or a tree. Each interconnect element directly connects to one or a few processing engines. In the literature, the on-chip interconnect element is typically called an on-chip router or on-chip switch. Processing engines on a system on chip communicate with each other indirectly by sending packets through a network built from interconnect elements. Interconnect elements forward packets based on the destination processing engine address embedded in each packet.
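As an illustration of address-based forwarding, the following minimal sketch models interconnect elements arranged in a mesh. The text above does not specify a routing algorithm or an address format; dimension-order (XY) routing, the (x, y) coordinate address, and the names Packet, next_hop, and route are all assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Coord = Tuple[int, int]  # hypothetical address format: (x, y) position in the mesh


@dataclass(frozen=True)
class Packet:
    dest: Coord          # destination address embedded in the packet
    payload: bytes = b""


def next_hop(current: Coord, dest: Coord) -> Optional[Coord]:
    """Return the neighboring element to forward to, or None on arrival.

    Assumed policy: dimension-order (XY) routing, i.e. correct the x
    coordinate first, then the y coordinate.
    """
    cx, cy = current
    dx, dy = dest
    if cx != dx:
        return (cx + (1 if dx > cx else -1), cy)
    if cy != dy:
        return (cx, cy + (1 if dy > cy else -1))
    return None  # packet has reached the element attached to its destination


def route(src: Coord, packet: Packet) -> List[Coord]:
    """List every interconnect element the packet visits, starting at src."""
    path = [src]
    hop = next_hop(src, packet.dest)
    while hop is not None:
        path.append(hop)
        hop = next_hop(hop, packet.dest)
    return path


if __name__ == "__main__":
    print(route((0, 0), Packet(dest=(2, 1))))
```

Each element in this sketch decides the next hop from the destination address alone, so no element needs a global view of the network.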
Processing engines in a network system on chip are programmable so that the system can adapt to changes in network features made by users. As a result, the on-chip communication network connecting these processing engines is also programmable and flexible in order to support changes in packet flows among processing engines.
One of the main tasks of each processing engine in a network processor is performing lookups for packets. Lookup operations are determined by the network protocols and features programmed into that processing engine. The lookup results are used to perform further actions on the packet, such as modifying some fields in the packet and/or forwarding the packet to the correct destination port.
Traditionally, processing engines perform lookups on a remote search engine that contains a large number of lookup tables shared by all processing engines in the entire system. A shared central search engine is easy to manage, but it has some drawbacks. First, the wiring from all processing engines to the search engine is highly complicated, which consumes high power and reduces overall silicon utilization. Second, the round-trip latency from when a processing engine sends a lookup request to the search engine to when it receives the lookup result is high. Third, configuring the shared search engine to achieve high lookup bandwidth for all processing engines is difficult.
In an effort to reduce lookup latency and increase lookup bandwidth, parts of the lookup tables are physically moved from the search engine into the processing engines. With this design change, each processing engine can perform lookups on its internal lookup tables more frequently, thereby achieving low-latency, high-bandwidth lookups. Only complicated lookups, which require more computation power and larger memory capacity, are performed on the shared remote search engine. This approach, however, has its own disadvantages. Because the capacity of the lookup tables in each processing engine is fixed in hardware, the user cannot allocate a bigger table to a processing engine; hence the processing engine is eventually forced to access the remote search engine. At the other extreme, when the user needs only a small lookup table for a processing engine, the remaining memory capacity in that processing engine is wasted, resulting in low memory resource utilization.
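The two disadvantages described above can be sketched with a small software model. This is only an illustration under stated assumptions: the class ProcessingEngine, its methods, and the use of Python dictionaries to stand in for hardware lookup tables and for the shared remote search engine are hypothetical and are not taken from any actual design.

```python
from typing import Dict, Hashable, Optional


class ProcessingEngine:
    """Toy model of a processing engine with a fixed-capacity local table."""

    def __init__(self, local_capacity: int, remote_table: Dict[Hashable, int]):
        self.local_capacity = local_capacity  # fixed in hardware; cannot grow
        self.local_table: Dict[Hashable, int] = {}
        self.remote_table = remote_table      # stands in for the shared search engine
        self.remote_lookups = 0               # counts slow round trips

    def install(self, key: Hashable, value: int) -> bool:
        """Store an entry locally if it fits; otherwise spill it to the remote table.

        Returns True when the entry was stored locally.
        """
        if key in self.local_table or len(self.local_table) < self.local_capacity:
            self.local_table[key] = value
            return True
        self.remote_table[key] = value
        return False

    def lookup(self, key: Hashable) -> Optional[int]:
        """Try the local table first; fall back to the remote search engine."""
        if key in self.local_table:
            return self.local_table[key]
        self.remote_lookups += 1
        return self.remote_table.get(key)

    def unused_local_entries(self) -> int:
        """Local capacity that is reserved for this engine but left idle."""
        return self.local_capacity - len(self.local_table)


if __name__ == "__main__":
    shared = {}
    big_user = ProcessingEngine(local_capacity=2, remote_table=shared)
    for k, v in [("a", 1), ("b", 2), ("c", 3)]:
        big_user.install(k, v)          # "c" does not fit and spills to the remote table
    big_user.lookup("c")
    print(big_user.remote_lookups)      # remote access forced by the fixed capacity

    small_user = ProcessingEngine(local_capacity=4, remote_table=shared)
    small_user.install("x", 9)
    print(small_user.unused_local_entries())  # idle entries no other engine can use
```

In this model, an engine whose table exceeds its fixed local capacity falls back to remote lookups, while an engine with a small table leaves local entries idle that cannot be reassigned to another engine.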
The foregoing examples of the related art and limitations related therewith are intended to be illustrative and not exclusive. Other limitations of the related art will become apparent upon a reading of the specification and a study of the drawings.