Network devices, such as switches and routers, are designed to forward network traffic, in the form of packets, at high line rates. One of the most important considerations for handling network traffic is packet throughput. To achieve high throughput, special-purpose processors known as network processors have been developed to efficiently process very large numbers of packets per second. In order to process a packet, the network processor (and/or network equipment employing the network processor) needs to extract data from the packet header indicating the destination of the packet, class of service, etc., store the payload data in memory, perform packet classification and queuing operations, determine the next hop for the packet, select an appropriate network port via which to forward the packet, etc. These operations are collectively referred to as “packet processing.”
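The packet-processing steps listed above can be illustrated with a minimal sketch. It assumes an IPv4 header layout and a small, hypothetical forwarding table; the names `FORWARDING_TABLE`, `parse_ipv4_header`, `lookup_next_hop`, and `process_packet` are illustrative only and do not come from the text. Queuing and memory placement of the payload are omitted.

```python
import ipaddress
import struct

# Hypothetical forwarding table: prefix -> (next hop, egress port).
FORWARDING_TABLE = {
    ipaddress.ip_network("10.0.0.0/8"): ("10.255.0.1", 1),
    ipaddress.ip_network("10.1.0.0/16"): ("10.1.255.1", 2),
    ipaddress.ip_network("0.0.0.0/0"): ("192.0.2.1", 0),
}


def parse_ipv4_header(packet: bytes) -> dict:
    """Extract class of service, destination, and payload (IPv4 assumed)."""
    version_ihl, tos = struct.unpack_from("!BB", packet, 0)
    header_len = (version_ihl & 0x0F) * 4      # IHL counts 32-bit words
    dst = ipaddress.ip_address(packet[16:20])  # destination address field
    return {"dscp": tos >> 2, "dst": dst, "payload": packet[header_len:]}


def lookup_next_hop(dst):
    """Pick the matching prefix with the longest length."""
    matches = [net for net in FORWARDING_TABLE if dst in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return FORWARDING_TABLE[best]


def process_packet(packet: bytes) -> dict:
    """Header extraction, next-hop determination, and port selection."""
    fields = parse_ipv4_header(packet)
    next_hop, port = lookup_next_hop(fields["dst"])
    return {
        "dscp": fields["dscp"],
        "next_hop": next_hop,
        "port": port,
        "payload": fields["payload"],
    }
```

In a network processor these steps would be spread across hardware threads rather than run as one function; the sketch only shows the data dependencies between them.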
Modern network processors perform packet processing using multiple multi-threaded processing elements (referred to as microengines in network processors manufactured by Intel® Corporation, Santa Clara, Calif.), wherein each thread performs a specific task or set of tasks in a pipelined architecture. During packet processing, numerous accesses are performed to move data between various shared resources coupled to and/or provided by a network processor. For example, network processors commonly store packet metadata and the like in static random access memory (SRAM) stores, while storing packets (or packet payload data) in dynamic random access memory (DRAM)-based stores. In addition, a network processor may be coupled to cryptographic processors, hash units, general-purpose processors, and expansion buses, such as the PCI (peripheral component interconnect) and PCI Express buses.
In general, the various packet-processing elements (e.g., microengines) of a network processor, as well as other optional processing elements, such as general-purpose processors, will share access to various system resources. Such shared resources typically include data storage and processing units, such as memory stores (e.g., SRAM, DRAM), hash units, cryptography units, etc., and input/output (I/O) interfaces. The shared resources and their consumers are interconnected via sets of buses known as the “chassis.” The chassis is a high-performance interconnect on the network processor chip that provides the on-chip data transport infrastructure between numerous processing elements on the chip and the numerous shared resources on-chip or accessible via appropriate built-in chip interfaces.
Under typical network processor configurations, various bus schemes are employed to enable shared access to the shared resources. Since only a single set of signals can be present on a given bus at any point in time, buses require multiplexing and the like to allow multiple resource consumers to access multiple resource targets coupled to the bus. In order to support concurrent access to shared resources, the network processor must arbitrate grants to its buses. There are several types of arbitration situations. Under one situation, one or more data transaction requesters (e.g., microengine threads) may request access to a particular resource accessed via a dedicated bus. Under another situation, multiple requesters request access to different shared resources coupled to a common bus. In this latter situation, efficient bus management may prove particularly difficult.
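As an illustration of bus arbitration, the following minimal sketch grants a single shared bus to one requester per cycle. The text does not specify an arbitration policy; round-robin is assumed here purely as one common choice, and the class name `RoundRobinArbiter` and its interface are hypothetical.

```python
class RoundRobinArbiter:
    """Grants a shared bus to one requester per cycle (assumed policy)."""

    def __init__(self, num_requesters: int):
        self.num_requesters = num_requesters
        # Start so that requester 0 is considered first.
        self.last_granted = num_requesters - 1

    def grant(self, requests: set):
        """Return the id granted this cycle, or None if nobody requests.

        The search starts just after the previous winner, so no requester
        can be starved by a lower-numbered one.
        """
        for offset in range(1, self.num_requesters + 1):
            candidate = (self.last_granted + offset) % self.num_requesters
            if candidate in requests:
                self.last_granted = candidate
                return candidate
        return None
```

With four requesters and requesters 1 and 3 both asserting requests every cycle, successive calls to `grant({1, 3})` alternate between 1 and 3, reflecting the constraint that only one set of signals can occupy the bus at a time.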
One technique for relieving access contention is to provide separate buses for data reads and data writes for each shared resource. However, implementing separate buses for reads and writes for each target increases the bus count, and thus adds to the already crowded signal routing requirements for the network processor chip. For example, under a conventional approach, sharing access to 16 shared resources requires 16 independent sets of buses, with each set including a read bus, a write bus, and a command bus, for a total of 48 buses. To support routing for such a large number of buses, die sizes must be increased; this directly conflicts with the goal of reducing die sizes and processor costs.
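The bus count in the conventional approach follows from simple multiplication, as the short sketch below shows. The function name `conventional_bus_count` and its parameters are hypothetical, introduced only to make the arithmetic explicit.

```python
def conventional_bus_count(num_targets: int,
                           buses_per_target=("read", "write", "command")) -> int:
    """Total buses when every shared resource gets its own set of buses."""
    return num_targets * len(buses_per_target)


# 16 shared resources, each with a read, write, and command bus.
total = conventional_bus_count(16)
```

Because the count grows linearly with the number of shared resources, each added target costs three more buses to route under this scheme.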