Network devices, such as switches and routers, are designed to forward network traffic, in the form of packets, at high line rates. One of the most important considerations for handling network traffic is packet throughput. To achieve high throughput, special-purpose processors known as network processors have been developed to efficiently process very large numbers of packets per second. In order to process a packet, the network processor (and/or network equipment employing the network processor) needs to extract data from the packet header indicating the destination of the packet, class of service, etc., store the payload data in memory, perform packet classification and queuing operations, determine the next hop for the packet, select an appropriate network port via which to forward the packet, etc. These operations are collectively referred to as “packet processing.”
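By way of illustration only, the packet processing operations listed above can be sketched in software as follows. The 5-byte header layout, the exact-match next-hop table, and the names `process_packet`, `next_hop_table`, `payload_store`, and `queues` are hypothetical and chosen for brevity; an actual network processor performs these operations in hardware-assisted pipelines and with real protocol headers.

```python
import struct
from collections import deque
from typing import Deque, Dict, List, Tuple

# Hypothetical header: 4-byte destination address, 1-byte class of service,
# both in network byte order. Real protocol headers are more elaborate.
HEADER_FORMAT = "!IB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def process_packet(
    packet: bytes,
    next_hop_table: Dict[int, Tuple[str, int]],
    payload_store: List[bytes],
    queues: Dict[Tuple[int, int], Deque[int]],
) -> Tuple[str, int]:
    """Run one packet through a simplified packet-processing sequence."""
    # 1. Extract the destination and class of service from the header.
    dest, cos = struct.unpack(HEADER_FORMAT, packet[:HEADER_SIZE])

    # 2. Store the payload in memory and keep a handle (index) to it.
    payload_store.append(packet[HEADER_SIZE:])
    handle = len(payload_store) - 1

    # 3. Determine the next hop and the output port for the destination.
    next_hop, port = next_hop_table[dest]

    # 4. Classify by (port, class of service) and enqueue the handle.
    queues.setdefault((port, cos), deque()).append(handle)
    return next_hop, port
```

In this sketch, only the small handle is queued while the payload stays in the store, loosely mirroring the practice of keeping packet metadata and packet payload data in separate memories.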
Modern network processors perform packet processing using multiple multi-threaded processing elements (referred to as microengines in network processors manufactured by Intel® Corporation, Santa Clara, Calif.), wherein each thread performs a specific task or set of tasks in a pipelined architecture. During packet processing, numerous accesses are performed to move data between various shared resources coupled to and/or provided by a network processor. For example, network processors commonly store packet metadata and the like in static random access memory (SRAM) stores, while storing packets (or packet payload data) in dynamic random access memory (DRAM)-based stores. In addition, a network processor may be coupled to cryptographic processors, hash units, general-purpose processors, and expansion buses, such as the PCI (peripheral component interconnect) and PCI Express buses.
In general, the various processing elements of a network processor, as well as other optional components, will share access to various system resources. Such shared resources typically include data storage and processing units, such as memory stores (e.g., SRAM, DRAM), UARTs, input/output (I/O) interfaces, etc. The shared resources and their consumers are interconnected via sets of buses that are shared by the various processing elements and other bus masters.
Under typical network processor configurations, various bus schemes are employed to support access to the shared resources. Since only a single set of signals can be present on a given bus at any point in time, buses require multiplexing and the like to allow multiple resource consumers to access multiple resource targets coupled to the bus. In order to enable access by all consumers, a bus arbitration scheme must be employed, such that when multiple access requests are submitted concurrently, one of those requests is granted, while the other requests are denied.
In accordance with one conventional technique, bus access is supported in the following manner. Multiple bus masters, such as processors, DMA (direct memory access) controllers, and the like, are coupled to a common bus with a fixed width, such as 32 bits or 64 bits. During a bus cycle, one or more masters will submit a request (e.g., assert a request signal) to a bus arbiter to access the bus. In the case of multiple requests occurring during the same cycle, the arbiter will apply an arbitration policy, such as round-robin, to determine which master is granted access to the bus. In response to receiving an access grant, the master will drive out an address of a targeted slave on an address bus, which will be sampled by all of the slaves tied to the bus. The targeted slave will recognize that the access request is for that slave, while the other slaves will ignore the request. Following this address and control set-up sequence, one or more bus cycles are employed for transferring the data between the master and the slave. For data reads, data is transferred from a slave to a master. For data writes, data is transferred from a master to a slave.
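The arbitration and address-decode steps just described can be modeled, purely for illustration, as shown below. This is a minimal behavioral sketch rather than a description of any particular bus; the class `RoundRobinArbiter`, the function `decode_address`, and the address ranges used with them are hypothetical.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Grants a shared bus to at most one requesting master per cycle."""

    def __init__(self, num_masters: int) -> None:
        self.num_masters = num_masters
        # Index of the most recently granted master; the search for the
        # next grant starts at the master just after it.
        self.last_granted = num_masters - 1

    def arbitrate(self, requests: List[bool]) -> Optional[int]:
        """Return the granted master's index, or None if nobody requested."""
        for offset in range(1, self.num_masters + 1):
            candidate = (self.last_granted + offset) % self.num_masters
            if requests[candidate]:
                self.last_granted = candidate
                return candidate
        return None


def decode_address(address: int, slave_ranges: List[range]) -> Optional[int]:
    """Model every slave sampling the address bus.

    Only the slave whose (hypothetical) address range contains the address
    recognizes the request; all other slaves ignore it.
    """
    for slave_id, addr_range in enumerate(slave_ranges):
        if address in addr_range:
            return slave_id
    return None
```

For example, with three masters where masters 0 and 1 request the bus on every cycle, successive calls to `arbitrate([True, True, False])` grant master 0, then master 1, then master 0 again, so each requester is served in turn while the other request is denied for that cycle.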
Under conventional practices, only one transfer may be present on a shared data bus at one time. While this simplifies arbiter and control logic, it limits the throughput that the bus can support.
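The throughput limit can be made concrete with a rough calculation. The sketch below assumes a simple model in which each transaction spends some cycles on address and control set-up and the remainder on data; the function names and all numeric values (bus width, clock rate, cycle counts, number of masters) are hypothetical and used only for illustration.

```python
def shared_bus_throughput(width_bits: int, clock_hz: int,
                          setup_cycles: int, data_cycles: int) -> int:
    """Approximate data throughput in bits per second of a single shared bus.

    Because only one transfer occupies the bus at a time, set-up cycles
    carry no data and directly reduce the usable bandwidth.
    """
    cycles_per_transaction = setup_cycles + data_cycles
    return width_bits * clock_hz * data_cycles // cycles_per_transaction


def per_master_share(total_bits_per_s: int, num_masters: int) -> int:
    """Average share when all masters contend equally for the one bus."""
    return total_bits_per_s // num_masters


# Hypothetical figures: 64-bit bus, 100 MHz clock, 1 set-up cycle and
# 4 data cycles per transaction, 8 equally active masters.
total = shared_bus_throughput(64, 100_000_000, 1, 4)
share = per_master_share(total, 8)
```

Under these assumed figures, the bus carries at most 5.12 Gbit/s in aggregate, and each of the eight masters averages 640 Mbit/s, regardless of how fast any individual master or slave could otherwise operate.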