Network devices, such as switches and routers, are designed to forward network traffic, in the form of packets, at high line rates. One of the most important considerations for handling network traffic is packet throughput. To maximize throughput, special-purpose processors known as network processors have been developed to efficiently process very large numbers of packets per second. In order to process a packet, the network processor (and/or network equipment employing the network processor) needs to extract data from the packet header indicating the destination of the packet, class of service, etc., store the payload data in memory, perform packet classification and queuing operations, determine the next hop for the packet, select an appropriate network port via which to forward the packet, etc. These operations are collectively referred to as “packet processing.”
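The header-extraction step described above can be illustrated with a minimal sketch. It assumes an IPv4 packet, which the text does not specify, and the function name `parse_ipv4_header` and the returned field names are hypothetical. It shows only how the destination, the class of service, and the payload boundary might be pulled from a raw packet; it is not a model of how any particular network processor does so.

```python
import ipaddress
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Extract destination, class of service, and payload from an IPv4 packet.

    Assumption: the packet begins with an IPv4 header (no link-layer framing).
    """
    if len(packet) < 20:
        raise ValueError("packet shorter than the minimum IPv4 header")

    # Fixed 20-byte portion of the IPv4 header, in network byte order.
    (ver_ihl, tos, total_len, _ident, _flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])

    if ver_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")

    header_len = (ver_ihl & 0x0F) * 4   # IHL counts 32-bit words
    return {
        "destination": str(ipaddress.IPv4Address(dst)),
        "source": str(ipaddress.IPv4Address(src)),
        "dscp": tos >> 2,               # class of service (upper 6 bits)
        "ttl": ttl,
        "protocol": proto,
        "payload": packet[header_len:total_len],
    }
```

In a real device the payload would be written to a DRAM-based store and only the extracted fields kept as metadata; here the payload is simply returned alongside them.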
Modern network processors perform packet processing using multiple multi-threaded processing elements (referred to as microengines in network processors manufactured by Intel® Corporation, Santa Clara, Calif.), wherein each thread performs a specific task or set of tasks in a pipelined architecture. During packet processing, numerous accesses are performed to move data between various shared resources coupled to and/or provided by a network processor. For example, network processors commonly store packet metadata and the like in static random access memory (SRAM) stores, while storing packets (or packet payload data) in dynamic random access memory (DRAM)-based stores. In addition, a network processor may be coupled to cryptographic processors, hash units, general-purpose processors, and expansion buses, such as the PCI (peripheral component interconnect) and PCI Express buses.
Each of the shared resources is connected to the network processor by some type of bus or buses. In general, conventional buses include address lines, data lines, and command lines. In some cases, a particular type of resource may employ a dedicated bus. In other cases, multiple shared resources may be tied to the same bus.
In order to support concurrent access to shared resources, the network processor employs a bus management scheme. There are several types of arbitration situations. Under one situation, one or more data transaction requesters (e.g., microengine threads) may request access to a particular resource via a dedicated bus. Under another situation, multiple requesters request access to different shared resources coupled to a common bus. In this latter situation, it may prove particularly difficult to perform bus management in an efficient manner.
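As an illustration of the common-bus situation, the following minimal sketch grants a shared bus to one pending requester at a time. The text does not name an arbitration policy; round-robin is assumed here purely for illustration, and the function name `round_robin_arbitrate` and the transaction labels are hypothetical.

```python
from collections import deque


def round_robin_arbitrate(request_queues, start=0):
    """Return the order in which pending transactions are granted the bus.

    request_queues: list of deques, one per requester (e.g., per thread);
                    index i holds requester i's pending transactions.
    start:          index of the requester that is polled first.
    Assumption: round-robin polling, one grant per requester per turn.
    """
    grants = []
    count = len(request_queues)
    pending = sum(len(q) for q in request_queues)
    pointer = start
    while pending:
        queue = request_queues[pointer]
        if queue:                        # skip requesters with nothing pending
            grants.append((pointer, queue.popleft()))
            pending -= 1
        pointer = (pointer + 1) % count  # advance to the next requester
    return grants
```

For example, with requester 0 holding one SRAM read and requester 1 holding two DRAM transactions, the grants alternate until requester 0 is drained. The sketch treats every grant as equal in cost, so it sidesteps the harder question of how long each granted transaction occupies the bus.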
Ideally, all data requests would be similar in size, and all data transaction cycles would incur similar latencies. This would support a higher degree of pipeline synchronization, enabling the resource access latencies to be more easily hidden. Unfortunately, the sizes of data requests may vary significantly. Depending on the particular resource and application, data transactions may comprise short bursts, long bursts, or a mixture thereof. Thus, shared resources sometimes need to support data transactions having variable lengths. This is particularly true when considering a network device as a whole, wherein some of the shared resources typically see short burst transactions, while other shared resources see longer burst transactions. However, a single fixed pipeline architecture will not handle both short and long burst transactions in an efficient manner at the same time; pipeline architectures designed for short burst transactions are inefficient for handling long burst transactions, while pipeline architectures designed for long burst transactions are inefficient for handling short burst transactions.
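The trade-off can be made concrete with a toy model, offered only as a sketch. It assumes that a fixed pipeline moves data in slots of a fixed number of data cycles, that each slot carries a fixed command/arbitration overhead, and that a partially filled slot wastes its unused cycles. The function name `bus_efficiency` and all cycle counts are hypothetical and are not taken from any particular device.

```python
import math


def bus_efficiency(burst_lengths, slot_cycles, overhead_cycles):
    """Fraction of bus cycles that carry useful data in a fixed-slot pipeline.

    burst_lengths:   data cycles needed by each transaction.
    slot_cycles:     data cycles per pipeline slot (fixed by the design).
    overhead_cycles: per-slot command/arbitration cycles (assumed constant).
    """
    useful = sum(burst_lengths)
    # A burst longer than one slot is split across several slots;
    # a burst shorter than a slot leaves the remainder of the slot idle.
    slots = sum(math.ceil(n / slot_cycles) for n in burst_lengths)
    total = slots * (slot_cycles + overhead_cycles)
    return useful / total


short_bursts = [2] * 8    # eight 2-cycle transactions
long_bursts = [16] * 8    # eight 16-cycle transactions

# Pipeline sized for short bursts (2-cycle slots, 2 overhead cycles per slot)
short_pipe_on_short = bus_efficiency(short_bursts, 2, 2)   # 0.5
short_pipe_on_long = bus_efficiency(long_bursts, 2, 2)     # 0.5

# Pipeline sized for long bursts (16-cycle slots, same overhead)
long_pipe_on_short = bus_efficiency(short_bursts, 16, 2)   # about 0.11
long_pipe_on_long = bus_efficiency(long_bursts, 16, 2)     # about 0.89
```

Under these assumed numbers, the long-slot design reaches roughly 89% efficiency on long bursts but only about 11% on short ones, while the short-slot design is held to 50% on long bursts because it pays the per-slot overhead eight times per transaction.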