Network devices, such as switches and routers, are designed to forward network traffic, in the form of packets, at high line rates. One of the most important considerations for handling network traffic is packet throughput. To achieve high throughput, special-purpose processors known as network processors have been developed to efficiently process very large numbers of packets per second. In order to process a packet, the network processor (and/or network equipment employing the network processor) needs to extract data from the packet header indicating the destination of the packet, class of service, etc., store the payload data in memory, perform packet classification and queuing operations, determine the next hop for the packet, select an appropriate network port via which to forward the packet, and so on. These operations are generally referred to as “packet processing” or “packet forwarding” operations.
Modern network processors perform packet processing using multiple multi-threaded processing elements (e.g., processing cores, referred to as microengines or compute engines in network processors manufactured by Intel® Corporation, Santa Clara, Calif.), wherein each thread performs a specific task or set of tasks in a pipelined architecture. During packet processing, numerous accesses are performed to move data between various shared resources coupled to and/or provided by a network processor. For example, network processors commonly store packet metadata and the like in static random access memory (SRAM) stores, while storing packets (or packet payload data) in external dynamic random access memory (DRAM)-based stores.
A typical network device configuration is shown in FIG. 1. The network device includes six network line cards 100, 102, 104, 106, 108, and 110, which are communicatively coupled to one another via a primary fabric switch card 112 and a redundant switch card 114 over a common backplane, mid-plane, or the like. For simplicity, each of line cards 100, 104, 108 and 110 is depicted as including a framer block 116 and a network processor unit (NPU) 118, while each of line cards 102 and 106 is depicted as including a PHY block 120 and an NPU 118. The framer blocks 116 perform de-framing operations on incoming packets and framing operations on outgoing packets. Similarly, PHY blocks 120 perform various physical layer operations pertaining to incoming and outgoing packet processing. In addition to these illustrated components, each of the network line cards will include other common components, such as SRAM stores, DRAM stores and various other packet-processing blocks that are well-known in the art.
The purpose of primary fabric switch card 112 (and redundant switch card 114 when activated to replace primary fabric switch card 112) is to provide selective connectivity between the various network line cards. Each of the network line cards generally includes one or more physical input/output (I/O) ports via which data may be received and/or transmitted. In view of routing aspects common to routers and switches, the switch fabric enables packets or the like received at a first I/O port to be selectively routed to any of the other I/O ports by selectively coupling the appropriate line cards hosting the I/O ports. For example, a first packet flow A is shown as being received at line card 100, transmitted across fabric switch card 112 to line card 110 at time T1, and thereafter transmitted to a next hop in the route. Similarly, a second packet flow B is shown as being received at line card 108 and transmitted across fabric switch card 112 to line card 104 at time T2.
Due to the switching flexibility of the switch fabric, connections between line card pairs to support corresponding flows are frequently switched on an ongoing basis, requiring a scheduling mechanism to be employed for managing access to the switch fabric. Accordingly, switch fabrics employ fabric interfaces such as SPI (System Packet Interface), CSIX (Common Switch Interface), NPSI (Network Processor Streaming Interface) and ASI (Advanced Switching Interconnect) to interface with the NPUs in order to coordinate and schedule traffic flows. These fabric interfaces support fine-grained QoS (Quality of Service) by supporting flow control on the interface on a per-queue basis. These queues are alternatively referred to as virtual output queues (VOQ) or connection queues (CQ). The flow control status of these queues changes rapidly based on the congestion in the fabric due to traffic injected from the various line cards. The fabric conveys Xoff and Xon messages to the line cards to stop and start traffic on a per-queue basis. The network processors on the line cards are required to respond to these messages and stop or start transmission on the particular queue immediately.
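The per-queue flow control described above can be pictured with a minimal sketch. It is an illustration only and does not correspond to any particular fabric interface: the names `FlowState`, `FlowControlTable`, `on_fabric_message` and `may_transmit` are hypothetical, and the sketch assumes that every queue starts in the Xon state.

```python
from enum import Enum


class FlowState(Enum):
    XON = "xon"    # queue may transmit
    XOFF = "xoff"  # queue must hold its traffic


class FlowControlTable:
    """Tracks the latest fabric flow-control state for each queue (VOQ/CQ)."""

    def __init__(self, num_queues):
        # Assumption: every queue is enabled until the fabric says otherwise.
        self._state = [FlowState.XON] * num_queues

    def on_fabric_message(self, queue_id, state):
        """Record an Xon or Xoff message received for one queue."""
        self._state[queue_id] = state

    def may_transmit(self, queue_id):
        """Return True if the queue is currently allowed to send."""
        return self._state[queue_id] is FlowState.XON


table = FlowControlTable(num_queues=4)
table.on_fabric_message(2, FlowState.XOFF)  # fabric reports congestion on queue 2
eligible = [q for q in range(4) if table.may_transmit(q)]
print(eligible)  # [0, 1, 3]
```

Only the queue named in the Xoff message is held back; the remaining queues stay eligible, which is what allows flow control at per-queue granularity.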
A network processor typically incurs a long latency from the time a given queue is scheduled to the time the data is actually transmitted on the wire. This latency results from the various internal pipeline stages and from reading data from external DRAM memory. Since the fabric flow control status changes rapidly, the NPU transmit engine is required to check whether the scheduled queue is still valid for transmission. If the transmit engine encounters an Xoff message, the scheduled cell/segment must not be transmitted, since the flow control status for that particular queue has changed in the intervening time. Under such conditions the transmitter will discard all of the scheduled cells/segments from that queue. As a result, the queue management engine is required to roll back the queue to the point where the first dropped segment occurred. Under the conventional approach, this is a costly operation (in terms of overhead latencies and memory resource consumption).
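The interaction between scheduling, the transmit-time check and the roll-back can be illustrated with a simplified sketch. It models only the behavior described above, not the conventional implementation or its cost: `SegmentQueue`, `schedule`, `roll_back` and `transmit` are hypothetical names, and a queue is assumed to be a simple list with a head index.

```python
class SegmentQueue:
    """A queue of segments with a head index that scheduling advances."""

    def __init__(self, segments):
        self.segments = list(segments)
        self.head = 0  # index of the next segment to schedule

    def schedule(self, count):
        """Hand the next `count` segments to the transmit pipeline."""
        first = self.head
        self.head = min(self.head + count, len(self.segments))
        return list(range(first, self.head))  # indices now in flight

    def roll_back(self, first_dropped):
        """Rewind the head so that dropped segments are scheduled again."""
        self.head = min(self.head, first_dropped)


def transmit(queue, in_flight, xoff_seen):
    """Send in-flight segments unless the queue has been flow-controlled off."""
    if xoff_seen:
        # Discard everything scheduled from this queue and rewind the queue
        # to the first dropped segment.
        if in_flight:
            queue.roll_back(in_flight[0])
        return []
    return [queue.segments[i] for i in in_flight]


q = SegmentQueue(["s0", "s1", "s2", "s3"])
sent = transmit(q, q.schedule(2), xoff_seen=False)    # s0 and s1 go out
dropped = transmit(q, q.schedule(2), xoff_seen=True)  # Xoff arrived meanwhile
resent = transmit(q, q.schedule(2), xoff_seen=False)  # after Xon, s2 and s3 go out
```

In this toy model the roll-back is a single index update. In a real network processor the queue state resides in external memory and several segments may already be in the pipeline, which is why the text characterizes the conventional roll-back as costly.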