The communications industry is rapidly changing to adjust to emerging technologies and ever-increasing customer demand. This customer demand for new applications and increased performance of existing applications is driving communications network and system providers to employ networks and systems having greater speed and capacity (e.g., greater bandwidth). In trying to achieve these goals, a common approach taken by many communications providers is to use packet switching technology. Increasingly, public and private communications networks are being built and expanded using various packet technologies, such as Internet Protocol (IP). Note that nothing described or referenced in this document is admitted as prior art to this application unless explicitly so stated.
A network device, such as a switch or router, typically receives, processes, and forwards or discards packets. For example, an enqueuing component of such a device receives streams of various sized packets which are accumulated in an input buffer. Each packet is analyzed, and an appropriate amount of memory space is allocated to store the packet. The packet is stored in memory, while certain attributes (e.g., destination information and other information typically derived from a packet header or other source) are typically maintained in a separate memory. Once the entire packet is written into memory, the packet becomes eligible for processing, and an indicator (e.g., a packet handle) of the packet is typically placed in an appropriate destination queue for being serviced according to some scheduling methodology for packet processing. When this packet processing is complete, the packet is then gathered for sending (e.g., another processing function to build the processed packet to be forwarded based on the packet handle). The packet is then forwarded and the memory previously required for storing the sent packet becomes available for storing new information.
A packet processing mechanism receives multiple channels/streams of packets (typically in parallel), processes the packets (typically in parallel), and then gathers (e.g., builds) and forwards the processed packets (often serially, one packet at a time).
In one prior approach, each channel is allocated its own portion of memory for storing packets. This approach typically requires a substantial amount of memory, often with its overall occupancy level being low. For example, if a small amount of memory is allocated for each channel, this memory may be consumed by packets being processed or waiting to be gathered and forwarded. In that case, incoming packets are either dropped, or back pressure (e.g., flow control) is used to stop the incoming flow of packets. For illustrative purposes, assume the memory for all channels is consumed, so that no more packets can be received. After a packet is sent, the memory used to store the sent packet becomes available. In this approach, with memory dedicated to individual channels, the channel corresponding to the sent packet is able to receive a new packet, but packets still cannot be received by other channels. This problem is exacerbated when packets are received over multiple channels and packets are sent out over a single channel. One prior approach to avoid this problem is to provide enough (and typically a large amount of) memory for each channel.
Another approach reduces the amount of overall memory required by sharing memory for storing packets received over multiple channels. This presents the issue of how to share this resource (e.g., memory).
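The difference between the dedicated and shared approaches can be illustrated with a small model. This is only a sketch: the class names DedicatedBuffers and SharedBuffers, and the idea of counting memory in fixed-size packet slots, are assumptions made for illustration and do not come from any particular device.

```python
class DedicatedBuffers:
    """Hypothetical model: each channel owns a fixed number of packet slots."""

    def __init__(self, channels, slots_per_channel):
        self.free = {ch: slots_per_channel for ch in channels}

    def try_store(self, ch):
        # A packet can only use a slot belonging to its own channel.
        if self.free[ch] == 0:
            return False
        self.free[ch] -= 1
        return True

    def release(self, ch):
        # Sending a packet frees a slot for that channel only.
        self.free[ch] += 1


class SharedBuffers:
    """Hypothetical model: all channels draw from one pool of packet slots."""

    def __init__(self, total_slots):
        self.free = total_slots

    def try_store(self, ch):
        if self.free == 0:
            return False
        self.free -= 1
        return True

    def release(self, ch):
        # A freed slot can be reused by any channel.
        self.free += 1


# Fill both memories completely, then send one packet from channel "a".
dedicated = DedicatedBuffers(["a", "b"], slots_per_channel=2)
shared = SharedBuffers(total_slots=4)
for ch in ["a", "a", "b", "b"]:
    dedicated.try_store(ch)
    shared.try_store(ch)
dedicated.release("a")
shared.release("a")

b_accepted_dedicated = dedicated.try_store("b")  # False: only "a" has a free slot
b_accepted_shared = shared.try_store("b")        # True: the freed slot is usable by "b"
```

The shared pool accepts the packet on channel "b", but it offers no rule for which channel should get a freed slot when several are waiting, which is the sharing issue raised above.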
When there is contention for resources, such as on output links of a packet switching system or interface, or even for memory or compute cycles, it is important for resources to be allocated or scheduled according to some priority and/or fairness policy. Moreover, the amount of work required to schedule and to enqueue and dequeue a packet or other scheduled item is important, especially as the operating rate of systems increases. Many different mechanisms are available to share resources, many of which are described hereinafter.
Ordinary time division multiplexing (TDM) is a method commonly used for sharing a common resource between several clients. All scheduled clients are served one at a time at predetermined times and for pre-allocated time periods, which is a very useful property for many applications. This method is often used for multiplexing multiple synchronous items over a higher speed communications link, such as that used for multiplexing multiple telephone calls over a single facility or interleaving packets. However, in a dynamic environment wherein an item may require none, or only a portion, of a particular allocated time slot, bandwidth of the resource is typically wasted.
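The wasted bandwidth can be made concrete with a small calculation. The sketch below is illustrative only: the function name tdm_utilization, the slot size, and the per-client demands are assumed values, not figures from any real system.

```python
def tdm_utilization(slot_bytes, demand_per_slot):
    """Fraction of link capacity used over one TDM frame.

    demand_per_slot[i] is the number of bytes client i has ready during
    its own slot. A client cannot use more than its slot, and unused
    slot capacity is not given to any other client.
    """
    capacity = slot_bytes * len(demand_per_slot)
    used = sum(min(slot_bytes, demand) for demand in demand_per_slot)
    return used / capacity


# Four clients with 100-byte slots; one is idle and one is lightly loaded.
utilization = tdm_utilization(100, [100, 0, 40, 100])  # 240 / 400 = 0.6
```

Here 40 percent of the frame goes unused even though the first and last clients might have had more to send.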
Ordinary round-robin (RR) is another method commonly used for sharing a common resource between several clients. All clients are served in a cyclic order. In each round, every client will be served if it is eligible. When served, each client is permitted to send one packet. Round-robin servicing of queues is simple to implement and can be done in constant time but, because packet sizes vary, it does not allocate bandwidth fairly. For example, certain higher priority or larger bandwidth ports or streams of packets may not get their desired amount of bandwidth, which may especially be the case when serving one large and numerous smaller traffic streams or when different priorities of traffic are scheduled.
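A short simulation shows how per-packet round-robin skews the byte share when packet sizes differ. This is a minimal sketch; the function name round_robin_bytes and the packet sizes are assumptions chosen for illustration.

```python
from collections import deque


def round_robin_bytes(queues, rounds):
    """Serve one packet per non-empty queue per round.

    Each queue is a deque of packet sizes in bytes. Returns the total
    number of bytes sent from each queue.
    """
    sent = [0] * len(queues)
    for _ in range(rounds):
        for i, queue in enumerate(queues):
            if queue:
                sent[i] += queue.popleft()
    return sent


# One stream of large packets and one of small packets, ten rounds.
queues = [deque([1500] * 10), deque([100] * 10)]
bytes_sent = round_robin_bytes(queues, rounds=10)  # [15000, 1000]
```

Both queues send the same number of packets, yet the first receives fifteen times the bandwidth of the second.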
In some scenarios, high priority (e.g., low latency), guaranteed bandwidth, best effort traffic (e.g., spare bandwidth) and other classifications of traffic compete for a common resource. Various known scheduling methods are designed to provide isolation, prioritization, and fair bandwidth allocation to traffic competing for a common resource. These are known as fair queuing methods. Some examples are Weighted Fair Queuing (WFQ), Self-Clocked Fair Queuing (SCFQ), and Deficit Round Robin/Surplus Round Robin (referred to as DRR).
WFQ and SCFQ depend upon arrival times as well as previous link utilization in order to calculate the next best packet to send. The accepted “ideal” behavior is bit-by-bit or weighted bit-by-bit round robin, which assigns each bit of each packet in the system an ideal finish time according to the weighted fair sharing of the system. This is typically not practical in a packet-based system unless all packets are one bit. Generalizing the algorithm from bit-by-bit to packet-by-packet, each packet is assigned an ideal finish (departure) time and the packets are served in order of the earliest departure time. The inclusion of theoretical departure times in a scheduling method typically requires insertion into a sorted list, which is known to be an O(log N) problem when implemented in software, where N is typically the number of queues. In hardware, this problem may be reduced to an O(1) operation with O(log N) resources.
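The following sketch illustrates serving packets in order of earliest finish time using a binary heap, which gives the O(log N) insertion mentioned above. It is a simplified, SCFQ-style illustration rather than a faithful implementation of WFQ or SCFQ: the class name FinishTimeScheduler, the weights, and the choice to keep every packet (rather than one head-of-line entry per queue) in the heap are assumptions made to keep the example short, so N here is the number of queued packets.

```python
import heapq


class FinishTimeScheduler:
    """Hypothetical sketch: serve packets in order of smallest finish tag."""

    def __init__(self, weights):
        self.weights = weights                      # flow id -> relative weight
        self.last_finish = {flow: 0.0 for flow in weights}
        self.virtual_time = 0.0                     # tag of the last packet served
        self.heap = []                              # (finish, seq, flow, length)
        self.seq = 0                                # FIFO tie-breaker for equal tags

    def enqueue(self, flow, length):
        # A packet starts after the flow's previous packet finishes, but
        # never earlier than the current virtual time.
        start = max(self.virtual_time, self.last_finish[flow])
        finish = start + length / self.weights[flow]
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, self.seq, flow, length))  # O(log N)
        self.seq += 1

    def dequeue(self):
        finish, _, flow, length = heapq.heappop(self.heap)           # O(log N)
        self.virtual_time = finish
        return flow, length


# Flow "a" queues one 300-byte packet, then flow "b" queues two 100-byte packets.
sched = FinishTimeScheduler({"a": 1, "b": 1})
sched.enqueue("a", 300)   # finish tag 300
sched.enqueue("b", 100)   # finish tag 100
sched.enqueue("b", 100)   # finish tag 200
order = [sched.dequeue() for _ in range(3)]
# order == [("b", 100), ("b", 100), ("a", 300)]
```

Although the packet from flow "a" arrived first, both packets of flow "b" have earlier finish tags and are sent ahead of it.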
DRR is a method used for sharing a common resource between several clients with different ratios between clients (i.e., some clients are allowed to consume more of the resources than others). The ratio between clients is typically defined by a parameter called a quantum. There are many variations and different implementations of DRR, including that described hereinafter.
DRR services queues using round-robin servicing with a quantum assigned to each queue. Unlike traditional round-robin, multiple packets up to the specified quantum can be sent, resulting in each queue sending at least a quantum's worth of bytes. If the quantum for each queue is equal, then each queue will consume an equal amount of bandwidth.
This DRR approach works in rounds, where a round is one round-robin iteration over the queues that have items to be sent. Typically, when a queue is scheduled, it is allowed to transmit until its deficit becomes negative (or non-positive), and then the next queue is served. Packets coming in on different flows are stored in different queues. Each round, each queue is allocated a quantum worth of bytes, which is added to the deficit of that queue. Each queue is allowed to send out one or more packets in a DRR round, with the exact number of packets sent in a round being dependent on its quantum and the size of the packets being sent. Typically, as long as the deficit for a queue is a positive (or non-negative) value in a DRR round (i.e., the queue is authorized to send a packet) and the queue has one or more packets to send, a packet is sent and the deficit is reduced based on the size of the sent packet. If there are no more packets in a queue after the queue has been serviced, one implementation sets the deficit corresponding to the queue to zero, while another implementation does this only if the deficit is negative. Otherwise, the remaining amount (i.e., the deficit minus the number of bytes sent) is maintained for the next DRR round.
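One DRR round as described above might be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name drr_round is hypothetical, the sketch picks the variant in which a queue may send while its deficit is non-negative, and it picks the variant that resets the deficit to zero whenever a serviced queue becomes empty.

```python
from collections import deque


def drr_round(queues, deficits, quanta):
    """Run one DRR round and return the (queue_index, packet_size) pairs sent.

    queues   -- list of deques holding packet sizes in bytes
    deficits -- list of per-queue deficits, updated in place
    quanta   -- list of per-queue quantum values in bytes
    """
    sent = []
    for i, queue in enumerate(queues):
        if not queue:
            continue                    # a round covers only queues with items to send
        deficits[i] += quanta[i]        # allocate this round's quantum
        # Assumed variant: the queue may send while its deficit is non-negative.
        while queue and deficits[i] >= 0:
            size = queue.popleft()
            deficits[i] -= size
            sent.append((i, size))
        if not queue:
            deficits[i] = 0             # assumed variant: reset when the queue empties
    return sent


# Queue 0 holds four 1000-byte packets; queue 1 holds ten 400-byte packets.
queues = [deque([1000] * 4), deque([400] * 10)]
deficits = [0, 0]
quanta = [1500, 1500]

round1 = drr_round(queues, deficits, quanta)
# Queue 0 sends 2 packets (2000 bytes); queue 1 sends 4 packets (1600 bytes).
# deficits is now [-500, -100]; the overshoot is charged against the next round.
```

Each queue sends at least its quantum in bytes, and the negative deficit carried forward reduces what it may send in the following round.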
DRR has a complexity of O(1); that is, the amount of work required is constant and independent of the number of packets enqueued. In order to be work conserving, a packet should be sent every time a queue is scheduled, no matter its size. Thus, the quantum used in DRR should be at least one maximum transmission unit (MTU), i.e., the maximum packet size, to guarantee that when the quantum is added to any deficit, the resulting value is at least zero. DRR provides fair bandwidth allocation and is easy to implement. It is work conserving and, because of its O(1) properties, it scales well with higher link speeds and larger numbers of queues. However, its scheduling behavior deviates quite a bit from the bit-by-bit round robin “ideal.” In particular, latency for a system with N queues is Q*N, where Q is the average quantum, which must be at least one MTU. Moreover, when a schedule entry is used to schedule multiple rates and/or types of traffic, multiple deficits are typically needed, and a significant amount of work may be required to update multiple deficits in response, for example, to the sending of a packet.
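As a rough worked example of the Q*N figure, the sketch below computes the number of bytes served in one full round and converts it to time on a link. The function names, the 1500-byte quantum, the queue count, and the link rate are all assumed values for illustration, and the conversion to seconds is an added step that simply divides the bits in a round by the link rate.

```python
def drr_round_bytes(quanta):
    """Approximate bytes served in one full DRR round: Q * N.

    Q is the average quantum in bytes and N is the number of queues.
    """
    n = len(quanta)
    average_quantum = sum(quanta) / n
    return average_quantum * n


def drr_round_seconds(quanta, link_bits_per_second):
    """Time to serve one full round on a link of the given rate (assumed model)."""
    return drr_round_bytes(quanta) * 8 / link_bits_per_second


# 64 queues, each with a quantum of one 1500-byte MTU, on a 1 Gb/s link.
quanta = [1500] * 64
round_bytes = drr_round_bytes(quanta)                       # 96000.0 bytes
round_time = drr_round_seconds(quanta, 1_000_000_000)       # about 768 microseconds
```

Because the quantum cannot be smaller than one MTU, this per-round figure grows linearly with the number of queues and cannot be reduced by shrinking the quantum further.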