A generalized communication system transmits data from one or more input ports to one or more output ports. One unit of transmitted data is often called a datagram or a packet. A datagram/packet conceptually travels as one unit through the communication system, though the communication system may actually further segment that datagram/packet then reassemble the datagram/packet within itself before outputting it.
In a generalized communication system, it is possible that output ports are oversubscribed, that is, more data is intended for an output port than that output port can output at a given time. Oversubscription can be due to transient traffic patterns, where data arrives at the same time but the port is not fundamentally oversubscribed, or persistent traffic patterns, where the port is fundamentally oversubscribed.
If an output port is never oversubscribed, no decision ever needs to be made about what data to send next out of the output port, since at any given time there will be either one packet to send or no packet to send. If a system only sees such traffic patterns, and if traffic is always allowed to traverse an output port when bandwidth is available, there is no need for output queuing structures.
Traffic patterns that potentially oversubscribe output ports, however, are not uncommon. Communication systems such as Internet routers are one class of system that routinely experiences transient and/or persistent oversubscription. Systems that see oversubscription generally handle it using one or more of the following techniques: (i) buffering, that is, providing memory to store the extra packets until they can be transmitted; (ii) backpressure, that is, preventing additional traffic from entering the oversubscribed component until additional buffering/bandwidth is available; and (iii) dropping, that is, discarding the packet and thus not transmitting it. Buffering is generally implemented as one or more in-order queues. Buffering provides elasticity to smooth out transient oversubscription, but does not solve persistent oversubscription. Since buffers are limited in size, persistent oversubscription will result in either backpressure or dropping. Persistent backpressure will generally result in cascaded backpressure, buffering further upstream (generally at the input ports), or dropping.
When an output port is oversubscribed, some data must wait while other data is forwarded. A scheduling algorithm specifies which of the available packets will be sent, while a buffering scheme specifies how packets are buffered and/or discarded as they arrive. Generally a scheduling algorithm and the buffering scheme are related and depend on each other.
Packets traversing Internet routers can generally be dropped if necessary. In some communication systems, such as high-performance parallel computers, dropping packets is unacceptable and results in an error condition. Thus, the specific communication system and the application space in which it is used restrict the types of scheduling and buffering policies that can be used.
Handling oversubscription in a fair fashion is non-trivial. Ideally, the output port can accept the maximum oversubscription to allow the output port to see all of the traffic that wants to exit via the output port. In that case, the scheduling algorithm implemented at the output port can make the most intelligent decisions. Backpressure in general does not allow some of the traffic to reach the output port when it should, making the scheduling algorithm operate with incomplete information and potentially forcing drops further upstream where there is less information about the packets trying to traverse the output port. Thus, backpressure can introduce imprecision into the system. Being able to selectively backpressure certain classes of service and not backpressure other classes of service is one possible solution to this problem.
There are many scheduling algorithms for deciding which packets to forward and which packets to delay. A simple scheduling algorithm is first-come-first-served: data that arrives first is forwarded first. In such a scheme, only a single queue per output port is required to provide buffering in the case of transient oversubscription. If the queue fills up and there is no space in the queue when a packet arrives, the packet must be dropped or backpressure must be asserted.
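As a minimal sketch of the single-queue scheme just described, the following Python class models one output port's queue with a fixed capacity. The class name `FifoPortQueue`, the packet-count capacity, and the choice to drop (rather than assert backpressure) when full are illustrative assumptions, not part of any particular system.

```python
from collections import deque


class FifoPortQueue:
    """Single first-come-first-served queue for one output port (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed limit, counted in packets
        self.packets = deque()
        self.dropped = 0

    def enqueue(self, packet):
        """Buffer the packet, or drop it if the queue is already full."""
        if len(self.packets) >= self.capacity:
            self.dropped += 1
            return False  # a real system might assert backpressure here instead
        self.packets.append(packet)
        return True

    def dequeue(self):
        """Return the oldest buffered packet, or None if the queue is empty."""
        return self.packets.popleft() if self.packets else None
```

A caller would invoke `enqueue` on packet arrival and `dequeue` whenever the output port is free to transmit; a `False` return marks the point where a drop or backpressure decision has to be made.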
Another scheduling algorithm is strict priority. In this algorithm, queues are assigned a strict priority relative to the other queues. If packets are available in queue A and queue B, where queue A is higher priority than queue B, packets will be taken only from queue A until either packets are available on queue C with higher priority than queue A or queue A has no more packets. In the former case, packets will be drawn from queue C while in the latter case, packets will be drawn from queue B. Strict priority schemes often allow for multiple queues to share the same priority. In those cases, some additional scheduling algorithm is required to arbitrate between the multiple queues in the same priority.
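The strict priority selection rule can be sketched in a few lines of Python. The function name `strict_priority_dequeue` and the representation of the queues as a list ordered from highest to lowest priority are assumptions made for illustration; arbitration between queues that share a priority is not modeled.

```python
from collections import deque


def strict_priority_dequeue(queues):
    """Take the next packet from the highest-priority non-empty queue.

    `queues` is a list of deques ordered from highest to lowest priority
    (an assumed representation). Returns None when every queue is empty.
    """
    for q in queues:
        if q:
            return q.popleft()
    return None


# Example mirroring the text: C outranks A, and A outranks B.
queue_c, queue_a, queue_b = deque(), deque(["a1", "a2"]), deque(["b1"])
ordered = [queue_c, queue_a, queue_b]
first = strict_priority_dequeue(ordered)   # taken from queue A
queue_c.append("c1")                       # a higher-priority packet arrives
second = strict_priority_dequeue(ordered)  # now taken from queue C
```

Because the scan restarts from the top on every call, a packet arriving on a higher-priority queue preempts the remaining packets in lower-priority queues at the next scheduling decision.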
A more complex scheduling algorithm found in Internet routers is weighted-fair-queuing (WFQ), where there are several queues and a weight that specifies a certain fraction of the total bandwidth is assigned to each queue. Packets are assigned to a specific queue based on some characteristic such as the priority of that packet, the input port it arrived on, and so on. When there is no oversubscription, any queue can use as much bandwidth as it needs. When there is oversubscription, however, every queue is allowed to consume its assigned fraction of bandwidth before any excess is divided proportionally between the queues that have additional bandwidth needs beyond their allocated fraction of bandwidth.
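The bandwidth split that WFQ aims for can be illustrated with a small allocation routine. This is only a sketch of the resulting shares, not a packet-by-packet WFQ scheduler. The function name `wfq_allocate`, the use of per-queue demand figures, and the choice to divide excess in proportion to the weights of the still-unsatisfied queues are assumptions made for illustration; weights are assumed to be positive.

```python
def wfq_allocate(link_rate, weights, demands):
    """Return the bandwidth given to each queue under a weighted fair split.

    Each queue first receives up to its weighted fraction of `link_rate`.
    Any bandwidth left over is then shared, in proportion to weight, among
    queues whose demand is not yet met (an assumed interpretation).
    """
    eps = 1e-9
    total_weight = sum(weights)
    alloc = [min(d, link_rate * w / total_weight)
             for w, d in zip(weights, demands)]
    excess = link_rate - sum(alloc)
    while excess > eps:
        hungry = [i for i, d in enumerate(demands) if d - alloc[i] > eps]
        if not hungry:
            break  # no oversubscription: every queue got all it asked for
        hungry_weight = sum(weights[i] for i in hungry)
        granted = 0.0
        for i in hungry:
            share = excess * weights[i] / hungry_weight
            give = min(share, demands[i] - alloc[i])
            alloc[i] += give
            granted += give
        excess -= granted
    return alloc


# Three queues with weights 1:1:2 on a link of 100 units, demanding 10, 60, 60.
shares = wfq_allocate(100, [1, 1, 2], [10, 60, 60])
```

In this example the first queue needs less than its 25-unit share, so the 15 units it leaves unused are split between the other two queues according to their weights.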
Different scheduling algorithms can be combined. For example, the DiffServ specification combines two strict priority queues with a six-queue WFQ group. Packets in the top priority queue are always transmitted before packets in the second priority queue. Packets in the second priority queue are always transmitted before packets in the six WFQ group queues. Packets in the six WFQ group queues are sent according to the WFQ weights assigned to each of the queues.
Using DiffServ, or something like it, Internet routers provide different levels of service for different types of packets. For example, voice packets generally require low latency and are thus categorized as high-priority packets. On the other hand, best effort packets have less stringent latency requirements and are thus prioritized behind voice packets.
The scheduling algorithms commonly used today make their decisions based on queues that encode priority and/or class. A priority or class could have a strict priority relative to the other queues, a guaranteed proportion of the total bandwidth, a fixed amount of guaranteed bandwidth and so on.
Weighted Random Early Discard (WRED) is a well-known buffering method that allows different classes of packets with different throughput and latency requirements to share the same queue. For each class of packet, WRED provides three parameters: minThreshold, maxThreshold and a slope. When the queue size is less than minThreshold, packets of that class are always enqueued. When the queue size is between minThreshold and maxThreshold, packets are dropped with a probability defined by a line starting at minThreshold with the specified slope. Thus, as the queue size grows, the probability of a packet being dropped increases. If the queue size is maxThreshold or deeper, all packets of that class are dropped.
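The per-class WRED admission rule can be sketched as follows. The function names, the use of the instantaneous queue size, the clamping of the probability at 1, and the injectable random source `rng` are illustrative assumptions; practical implementations often use an averaged queue size instead.

```python
import random


def wred_drop_probability(queue_size, min_threshold, max_threshold, slope):
    """Drop probability for one packet class at the given queue size."""
    if queue_size >= max_threshold:
        return 1.0  # at or beyond maxThreshold: always drop
    if queue_size <= min_threshold:
        return 0.0  # at or below minThreshold: always enqueue
    # Linear ramp starting at minThreshold, clamped so it never exceeds 1.
    return min(1.0, slope * (queue_size - min_threshold))


def wred_admit(queue_size, min_threshold, max_threshold, slope,
               rng=random.random):
    """Return True if the arriving packet should be enqueued."""
    p = wred_drop_probability(queue_size, min_threshold, max_threshold, slope)
    return rng() >= p  # rng() is uniform in [0, 1)
```

With hypothetical parameters minThreshold = 20, maxThreshold = 60 and slope = 0.02, a queue size of 40 gives a drop probability of about 0.4, while any queue size of 60 or more gives 1.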
Higher-priority packets will typically have some combination of higher minThresholds, higher maxThresholds and lower slopes than lower-priority packets. WRED with appropriately set parameters ensures that higher-priority packets are treated relatively better than lower-priority packets. What it does not ensure, however, is absolute performance. Generally, a WRED-protected queue is one of many queues, and activity in the other queues can affect the drain rate of that queue. Thus, given a specific set of WRED parameters, packet latency can vary widely, up to the dynamic range of the queue's drain rate, which could potentially be multiple orders of magnitude. Such a range of latency can make equipment that uses only WRED for queue admission unacceptable for latency-sensitive traffic such as voice or gaming.