The present invention relates to communication networks. In particular, and not by way of limitation, the present invention is directed to a method of controlling packet flow in a packet-switched communication network.
Buffers exist in network devices to absorb traffic bursts. When the input traffic flow to the device is larger than the output flow, a queue builds up in the buffer. When the input flow is smaller than the output flow, the queue drains. Therefore, the device can handle a variable input load that is larger than the device's output capacity for short periods of time. However, if the load mismatch lasts too long, the queue fills up and a passive manager starts dropping arriving packets that do not fit in the queue. This behavior is called “tail-drop” and it has some major problems associated with it.
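The tail-drop behavior described above can be illustrated with a minimal sketch. The class name `TailDropQueue`, the helper `simulate`, and the tick-based arrival and service model are assumptions made only for illustration; they are not taken from any particular device implementation.

```python
from collections import deque


class TailDropQueue:
    """Fixed-capacity FIFO that drops arriving packets once it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()
        self.dropped = 0

    def enqueue(self, packet):
        # Passive management: the only drop decision is "does it fit?"
        if len(self.buffer) >= self.capacity:
            self.dropped += 1
            return False
        self.buffer.append(packet)
        return True

    def dequeue(self):
        return self.buffer.popleft() if self.buffer else None


def simulate(arrivals_per_tick, service_per_tick, capacity):
    """Return (queue depth after each tick, total packets dropped)."""
    queue = TailDropQueue(capacity)
    depths = []
    packet_id = 0
    for arrivals in arrivals_per_tick:
        for _ in range(arrivals):
            queue.enqueue(packet_id)
            packet_id += 1
        for _ in range(service_per_tick):
            queue.dequeue()
        depths.append(len(queue.buffer))
    return depths, queue.dropped
```

With three arrivals per tick against a service rate of two, the queue absorbs the first ticks of the burst, and drops begin only once the buffer is full.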
A first major problem with tail-drop is related to the way the Transmission Control Protocol (TCP) handles congestion avoidance. When experiencing a packet loss, TCP reduces its congestion window size, thus reducing the packet transmission rate. Once packet transmissions are acknowledged, TCP increases the transmission rate again. During congestion, tail-drop drops packets from multiple TCP flows, thereby causing multiple TCP flows to reduce their transmission rates at the same time. Once tail-drop stops, the multiple TCP flows receive acknowledgments for their packet transmissions and increase their transmission rates at the same time. This phenomenon is called “TCP Global Synchronization” and results in bursty, “on-off” traffic that alternately causes congestion and network underutilization.
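The synchronization effect can be sketched with a highly simplified model. The function name `simulate_sync`, the integer congestion windows, the "grow by one, halve on loss" rule, and the assumption that every flow sees a loss whenever the aggregate load exceeds capacity are all illustrative simplifications, not a faithful TCP implementation.

```python
def simulate_sync(num_flows, capacity, ticks, start_cwnd=1):
    """Return the aggregate offered load at each tick.

    Every flow grows its window by one per tick. When the aggregate load
    exceeds the link capacity, all flows are assumed to lose a packet in
    the same tick (as under tail-drop) and halve their windows together.
    """
    cwnd = [start_cwnd] * num_flows
    history = []
    for _ in range(ticks):
        load = sum(cwnd)
        history.append(load)
        if load > capacity:
            # Simultaneous back-off: the source of global synchronization.
            cwnd = [max(1, w // 2) for w in cwnd]
        else:
            cwnd = [w + 1 for w in cwnd]
    return history
```

In this model the aggregate load traces a sawtooth: it overshoots the capacity, collapses to well below it, and climbs back, which corresponds to the alternating congestion and idle periods described above.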
Another major problem associated with tail-drop is queuing delay. Under some conditions, queues get full or nearly full and stay that way for some time. This results in all packets experiencing a large delay, even if the traffic input rate does not exceed the output rate. Thus, waiting until a queue is full before dropping packets increases queuing delay.
Tail-drop also contributes to unfairness in the network. Passive queues favor bursty, high-bandwidth-consuming traffic. Such traffic is allowed to fill up the queue, and once the queue is full, all incoming traffic is punished equally. This is especially harmful to smooth, low-bandwidth traffic that would otherwise not experience any packet loss.
Tail-drop also reduces throughput. More packets are dropped if the queue is full or nearly full because less buffer space is available to absorb traffic bursts. Most dropped packets have to be retransmitted, thereby reducing the throughput of useful traffic, or “goodput,” of the network. During tail-drop, a single flow may encounter multiple sequential packet losses, and the TCP algorithm does not recover well from such losses. In addition, TCP global synchronization can lead to buffer underflow and to reduced goodput and throughput in general.
Active Queue Management (AQM) is a mechanism that tries to address these problems by actively managing the queues, which usually means dropping packets before the queue is full. The challenge for AQM lies in choosing which packets to drop, how many packets to drop, and at what time. Random Early Detection (RED) is one of the most commonly deployed and researched AQM algorithms, and is recommended by the Internet Engineering Task Force (IETF) in RFC 2309. The RED algorithm calculates an average queue size using an Exponentially Weighted Moving Average (EWMA) low-pass filter. The average queue size is compared to a minimum and a maximum threshold. If the average queue size is less than the minimum, no packets are dropped; if the average queue size is more than the maximum, all packets are dropped; if the average queue size is in the range between the minimum and the maximum, packets are dropped randomly according to a drop probability. The drop probability is a function of the average queue size: it grows linearly with the average queue size, from zero at the minimum threshold to a maximum limit at the maximum threshold.
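The RED decision described above can be sketched as follows. The class name `RedQueue` and the parameter names `min_th`, `max_th`, `max_p`, and `weight` are illustrative assumptions, as is the linear ramp between the two thresholds; deployed RED variants add refinements (for example, spacing drops by counting packets since the last drop) that are omitted here.

```python
import random


class RedQueue:
    """Minimal sketch of the RED drop decision (not a full implementation)."""

    def __init__(self, min_th, max_th, max_p, weight, rng=None):
        self.min_th = min_th      # below this average, never drop
        self.max_th = max_th      # above this average, always drop
        self.max_p = max_p        # drop probability reached at max_th
        self.weight = weight      # EWMA weight, 0 < weight <= 1
        self.avg = 0.0            # smoothed (average) queue size
        self.rng = rng or random.Random()

    def update_average(self, queue_len):
        # EWMA low-pass filter over the instantaneous queue length.
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        return self.avg

    def drop_probability(self):
        if self.avg < self.min_th:
            return 0.0
        if self.avg > self.max_th:
            return 1.0
        # Assumed linear ramp from 0 at min_th to max_p at max_th.
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def should_drop(self, queue_len):
        """Update the average for an arriving packet and decide its fate."""
        self.update_average(queue_len)
        return self.rng.random() < self.drop_probability()
```

A small `weight` makes the average respond slowly, so a short burst barely moves it and few packets are dropped, whereas a sustained backlog raises the average and the drop probability with it.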
A number of other AQM algorithms have been suggested, such as Flow RED (FRED), BLUE, and Stabilized RED (SRED). In general, all AQM algorithms view the network as a closed-loop system and try to implement a closed-loop regulator to control the traffic load and reduce congestion. Such a system uses “monitors” or “sensors” to obtain information about specific conditions in the system. These signals are either used as is, or they are “smoothed out” or “conditioned” to remove unwanted noise from the useful information. For the sake of simplicity, “monitor” is used herein to refer to these signals, whether conditioned or not. The monitors are used as input to a model, which can be either explicit or implicit. An explicit model supplied with the monitored inputs could, at least theoretically, produce a simulated estimation of future network conditions. This estimation feeds into a controller that applies congestion avoidance (e.g., dropping packets) according to some algorithm or “control law” if the estimation indicates congestion in the future. In an implicit model, the monitors may feed directly into the controller, in which case the model is implicit in the control law itself.
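The monitor, model, and controller pipeline described above can be sketched generically. All names here (`make_regulator`, `LinearExtrapolation`, `threshold_law`) are hypothetical, and the linear extrapolation is only a stand-in for an explicit model; it is not drawn from any of the named algorithms.

```python
def make_regulator(condition, model, control_law):
    """Compose a closed-loop regulator step from its three stages."""
    def step(raw_sample):
        monitor = condition(raw_sample)   # raw or smoothed signal
        estimate = model(monitor)         # predicted future condition
        return control_law(estimate)      # True means "apply control"
    return step


class LinearExtrapolation:
    """Hypothetical explicit model: project the monitor `horizon` steps ahead."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.prev = None

    def __call__(self, monitor):
        slope = 0.0 if self.prev is None else monitor - self.prev
        self.prev = monitor
        return monitor + self.horizon * slope


def threshold_law(limit):
    """Control law: act when the estimate exceeds `limit`."""
    return lambda estimate: estimate > limit


def identity(x):
    return x


# Implicit model: the monitor feeds the control law directly.
implicit = make_regulator(identity, identity, threshold_law(10))
# Explicit model: the control law acts on a prediction, not the raw monitor.
explicit = make_regulator(identity, LinearExtrapolation(3), threshold_law(10))
```

Fed the samples 4 and then 7, the implicit regulator does nothing because 7 is below the limit, while the explicit one acts on the second sample because it projects the rising trend past the limit.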
There are several major problems with this approach. First, the monitors, whether used as they are or smoothed, do not clearly and consistently distinguish between congestion buildup in the network that needs to be controlled and a transient traffic burst that should be left alone. Second, an accurate explicit model of Internet traffic does not exist today, and the implicit models in the control law are not good enough to estimate potential congestion buildup in the future, given the complex nature of Internet traffic patterns.
Third, the controllers have an objective of estimating congestion, but they have neither the objective nor the means to measure sub-optimal network utilization. Although the existence of congestion, or lack of it, serves as feedback to the controller regarding the question of whether enough control is being applied (i.e., whether enough packets are being dropped), there is no feedback to the controller regarding the question of whether too much control is being applied (i.e., whether too many packets are being dropped), resulting in underutilization of the network. Finally, the task of choosing optimal algorithm parameter values is left to network administrators who, in turn, usually use a recommended default value. The problem is that different networks need different settings, and the same default value does not fit all networks. Furthermore, network conditions and topology tend to change over time, so even if the original settings were optimal, they may not remain so.