Bonding and aggregation technologies enable two or more network devices to send and receive data packets across multiple communication links. Bonding or aggregation technologies typically use round-robin scheduling to send and receive data packets across all the lower links combined. In a round-robin algorithm, a load balancer assigns packet requests to a list of the links on a rotating basis. For the subsequent requests, the load balancer follows the circular order to redirect the request. Once a link is assigned a request, the link is moved to the end of the list. This keeps the links equally assigned.
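By way of illustration only, the rotating assignment described above can be sketched as follows. The function name `round_robin` and the link labels are hypothetical and are chosen purely for the example.

```python
from collections import deque

def round_robin(links, packets):
    """Assign each packet to the link at the head of a rotating list."""
    rotation = deque(links)
    assignment = []
    for packet in packets:
        link = rotation[0]          # link currently at the head of the list
        rotation.rotate(-1)         # the link just used moves to the end
        assignment.append((packet, link))
    return assignment

# Six packets spread over three hypothetical links.
schedule = round_robin(["link_a", "link_b", "link_c"], range(6))
for packet, link in schedule:
    print(packet, "->", link)
```

Each link receives the same number of packets regardless of its speed or latency, which is the property discussed in the paragraphs that follow.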
Round-robin scheduling results in throughput degradation when the performance of the bonded or aggregated lines is substantially different. When such lines are combined, the bonded connection inherits the latency of the line with the worst latency. Similarly, the transfer rate is governed by the slowest bonded line.
This interferes with the ability to provide a consistent, even, and optimum flow of bandwidth over multiple communication links, especially with real-time applications such as VoIP, which are highly sensitive to jitter and varying latency across the multiple lower links.
This less than optimal performance arises because, when relying on lower-level communication links, each lower link may have dissimilar characteristics, including asymmetrical speeds and latency variation. The aggregate throughput of a bonded or aggregated communication session is only as good as the speed of the slowest lower link multiplied by the number of lower links. This results in inefficient aggregation or bonding that does not make optimal use of the bandwidth that is available in combination.
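The relationship stated above (slowest link speed multiplied by the number of links) can be checked with a short calculation. The link speeds below are hypothetical figures chosen for the example, and the function name is an assumption of this sketch.

```python
def round_robin_throughput(link_speeds_mbps):
    """Aggregate throughput under plain round-robin: slowest link times link count."""
    return min(link_speeds_mbps) * len(link_speeds_mbps)

# Two balanced links: the full combined bandwidth is usable.
balanced = round_robin_throughput([10, 10])

# Two unbalanced links: 12 Mbps is available in combination,
# but round-robin is limited by the 2 Mbps link.
unbalanced_speeds = [10, 2]
unbalanced = round_robin_throughput(unbalanced_speeds)
unused = sum(unbalanced_speeds) - unbalanced

print(balanced, unbalanced, unused)
```

With these assumed speeds, the unbalanced pair delivers 4 Mbps of a possible 12 Mbps, leaving 8 Mbps unused.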
Similar problems are experienced in fast failover techniques where, for example, performance upon failing over to aggregated or bonded connections involving lower-level links is less than optimal.
The above issues are further aggravated by the fact that diverse carriers may have varying latency characteristics even when using similar access methods. Carrier diversity may entail the combining of legacy symmetrical circuits with newer asymmetrical type broadband circuits, creating a mix of faster and slower speeds in either direction, with varying latencies for all lower links combined.
For example, when two lower communication links are balanced (i.e., of the same speed), round-robin distribution typically results in full use of all the available bandwidth of the lower links combined. When the two lower links are unbalanced, however, round-robin distribution typically results in lower performance than the combined capacity of the lower links. When using three lower links, each with a different speed and latency, round-robin distribution results in very poor performance and is practically unusable in many applications.
Distribution algorithms have been proposed for addressing these issues. For example, a weighted round-robin allocation has been proposed. Weighted round-robin is an advanced version of round-robin that eliminates some of its deficiencies. With weighted round-robin, a weight can be assigned to each link in the group, so that if one link is capable of handling twice as much load as the other, the larger link gets a weight of 2. In such cases, the load balancer will assign two requests to the larger link for each request assigned to the smaller one. U.S. Pat. Nos. 6,438,135 and 7,580,355, meanwhile, disclose dynamic approaches to the weighted round-robin algorithm.
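A minimal sketch of the static weighting described above is given below, assuming integer weights. The function name and link labels are hypothetical, and the sketch does not represent the dynamic approaches of the cited patents.

```python
from itertools import cycle, islice

def weighted_round_robin(weights, count):
    """Return `count` link selections, each link repeated according to its weight."""
    # Expand every link into `weight` consecutive slots, then rotate over the slots.
    slots = [link for link, weight in weights.items() for _ in range(weight)]
    return list(islice(cycle(slots), count))

# The larger link can carry twice the load, so it is given a weight of 2.
picks = weighted_round_robin({"large": 2, "small": 1}, 6)
print(picks)
```

In this sketch the larger link is selected twice for each selection of the smaller link. The weights are fixed in advance and do not react to changes in link speed or latency.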
In addition to performance degradation, network congestion also presents a problem. Network congestion occurs when a network connection is overwhelmed by the data being transmitted over the connection. This results in quality of service (QoS) deterioration which is generally experienced as queuing delays, packet loss or the inability to process new connections.
Network congestion avoidance is the process used in networks to avoid congestion. Congestion in a network causes degradation of all services running across the network as all available capacity is consumed. This can occur due to a single network application consuming all available capacity. This affects latency-sensitive and time-sensitive applications such as voice, video streaming, etc.
To compensate for and/or avoid congestion within a network link, queuing mechanisms are used to ensure that the available capacity is fairly distributed among all consumers of the link. There are a few commonly implemented queuing mechanisms, including first in first out (FIFO), weighted fair queuing (WFQ), custom queuing (CQ), and priority queuing (PQ). All these mechanisms manage traffic as it is received from the transmitter and before it is transmitted on through the interface.
Other queuing mechanisms include tail drop, random early detection (RED), weighted random early detection (WRED), and Blue. The most common form of protocol-independent rate limiting is performed by discarding excess packets using congestion management mechanisms such as tail drop. Other methods use packet queuing, adding delays to packets in transit, or protocol-specific built-in congestion control mechanisms that are typically not supported by most real-time applications. The use of congestion management mechanisms such as tail drop to rate-limit bandwidth usage results in high jitter and packet loss, degrading the quality of real-time applications. This implementation cannot be used for real-time applications on low-cost access solutions. The problem is further compounded when bandwidth usage approaches the upper threshold, as latency and loss rise exponentially relative to bandwidth usage. FIG. 5 illustrates latency relative to bandwidth in the prior art. It can be seen that latency starts to rise dramatically at a particular bandwidth usage.
Tail drop (also referred to as drop tail) is a simple queue management algorithm in which traffic is not differentiated. Tail drop allows a queue to fill to its maximum capacity and then drops new packets as they arrive until the queue has additional capacity. Tail drop differs from the previously mentioned mechanisms since it allows a queue to fill before taking any action while the others are more pro-active in queue management.
One disadvantage of tail drop is that on a network where a large volume of data is being transmitted, real time applications could suffer as the data may easily fill up the queue causing voice packets to be dropped.
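The tail drop behavior described above can be sketched as follows. The class name `TailDropQueue`, the capacity of three packets, and the packet labels are hypothetical and serve only to illustrate the point.

```python
from collections import deque

class TailDropQueue:
    """Undifferentiated queue that drops arrivals once it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.packets = deque()
        self.dropped = 0

    def enqueue(self, packet):
        # No action is taken until the queue is full; then new arrivals are dropped.
        if len(self.packets) >= self.capacity:
            self.dropped += 1
            return False
        self.packets.append(packet)
        return True

    def dequeue(self):
        # Returns None when there is nothing to send.
        return self.packets.popleft() if self.packets else None

queue = TailDropQueue(capacity=3)
# Bulk data fills the queue, so the voice packet that arrives next is dropped.
results = [queue.enqueue(p) for p in ["data1", "data2", "data3", "voice1"]]
print(results, queue.dropped)
```

Because the queue does not differentiate traffic, the voice packet is discarded solely because it arrived after the data packets.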
Furthermore, in particular applications such as VoIP (SIP), both signaling traffic and the RTP packets that contain the call audio could be dropped. Signaling traffic can be retransmitted; however, the retransmit timer for SIP is about 500 ms, and if critical packets within a SIP conversation are not acknowledged, the call will drop. RTP packets, meanwhile, are transmitted using UDP, which does not retransmit, so dropped packets are effectively lost. Although implementations of packet loss concealment (PLC) can mask some of the effects of packet loss in VoIP, large numbers of dropped packets degrade call quality.
Active queue management mechanisms, meanwhile, alleviate some of the issues of tail drop by decreasing the number of dropped packets; increasing the utilization of links by decreasing the triggering of congestion control mechanisms within TCP conversations; lowering the queue size and decreasing the delays and jitter seen by flows; and attempting to share the connection bandwidth equally among the various flows. Active queue management algorithms include RED, WRED and Blue.
Another disadvantage of tail drop is that it can cause consumers of a particular network link to enter a slow-start state (which reduces data throughput) and even cause global synchronization often enough that the effect is deleterious to network throughput. While RED, WRED and Blue avoid the issue, RED and WRED are generally applicable to IP-only networks due to their dependence on the use of mechanisms built into TCP and the fact that packets are dropped rather than queued.
RED monitors the average queue size and drops packets based on statistical probabilities. It may also mark packets with explicit congestion notification (ECN). However, ECN is supported only by TCP/IP, which makes it unfavorable as a mechanism for use with UDP-based flows. Also, while ECN is present in most current TCP/IP protocol suites, these suites are generally shipped with it disabled.
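A simplified sketch of the probabilistic drop decision is given below. It assumes the commonly described form of RED, in which the drop probability rises linearly between a minimum and a maximum threshold on the average queue size. The threshold values, the averaging weight, and all function names are assumptions of this example, and refinements found in full RED implementations are omitted.

```python
import random

def update_average(avg_queue, current_queue, weight=0.002):
    """Exponentially weighted moving average of the queue size (weight is assumed)."""
    return (1 - weight) * avg_queue + weight * current_queue

def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Drop probability as a function of the average queue size."""
    if avg_queue < min_th:
        return 0.0                      # short queue: never drop
    if avg_queue >= max_th:
        return 1.0                      # long queue: always drop
    # Between the thresholds the probability rises linearly up to max_p.
    return max_p * (avg_queue - min_th) / (max_th - min_th)

def should_drop(avg_queue, rng=random.random, min_th=5, max_th=15, max_p=0.1):
    """Randomly decide whether to drop (or mark) an arriving packet."""
    return rng() < red_drop_probability(avg_queue, min_th, max_th, max_p)

for avg in (3, 10, 20):
    print(avg, red_drop_probability(avg, min_th=5, max_th=15, max_p=0.1))
```

Unlike tail drop, the decision here is taken before the queue is full, so drops begin early and are spread across arrivals rather than concentrated at overflow.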
WRED extends RED by providing several different queue thresholds based on the associated IP precedence or DSCP value. This allows lower-priority packets to be dropped, protecting higher-priority packets in the same queue if the queue fills up. However, WRED also works only with TCP-based conversations. Other protocols such as IPX do not use the concept of a sliding window. When faced with a packet discard, these protocols simply retransmit at the same rate as before. RED and WRED are therefore inefficient in a network utilizing non-TCP protocols.
A Blue queue maintains a drop/mark probability, and drops/marks packets with that probability as they enter the queue. Whenever the queue overflows, the drop/mark probability is increased by a small constant, and whenever the queue is empty, the drop/mark probability is decreased by a constant which is less than the small constant used to increase the probability. The main flaw of Blue, which it shares with most single-queue queuing disciplines, is that it does not distinguish between flows, and treats all flows as a single aggregate. Therefore, a single aggressive flow can push out of the queue packets belonging to other, better behaved flows.
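The probability adjustment described above can be sketched as follows. The class name `BlueQueue`, the increment and decrement values, and the injectable random source are assumptions made for the example, and details of practical Blue implementations (such as limiting how often the probability may change) are omitted.

```python
import random

class BlueQueue:
    """Single queue with a Blue-style drop/mark probability."""

    def __init__(self, capacity, increment=0.02, decrement=0.002, rng=random.random):
        self.capacity = capacity
        self.increment = increment      # applied on overflow
        self.decrement = decrement      # smaller constant, applied when empty
        self.rng = rng
        self.probability = 0.0
        self.length = 0

    def enqueue(self):
        # Arriving packets are dropped/marked with the current probability.
        if self.rng() < self.probability:
            return False
        if self.length >= self.capacity:
            # Overflow: raise the probability by the small constant.
            self.probability = min(1.0, self.probability + self.increment)
            return False
        self.length += 1
        return True

    def dequeue(self):
        if self.length > 0:
            self.length -= 1
        if self.length == 0:
            # Empty queue: lower the probability by the smaller constant.
            self.probability = max(0.0, self.probability - self.decrement)

# A fixed random source keeps the example deterministic.
blue = BlueQueue(capacity=2, rng=lambda: 1.0)
accepted = [blue.enqueue() for _ in range(3)]   # third arrival overflows
after_overflow = blue.probability
blue.dequeue()
blue.dequeue()                                  # queue is now empty
after_empty = blue.probability
print(accepted, after_overflow, after_empty)
```

The sketch keeps a single probability for the whole queue, so every flow is subject to the same drop/mark decision, which is the lack of flow differentiation noted above.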
Packet loss is a side effect of congestion which needs to be avoided when running a network that handles real-time traffic, such as VoIP or streaming video, as such traffic is sensitive to packet loss.
Therefore, what is required is a queuing and distribution algorithm that uses bidirectional information to support asymmetrical environments and leverages the bandwidth of bonded or aggregated network connections, even where the links in the connection exhibit substantially different performance.