Traffic Engineering (TE) is concerned with performance optimization of operational networks. In general, it encompasses the application of technology and scientific principles to the measurement, modeling, characterization, and control of Internet traffic, and the application of such knowledge and techniques to achieve specific performance objectives.
A major goal of Internet Traffic Engineering is to facilitate efficient and reliable network operations while simultaneously optimizing network resource utilization and traffic performance. Traffic Engineering has become an indispensable function in many large Autonomous Systems (ASes) because of the high cost of network assets and the commercial and competitive nature of Internet service providers (ISPs). These factors emphasize the need for maximal operational efficiency. Inefficient resource utilization and congestion result when traffic streams are inefficiently mapped onto available resources, causing subsets of network resources to become overutilized while others remain underutilized. In general, congestion resulting from inefficient resource allocation can be reduced by adopting load balancing strategies. The objective of such strategies is typically to minimize the maximum congestion or, alternatively, the maximum resource utilization through efficient resource allocation. When congestion is minimized through efficient resource allocation, packet loss decreases, transit delay decreases, and aggregate throughput increases. As a result, the network service quality perceived by end users improves significantly.
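The load balancing objective described above can be made concrete with a minimal sketch. The link names, capacities, and loads below are hypothetical values chosen for illustration; the sketch only shows how the maximum link utilization metric distinguishes a skewed traffic mapping from a balanced one.

```python
def link_utilization(loads, capacities):
    """Return the load/capacity ratio of every link (missing loads count as 0)."""
    return {link: loads.get(link, 0.0) / cap for link, cap in capacities.items()}


def max_link_utilization(loads, capacities):
    """Return the utilization of the most heavily loaded link."""
    return max(link_utilization(loads, capacities).values())


# Hypothetical example: two parallel links of equal capacity carrying 12 units in total.
capacities = {"A-B": 10.0, "A-C": 10.0}
skewed = {"A-B": 10.0, "A-C": 2.0}    # one link saturated, the other nearly idle
balanced = {"A-B": 6.0, "A-C": 6.0}   # same total traffic, spread evenly

mlu_skewed = max_link_utilization(skewed, capacities)
mlu_balanced = max_link_utilization(balanced, capacities)
print(mlu_skewed, mlu_balanced)
```

Both mappings carry the same total traffic, but the balanced mapping leaves headroom on every link, which is what a strategy that minimizes maximum utilization aims for.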
Traffic demand characteristics are a major factor affecting the design of traffic engineering algorithms, i.e., methods used to control the flow of traffic in a network. Unfortunately, for many ASes, although traffic demand can be relatively stable most of the time, there exist time periods during which traffic can be highly dynamic, containing unpredictable traffic spikes that ramp up extremely quickly, leaving no time for a traffic engineering algorithm to re-compute or adjust. We recently examined the traffic traces of several backbone networks and found that short time periods exist during which traffic demand can increase by at least one order of magnitude.
Highly unpredictable traffic variations have also been observed and studied recently by other researchers. To further confirm the likelihood of observing highly unpredictable traffic spikes in real life, we queried the operators of some large ASes and received reports of highly unpredictable traffic patterns in their daily operations. Many factors contribute to the highly unpredictable nature of Internet traffic: outbreaks of worms/viruses, outages or routing changes of major ISPs, the occurrence of natural disasters, denial-of-service attacks, and flash-crowd effects due to major news events. In many cases, traffic spikes occur exactly when the networking service should be at its most valuable. In addition, with sources of adaptive traffic such as overlay networks on the rise and more and more networks adopting traffic engineering, volatility and variability in traffic could increase further.
It is important that traffic engineering handle sudden traffic spikes. If a traffic engineering algorithm is not prepared for them, it may pay a serious performance penalty, possibly leading to router overload and even crashes. Such crashes reduce network reliability and may violate increasingly stringent service level agreements (SLAs), leading to potential financial penalties.
The importance of traffic engineering has motivated many studies in the last few years, and quite a few traffic engineering methods (algorithms) have been proposed. Many of these traffic engineering algorithms are described in our paper entitled “COPE: Traffic Engineering in Dynamic Networks,” SIGCOMM '06, Sep. 11-15, 2006, Pisa, Italy, incorporated herein by reference (see e.g., sections 2 and 7) (hereinafter referred to as “COPE Paper”). Other examples of traffic engineering solutions have been proposed to deal with unexpected changes in traffic demands and/or interdomain routes.
Despite the importance of handling traffic spikes, most of the proposed traffic engineering algorithms belong to a class of algorithms that we call TE optimization based on samples. Such algorithms optimize their routing without preparing for unpredictable traffic spikes. In particular, in this type of algorithm, a set of sample traffic matrices is collected. A routing is then computed to optimize the performance for just these samples. The optimization can be conducted using either the average cost or the worst-case cost over the samples. An advantage of this type of algorithm is its potential performance gain. When the network traffic is relatively stable, and the real traffic is similar to the samples from which the routing is computed, these algorithms can achieve near-optimal performance. However, since these algorithms optimize routing specifically for these samples, when the real traffic deviates substantially from the samples (e.g., during traffic spikes), the routing may perform poorly.
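The sample-based approach described above can be sketched as follows. This is a toy illustration under stated assumptions, not any specific published algorithm: the three-link network, the two demands, the coarse grid of candidate routings, and all function names are hypothetical. Demand d1 can use its own link L1 or a shared link L2; demand d2 can use its own link L3 or the same shared link L2. A routing is a pair (a, b) giving the fraction of d1 and d2 placed on the shared link.

```python
from itertools import product

# Hypothetical network: every link has capacity 10.
CAPACITY = {"L1": 10.0, "L2": 10.0, "L3": 10.0}


def mlu(routing, demand):
    """Maximum link utilization of routing (a, b) under demand (d1, d2)."""
    a, b = routing
    d1, d2 = demand
    loads = {"L1": d1 * (1 - a), "L2": d1 * a + d2 * b, "L3": d2 * (1 - b)}
    return max(loads[link] / CAPACITY[link] for link in CAPACITY)


def optimize_on_samples(candidates, samples, worst_case=False):
    """Pick the candidate with the lowest average (or worst-case) MLU over the samples."""
    def cost(routing):
        values = [mlu(routing, s) for s in samples]
        return max(values) if worst_case else sum(values) / len(values)
    return min(candidates, key=cost)


# Candidate routings: a coarse grid of split fractions (an assumption for brevity).
candidates = list(product([0.0, 0.25, 0.5], repeat=2))

# Historical samples in which d1 dominates and d2 is small.
samples = [(12.0, 2.0), (11.0, 3.0)]
chosen = optimize_on_samples(candidates, samples)

# An unexpected spike in d2 that the samples never showed.
spike = (12.0, 16.0)
mlu_chosen_on_spike = mlu(chosen, spike)
mlu_best_on_spike = min(mlu(r, spike) for r in candidates)
print(chosen, mlu_chosen_on_spike, mlu_best_on_spike)
```

In this toy setting, the routing fitted to the samples sends all of d2 over its dedicated link, so the spike overloads that link far more than the best candidate routing for the spike would.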
An extreme case of TE optimization based on samples is completely online adaptation, which is essentially a feedback loop that uses real-time traffic measurements to adjust routing. An advantage of this scheme is that it can converge quickly to the optimal routing without the need to collect multiple samples. However, when traffic changes significantly and quickly, routing recomputation delays can cause such methods to suffer a large transient penalty.
We have observed through the use of real traffic traces that TE optimization based solely on samples can pay a serious performance penalty when unexpected traffic spikes occur, leading to potential network failure. For example, using the topology and real traffic traces of a major tier-1 ISP, we have found that optimizing routing based on historical traffic demand alone can result in a several-factor increase in traffic intensity compared to optimal routing (based on the actual demand). In such cases, the traffic intensity on some links far exceeds their capacities. Additionally, for real Abilene Internet2 backbone traces, we observed that for some links, the traffic intensity generated by the algorithms based on predicted traffic demands reaches 2.44 times link capacity, while for an optimal routing, no link receives traffic above 50% of its capacity. Such large performance penalties arise when traffic demands change significantly from previous demands.
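As a rough worked figure, and assuming the two Abilene numbers above describe the same demand interval, the implied penalty relative to optimal routing can be bounded from below. The symbols U_pred (maximum link utilization under prediction-based routing) and U_opt (maximum link utilization under optimal routing) are introduced here for illustration only.

```latex
\[
U_{\text{pred}} = 2.44, \qquad U_{\text{opt}} \le 0.50
\quad\Longrightarrow\quad
\frac{U_{\text{pred}}}{U_{\text{opt}}} \;\ge\; \frac{2.44}{0.50} \;=\; 4.88
\]
```

Under that assumption, the prediction-based routing is at least 4.88 times worse than optimal in maximum link utilization for that interval.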
Another approach, which provides a performance bound for traffic engineering, comes from the pioneering work on oblivious routing, as described for example in D. Applegate and E. Cohen, “Making Intra-Domain Routing Robust To Changing And Uncertain Traffic Demands: Understanding Fundamental Tradeoffs,” Proceedings of ACM SIGCOMM '03, Karlsruhe, Germany, August 2003 (hereinafter referred to as Applegate and Cohen).
In oblivious routing, a routing is computed that is independent of the historical traffic demand matrix, and thus has the potential to handle traffic spikes well. A potential drawback of completely oblivious routing, however, is its sub-optimal performance for normal traffic, which may account for the vast majority of the time the network operates. For example, the worst-case bound of the oblivious ratio (i.e., the ratio between the maximum link utilization under oblivious routing and that under optimal routing) is on the order of log(n), where n is the network size. Applegate and Cohen computed the oblivious ratio of several realistic network topologies. Although they discovered that the ratio is typically only around 2, they also commented that overhead at this level “is far from being negligible to working ISPs.” The performance tradeoff required by oblivious routing means that in the average case, i.e., the case of expected traffic demands, oblivious routing is 30%-90% worse than optimal, which results in an inefficient and uneconomical use of network resources during the presumed majority of times the network is operating at average traffic levels.
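The ratio defined above, and the tradeoff it implies, can be illustrated with a small sketch. It is a simplification under stated assumptions: a true oblivious ratio takes the worst case over all possible demand matrices, whereas this sketch uses a small finite demand set and a coarse grid of candidate routings as stand-ins, and it is not the computation used by Applegate and Cohen. The three-link network (dedicated links L1 and L3, shared link L2), the demand values, and all function names are hypothetical.

```python
from itertools import product

# Hypothetical network: every link has capacity 10.
CAPACITY = {"L1": 10.0, "L2": 10.0, "L3": 10.0}


def mlu(routing, demand):
    """Maximum link utilization of routing (a, b) under demand (d1, d2).

    a and b are the fractions of d1 and d2 placed on the shared link L2.
    """
    a, b = routing
    d1, d2 = demand
    loads = {"L1": d1 * (1 - a), "L2": d1 * a + d2 * b, "L3": d2 * (1 - b)}
    return max(loads[link] / CAPACITY[link] for link in CAPACITY)


def performance_ratio(routing, demand, candidates):
    """MLU of the routing divided by the best MLU any candidate achieves on this demand."""
    best = min(mlu(r, demand) for r in candidates)
    return mlu(routing, demand) / best


def worst_case_ratio(routing, demands, candidates):
    """Largest performance ratio over the (finite) demand set."""
    return max(performance_ratio(routing, d, candidates) for d in demands)


candidates = list(product([0.0, 0.25, 0.5], repeat=2))
demands = [(12.0, 2.0), (2.0, 12.0), (8.0, 8.0)]

# Routing with the smallest worst-case ratio over the demand set.
robust = min(candidates, key=lambda r: worst_case_ratio(r, demands, candidates))
robust_worst = worst_case_ratio(robust, demands, candidates)

# Routing tuned to the first demand only: optimal there, poor elsewhere.
tuned = (0.5, 0.0)
tuned_worst = worst_case_ratio(tuned, demands, candidates)
robust_on_first = performance_ratio(robust, demands[0], candidates)
print(robust, robust_worst, tuned_worst, robust_on_first)
```

In this toy setting the demand-independent choice has a bounded worst-case ratio, but it is measurably worse than optimal on the demand that the tuned routing handles perfectly, which mirrors the tradeoff described above.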
The challenges to routing posed by such intradomain traffic demand fluctuations are compounded when the AS handles interdomain traffic through connections to other ASes. First, although interdomain routes for most traffic volumes can be stable, there are Border Gateway Protocol (BGP) routing changes which can cause significant shifts of traffic. In particular, with the dynamic nature of the global Internet, the available interdomain routes of an AS can fluctuate as its peers announce and withdraw interdomain routes, or even reset their eBGP sessions. Also, the quality of interdomain routes can fluctuate as network conditions fluctuate. If a currently used interdomain route is no longer available, or the quality of an interdomain route violates its Service Level Agreement (SLA), an AS has no choice but to adjust its routing. Second, interdomain routing introduces multiple-point demands; that is, there can be multiple equally-good egress points in the BGP decision process. Thus, it is up to the intradomain routing determined by traffic engineering to break the tie. Since egress links may become the bottlenecks of the network, this tie-breaking can affect the congestion of the network.
A major challenge in traffic engineering thus is how to cope with dynamic and unpredictable changes in intradomain traffic demands (as well as unpredictable changes in the availability and quality of interdomain routes), and simultaneously provide efficient and economical utilization of network resources during expected traffic demand scenarios.
Accordingly, there is a need for practically implementable traffic engineering methods, systems, and computer program media that provide near-optimal use of network topologies for expected traffic scenarios while simultaneously managing unexpected scenarios.