The invention relates to the field of data transmission protocols and, more particularly, to the field of selecting an efficient point of egress from a data network by ranking possible points of egress according to principles of tunable inter-domain egress (TIE) as further explained herein.
The Internet's two-tiered routing architecture was designed to have a clean separation between intra-domain and inter-domain routing protocols. For example, an inter-domain protocol allows the border routers to learn how to reach external destinations, whereas the intra-domain protocol determines how to direct traffic from one router in an autonomous system (AS) to another router. However, the appropriate roles of the two protocols become unclear when the autonomous system learns routes to a destination at multiple border routers, a situation that arises quite often today. An autonomous system, as defined by Newton's Telecom Dictionary, is a collection of routers under a single administrative authority using a common Interior Gateway Protocol for routing packets. The terms intra-domain and inter-domain refer to the reach of a protocol: an intra-domain protocol operates within a single autonomous system or domain, whereas an inter-domain protocol reaches beyond the autonomous system to include other autonomous systems or domains. Since service providers peer at multiple locations, essentially all of the traffic from customers to the rest of the Internet has multiple possible egress routers. In addition, many customers connect to their provider in multiple locations for fault tolerance purposes and for more flexible load balancing, resulting in multiple egress routers for these destinations as well. Selecting among multiple possible egress points is now a fundamental part of the Internet routing architecture, independent of the current set of routing protocols.
In the Internet today, for example, per Bressoud et al., “Optimal Configuration for BGP Route Selection,” IEEE, 2003; Rekhter et al., “A Border Gateway Protocol,” September, 2004 and subsequent related publications, border routers learn routes to destination prefixes via a known Border Gateway Protocol (BGP). When multiple border routers have routes that are “equally good” in the BGP sense (e.g., local preference, path length within the autonomous system, etc.), each router in the autonomous system may direct traffic to the closest border router, in terms of Interior Gateway Protocol (IGP) distances. This policy of early-exit or so-called “hot-potato” routing is hard-coded in the BGP decision process implemented on each router.
Hot-potato routing allows a router to implement a simple decision rule, independently of the other routers, while ensuring that packets are forwarded to neighboring routers that have selected the same (closest) egress point. In addition, hot-potato routing tends to limit the consumption of bandwidth resources in the network by shuttling traffic to the next autonomous system as early as possible.
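By way of illustration only, the decision rule described above may be sketched as follows. This is a deliberately simplified sketch and not the full BGP decision process, which includes further comparison steps omitted here. The `Route` type, the function name `hot_potato_egress`, and all attribute values are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """A BGP route as seen by one router (hypothetical, simplified)."""
    egress: str       # border router that learned the route
    local_pref: int   # higher is preferred
    as_path_len: int  # shorter is preferred


def hot_potato_egress(routes, igp_distance):
    """Pick an egress: BGP attributes first, IGP distance as the tie-break."""
    # Step 1: keep only the routes with the highest local preference.
    best_pref = max(r.local_pref for r in routes)
    candidates = [r for r in routes if r.local_pref == best_pref]
    # Step 2: among those, keep only the routes with the shortest AS path.
    shortest = min(r.as_path_len for r in candidates)
    candidates = [r for r in candidates if r.as_path_len == shortest]
    # Step 3 (hot potato): choose the egress closest in IGP distance.
    return min(candidates, key=lambda r: igp_distance[r.egress]).egress


# Two routes that are "equally good" in the BGP sense (illustrative values).
routes = [Route("A", 100, 2), Route("B", 100, 2)]
print(hot_potato_egress(routes, {"A": 2, "B": 9}))
```

Because each router applies the same rule to its own IGP distances, no coordination between routers is needed, which is the simplicity referred to above.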
The decision to select egress points based on IGP distances may be inappropriate in light of the growing pressure to provide good, predictable communication performance for applications such as voice-over-IP, on-line gaming, and business transactions. Hot-potato routing may be unnecessarily restrictive. The underlying mechanism of hot-potato routing dictates a particular policy rather than supporting the diverse performance objectives important to network administrators. Moreover, hot-potato routing tends to be disruptive. Small changes in IGP distances can sometimes lead to large shifts in traffic, long convergence delays, and BGP updates to neighboring domains. Network administrators are forced to select IGP metrics that make “BGP sense,” rather than viewing the two parts of the routing system separately.
Selecting an egress point and computing a forwarding path to the egress point are two very distinct functions, and decoupling these functions may be appropriate. Paths inside the network should be selected based on some meaningful performance objective, whereas the egress selection may be flexible to support a broader set of traffic-engineering goals.
The Internet routing system has three main components: (i) inter-domain routing, which determines the set of border (or egress) routers that direct traffic toward a destination, (ii) intra-domain routing, which determines the path from an ingress router to an egress router, and (iii) egress-point selection, which determines which egress router is chosen by each ingress router for each destination. Tying egress selection to IGP distances may lead to harmful disruptions and over-constrained traffic-engineering problems. Also, allowing each ingress router to have a fixed ranking of egress points may not be flexible enough (for traffic engineering) or adaptive enough (to large changes in the network topology).
An exemplary network is shown in FIG. 1 comprising autonomous systems AS 0, 1, 2 and 3 where a source S is transmitting toward a destination p via AS 0. Autonomous system AS 1 101 is shown having five routers (A, B, C, D, and E) by way of example and each internal link has an IGP metric shown. Router C learns BGP routes to destination p from possible egress routers A and B.
Under hot-potato routing, ingress router C of AS 1 chooses the BGP route learned from A because the IGP distance to A is 1+1, or 2, which is smaller than the distance of 9 to B. However, if the C-D link fails (indicated by the X break), all traffic from ingress C to destination p would shift to egress router B, whose IGP distance of 9 is then smaller than the IGP distance of 10 to alternative egress router A. These kinds of routing changes are disruptive. Yet, continuing to use egress point A might not be the right thing to do, either, depending on the propagation delay, traffic demands, and link capacities. Instead, network administrators need a mechanism that is flexible enough to support sound performance trade-offs.
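The shift described in this example can be reproduced with a short shortest-path computation. The sketch below is illustrative only: the individual link weights in `LINKS` are assumptions, chosen solely so that the resulting distances match the values stated above (2 and 9 before the failure, 10 and 9 after), and the function names are hypothetical.

```python
import heapq

# Hypothetical IGP link weights for AS 1 (assumed, not taken from FIG. 1).
LINKS = {("C", "D"): 1, ("D", "A"): 1, ("C", "E"): 5,
         ("E", "D"): 4, ("E", "B"): 4}


def igp_distances(links, source):
    """Dijkstra's algorithm over undirected weighted links."""
    adjacency = {}
    for (u, v), weight in links.items():
        adjacency.setdefault(u, []).append((v, weight))
        adjacency.setdefault(v, []).append((u, weight))
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, weight in adjacency.get(u, []):
            candidate = d + weight
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


def closest_egress(links, ingress, egresses):
    """Hot-potato choice: the reachable egress with the smallest IGP distance."""
    dist = igp_distances(links, ingress)
    reachable = [e for e in egresses if e in dist]
    return min(reachable, key=lambda e: dist[e]) if reachable else None


# Before the failure, C is closest to A; after removing C-D, C is closest to B.
after_failure = {k: w for k, w in LINKS.items() if k != ("C", "D")}
print(closest_egress(LINKS, "C", ["A", "B"]))
print(closest_egress(after_failure, "C", ["A", "B"]))
```

Under these assumed weights, a difference of a single unit of IGP distance (10 versus 9) is enough to move all of the traffic from one egress router to the other.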
Hot-potato routing has the advantage of adapting automatically to topology changes that affect the relative distances to the egress points. Although hot-potato routing is a reasonable way to minimize resource consumption, IGP link weights do not express resource usage directly. The IGP distances do not necessarily have any relationship to hop count, propagation delay, or link capacity, and selecting the closer egress point does not necessarily improve network performance. In addition, small topology changes can lead to performance disruptions, for example, large shifts in traffic within and between autonomous systems. A single link failure can potentially impact the egress-point selection for tens of thousands of destinations at the same time, leading to large shifts in traffic. In fact, hot-potato routing changes may be responsible for many of the largest traffic variations in a large backbone.
Another type of performance disruption is changes in the downstream path. When the egress point changes, the traffic moves to a different downstream forwarding path that may have a different round-trip time or available bandwidth, which may disrupt the communicating applications. In addition, the abrupt increase in traffic entering the neighboring AS may cause congestion.
Yet another performance disruption is the need for BGP update messages for neighboring domains. A change in egress point may also change the AS path. The failure of the C-D link in FIG. 1 causes router C to switch from a path through AS 2 to one through AS 3, forcing C to send a BGP update message to source autonomous system AS 0. Global BGP convergence may take several minutes. If AS 0 switches to a BGP route announced by another provider, the traffic entering AS 1 at router C would change.
Even if the hot-potato routing change does not lead to new BGP update messages, long convergence delays can occur inside the autonomous system depending on how the router implements the BGP decision process. Long convergence delays may occur when the underlying routers in the network revisit the influence of IGP distances on BGP decisions only once per minute; during the convergence period, data packets may be lost, delayed, or delivered out of order.
In a large network, IGP changes that affect multiple destination prefixes happen several times a day, sometimes leading to very large shifts in traffic. Not all of these events are caused by unexpected equipment failures; a large fraction of them are caused by planned events, such as routine maintenance performed by service personnel. Maintenance activities may happen quite frequently, for example, to upgrade operating systems on routers, replace line cards, or repair optical amplifiers. Construction activities may also require moving fibers or disabling certain links temporarily. A recent study of the Sprint backbone showed that almost half of IGP events happened during maintenance windows.
Often, shifts in egress points are not necessary. The new intra-domain path to the old egress point, although somewhat longer in terms of IGP distance, may offer comparable (or even better) performance than the path to the new egress point. Following the failure of the C-D link in FIG. 1, the path C,E,D,A might be less congested or have lower propagation delay than the path C,E,B. Moreover, many internal network changes are short-lived; a study of the Sprint backbone showed that 96% of failures were repaired in less than 15 minutes. Maintenance activities are often done in periods of lower traffic demands, when the network would comfortably have extra capacity to tolerate the temporary use of non-closest egress points.
Besides being disruptive, the tight coupling between egress selection and IGP metrics makes traffic engineering and maintenance planning extremely difficult. Network administrators indirectly control the flow of traffic by tuning the IGP metrics and BGP policies. However, finding good settings that result in the desired behavior is computationally challenging, due to the large search space and the need to model the effects on egress-point selection. Finding settings that are robust to a range of possible equipment failures is even more difficult: it imposes additional constraints, such as minimizing hot-potato disruptions across all routers and destination prefixes, and makes the problem increasingly untenable. In addition, once local-search techniques identify a better setting of the IGP metrics or BGP policies, changing these parameters in the routers requires the network to go through routing-protocol convergence, leading to transient performance disruptions.
An alternative is to configure each router with a fixed ranking of the egress points, where the router would select the highest-ranked element in the set of egress routers for each destination. This solution may be realizable using today's technology. According to a principle of the present invention, a fixed ranking method would include the step of establishing a tunnel from each ingress router to each egress router, and assigning an IGP metric to the tunnel. By a “tunnel” is meant the establishment of packet communication between one router and another router without the packet communication passing through intermediate routers. The data packets would follow the shortest underlying IGP path from the ingress router to the chosen egress router. The hot-potato mechanism may still be used to dictate the selection of egress points, but the metric associated with each tunnel would be defined statically at configuration time rather than computed automatically by the IGP. Thus, network administrators may rank the egress points from each router's perspective, allowing each ingress router to select the highest-ranked egress point independent of internal network events, short of the extreme case where the egress point becomes unreachable and the router is forced to switch to the egress point with the next highest rank.
For the example in FIG. 1, router C could be preconfigured to prefer egress A over B. Then, when the C-D link fails, C would continue to direct traffic toward router A, though now using the path C,E,D,A. This would avoid triggering the traffic shift to B, changes in the downstream forwarding path, and BGP updates to neighboring domains. However, although the fixed ranking is extremely robust to internal changes, sometimes switching to a different egress point is a good idea. For example, the path C,E,D,A may have limited bandwidth or a long propagation delay, making it more attractive to switch to egress-point B, even at the expense of causing a transient disruption. In the long term, network administrators could conceivably change the configuration of the ranking to force the traffic to move to a new egress point, but the reaction would not be immediate. Similarly, the administrators could reconfigure the IGP metrics or BGP policies to redistribute the traffic load, at the expense of searching for a suitable solution, reconfiguring the routers, and waiting for the routing protocol to converge.
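The fixed-ranking behavior described above may be sketched, by way of example only, as a lookup that ignores IGP distances and falls back to the next-ranked egress point only when a higher-ranked one becomes unreachable. The function name `fixed_rank_egress` and the representation of the ranking as an ordered list are assumptions made for this sketch.

```python
def fixed_rank_egress(ranking, reachable):
    """Return the highest-ranked egress that is still reachable, else None.

    ranking   -- egress routers in order of preference (configured statically)
    reachable -- set of egress routers the ingress can currently reach
    """
    for egress in ranking:
        if egress in reachable:
            return egress
    return None


# Router C is preconfigured to prefer egress A over egress B.
ranking_at_c = ["A", "B"]

# The C-D failure lengthens the path to A but leaves A reachable: no change.
print(fixed_rank_egress(ranking_at_c, {"A", "B"}))

# Only when A becomes unreachable does C fall back to B.
print(fixed_rank_egress(ranking_at_c, {"B"}))
```

The sketch makes the limitation discussed above visible: the choice depends only on reachability, so no degradation of the path to A, however severe, causes a switch to B.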
Hot-potato and fixed-ranking mechanisms for selecting egress points represent two extremes in the trade-off between robustness and automatic adaptation. Hot-potato routing adapts immediately to internal routing changes (however small), leading to frequent disruptions. Imposing a fixed ranking of egress points, while robust to topology changes, cannot adapt in real time to critical events. Neither mechanism offers sufficient control for network administrators trying to engineer the flow of traffic and plan for maintenance.