1. Field of the Invention
The present disclosure relates generally to packet network devices such as switches and routers, and more particularly to methods for the optimal and dynamic, global distribution of traffic ingressing to a network system over multiple paths.
2. Description of Related Art
A network operating according to the Internet Protocol (IP) is typically comprised of some number of network systems (NS), such as the NS 100 shown in FIG. 1. The terms network system and autonomous system (AS) are interchangeable in this context. Until recently, an AS was considered to be a set of routers under the administration of a single entity, using an interior gateway protocol and using common metrics to route packets within the AS. More recently, it has become common for a single AS to employ two or more interior gateway protocols (IGP) and several sets of metrics. From one perspective, an AS can be considered to be a connected group of one or more IP prefixes, run by one or more network operators, which has a single, clearly defined routing policy.
The NS 100 of FIG. 1 includes a number of edge routers (ER1-ERn) connected to a core network. The core network is comprised of a plurality of core routers (CR), CR1 to CRn, that operate to forward traffic received from one of the edge routers (ER1-ERn) to another core router or to another one of the edge routers (ER1-ERn). All of the ERs are connected to at least one core router by one or more physical or logical links. Each of the ERs is capable of receiving traffic from outside the NS 100 and sending this traffic to the core network where it is forwarded to an ER for transmission outside the NS. Based on the topology of NS 100, multiple paths through the NS can be calculated for traffic ingressing on any of the ERs.
In FIG. 1, a flow of traffic labeled Ti/o ingresses to or egresses from ER1, and this traffic Ti/o can be distributed by the routers comprising NS 100 in proportions D1, D2 and Dn to each of a plurality of the ERs, ER2, ER3 and ERn respectively. Each portion D1, D2 and Dn represents a certain amount of traffic that is typically measured in bits of information per second, for instance, and the portions can be the same or different amounts of traffic. As shown in FIG. 1, the portion D1 can be distributed over a path P1, portion D2 can be distributed over a path P2 and portion Dn can be distributed over a path Pn through the NS 100. Each of the paths, P1-Pn, can be comprised of a sequence of multiple routers connected by the physical or logical links, and each of the links is capable of supporting a particular amount of traffic. While the links connecting the routers in NS 100 are shown as single links, each of the links can be either a single physical link or an aggregation of two or more logical links. Each of the links can support a particular volume or amount of network traffic, which is referred to as link bandwidth. The capability of a network link to support a particular volume of network traffic is determined by the capacity of the physical interfaces connected to the link to process that volume of traffic. Physical interfaces included on a router can be designed to process traffic ingressing to them at various rates, which currently can approach 40 Gbits/second. The unused or available link bandwidth at any point in time is referred to as instantaneous available link bandwidth or simply available link bandwidth. Path bandwidth is the minimum of the link bandwidths or available link bandwidths of all of the links comprising a path through the network system.
So for example, network traffic Ti/o can be forwarded along the path P1 which includes ER1 (ingress router), core router CR0 and ER2 (egress router), and the available bandwidth over path P1 is the minimum link bandwidth along the path P1. In this case, path P1 includes a link, L1, that connects ER1 to CR0 and a link, L2, that connects CR0 to ER2. If the bandwidth of link L1 is 10 Gbits/second and the bandwidth of link L2 is 5 Gbits/second, then the path P1 bandwidth is the lesser of the two link bandwidths, or 5 Gbits/second.
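The path bandwidth calculation described above can be illustrated with a short Python sketch. The function name `path_bandwidth` and the dictionary layout are hypothetical and are used only for illustration; the link names and bandwidth values are those of the path P1 example.

```python
def path_bandwidth(link_bandwidths_gbps):
    """Return the bottleneck (minimum) bandwidth over the links of a path."""
    bandwidths = list(link_bandwidths_gbps)
    if not bandwidths:
        raise ValueError("a path needs at least one link")
    return min(bandwidths)


# Path P1 from the example: L1 connects ER1 to CR0, L2 connects CR0 to ER2.
p1_links_gbps = {"L1": 10.0, "L2": 5.0}
p1_bandwidth = path_bandwidth(p1_links_gbps.values())
print(f"Path P1 bandwidth: {p1_bandwidth} Gbits/second")
```

The same function applies to available path bandwidth if the available bandwidth of each link is supplied in place of its total bandwidth.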
In order to forward the traffic Ti/o over path P1 in the NS 100 without the loss of any information, it is necessary for the available bandwidth of path P1 to be greater than or equal to the volume or amount of traffic in Ti/o. Assuming that the available path P1 bandwidth is equal to or greater than the volume of traffic in Ti/o, if the NS 100 is stable along path P1 (i.e., the states of the links comprising the path are not changing), the traffic Ti/o can be forwarded over path P1 without the loss of any information. However, in the event that one or more internal ports associated with a link comprising path P1 flaps (fails), the available path P1 bandwidth may be lowered, resulting in the loss of some of the traffic Ti/o until the routers comprising NS 100 can recalculate a new path and program their forwarding tables to redirect some or all of the traffic Ti/o. Prior art traffic redistribution methods are limited inasmuch as the network protocol running on each router in the system only considers the traffic Ti/o ingressing to it when recalculating a route through the network system.
Interior Gateway Protocols (IGP) running on routers or switches in a network system operating according to the Internet Protocol (IP) generally operate to collect certain information from neighboring routers and switches that can be used to calculate paths through the network that are used to forward network traffic. As described earlier with reference to FIG. 1, a path can be comprised of a sequence of multiple routers connected by physical or logical links, and each of the links is capable of supporting a particular amount of traffic. Depending upon the complexity of the network system, there can be multiple paths between two different network edge devices, such as the ERs of FIG. 1. Typically, an IGP, such as the well-known OSPF (Open Shortest Path First) protocol, uses a cost metric associated with each router interface (physical or logical) to calculate one or more shortest paths from the router to a destination. The cost metric can be assigned to each interface by a system administrator, and this cost metric can depend on the distance from one router to another (round-trip time), link bandwidth, link availability (delay), and/or link reliability factors, to name only a few of the criteria that can be considered when assigning cost to a router interface. The OSPF protocol running on a router uses the costs assigned to each of its interfaces to calculate the shortest paths from it to a destination address, for instance. Specifically, the Dijkstra algorithm is typically used to calculate the least cost paths through a network system, such as the network system 100 in FIG. 1. The result of applying the Dijkstra algorithm to link state information maintained by each router is a series of connected routers that represent the least cost paths to each router and the cost of each path.
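As an illustration of the least cost path calculation described above, the following Python sketch applies the Dijkstra algorithm to a small link state table. The topology, the interface costs, and the names `least_cost_paths` and `path_to` are assumptions made only for this example; they are not taken from FIG. 1 or from any particular OSPF implementation.

```python
import heapq


def least_cost_paths(graph, source):
    """Dijkstra: return (cost, prev) maps for least cost paths from source.

    graph[router][neighbor] is the cost of the outgoing interface toward
    that neighbor (hypothetical values, as an administrator might assign).
    """
    cost = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        c, node = heapq.heappop(heap)
        if c > cost.get(node, float("inf")):
            continue  # stale heap entry; a cheaper path was already found
        for neighbor, link_cost in graph.get(node, {}).items():
            new_cost = c + link_cost
            if new_cost < cost.get(neighbor, float("inf")):
                cost[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    return cost, prev


def path_to(prev, source, dest):
    """Rebuild the router sequence from source to dest using the prev map."""
    path = [dest]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical topology: three edge routers and two core routers.
graph = {
    "ER1": {"CR1": 1, "CR2": 4},
    "CR1": {"ER1": 1, "CR2": 1, "ER2": 5},
    "CR2": {"ER1": 4, "CR1": 1, "ER2": 1, "ER3": 1},
    "ER2": {"CR1": 5, "CR2": 1},
    "ER3": {"CR2": 1},
}

cost, prev = least_cost_paths(graph, "ER1")
print(cost["ER3"], path_to(prev, "ER1", "ER3"))
```

This sketch records a single predecessor per router, so it reports one least cost path per destination; recording every predecessor that ties for the lowest cost would be needed to enumerate equal cost paths.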
Referring again to FIG. 1, if the calculation to identify the least cost paths from ER1 to ER3 in the NS 100 results in a path P2 cost equal to three and a path P3 cost equal to three, then OSPF running on ER1 will typically select either path P2 or path P3 (assuming P2 and P3 have enough available bandwidth to support the traffic) as the path for traffic Ti/o through the NS 100. Paths P2 and P3 are in this case considered to be equal cost paths, and the routing technique most commonly employed to select over which of two or more equal cost paths to forward a flow of traffic is the well-known Equal Cost Multi-Path (ECMP) routing technique. ECMP is a routing technique that is explicitly supported by the OSPF protocol. A number of different methods can be used to determine which of several equal cost paths or next hops is selected. Hash-threshold is one method for determining which of several equal cost next hops to select, and the round-robin method is another. Each method has its advantages and disadvantages, and the reasons for selecting one or the other method are not discussed here. ECMP routing techniques typically divide the traffic with a common destination equally among the multiple equal cost paths, regardless of the bandwidth that is available on any one of the equal cost paths and regardless of the technique employed to select the traffic transmission path.
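The two selection methods named above can be sketched in Python as follows. This is a minimal illustration, not a description of any router's implementation: the use of CRC32 over a five-field flow description as the hash key, the 32-bit key space, the example addresses, and the function names are all assumptions made for the example.

```python
import itertools
import zlib

KEYSPACE = 2 ** 32  # assumed size of the hash key space


def flow_key(src_ip, dst_ip, src_port, dst_port, protocol):
    """Hash the fields identifying a flow into a 32-bit key (assumed hash)."""
    flow = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode()
    return zlib.crc32(flow)


def hash_threshold_select(key, next_hops):
    """Split the key space into equal regions, one per equal cost next hop,
    and return the next hop whose region contains the key."""
    region_size = KEYSPACE // len(next_hops)
    index = min(key // region_size, len(next_hops) - 1)
    return next_hops[index]


def round_robin(next_hops):
    """Return an iterator that cycles through the next hops in order."""
    return itertools.cycle(next_hops)


equal_cost_paths = ["P2", "P3"]

# Hash-threshold: every packet of the same flow maps to the same path.
key = flow_key("192.0.2.1", "198.51.100.7", 49152, 443, "tcp")
print(hash_threshold_select(key, equal_cost_paths))

# Round-robin: successive packets alternate between the paths.
chooser = round_robin(equal_cost_paths)
print([next(chooser) for _ in range(4)])
```

Neither function consults the available bandwidth of path P2 or path P3, which reflects the behavior described above: the traffic is divided without regard to how much bandwidth each equal cost path can actually support.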
Continuing to refer to FIG. 1, assume that the traffic Ti/o is being forwarded over two equal cost paths, paths P2 and P3 for instance, that the available bandwidth on path P2 is 1 Gbit/second, and that the available bandwidth on path P3 is 2 Gbits/second. If ECMP routing distributes traffic Ti/o equally between paths P2 and P3, and if a port associated with the link L5 comprising path P2 flaps (assuming L5 is a logical link comprised of multiple physical links), then depending upon whether path P2 is oversubscribed or not, some traffic may be dropped from that portion of the traffic Ti/o flowing over path P2.
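The effect of a port flap on equally divided traffic can be worked through numerically. In the Python sketch below, the available bandwidths of paths P2 and P3 before the flap are those given above; the total volume of traffic Ti/o (1.5 Gbits/second) and the available bandwidth of path P2 after the flap (0.5 Gbits/second) are hypothetical values chosen only to show an oversubscribed case, and the function names are likewise illustrative.

```python
def equal_split(total_gbps, paths):
    """Divide traffic equally among the paths, ignoring available bandwidth."""
    share = total_gbps / len(paths)
    return {path: share for path in paths}


def dropped_traffic(offered_gbps, available_gbps):
    """Per path, the offered traffic that exceeds the available bandwidth."""
    return {
        path: max(0.0, offered_gbps[path] - available_gbps[path])
        for path in offered_gbps
    }


total_traffic_gbps = 1.5  # hypothetical volume of traffic Ti/o
offered = equal_split(total_traffic_gbps, ["P2", "P3"])

# Before the flap: P2 has 1 Gbit/second and P3 has 2 Gbits/second available.
before = dropped_traffic(offered, {"P2": 1.0, "P3": 2.0})

# After a port on link L5 flaps: assume P2 falls to 0.5 Gbits/second.
after = dropped_traffic(offered, {"P2": 0.5, "P3": 2.0})

print(before)
print(after)
```

With these assumed values, each path is offered 0.75 Gbits/second. Nothing is dropped before the flap, while after the flap path P2 is oversubscribed by 0.25 Gbits/second even though path P3 still has 1.25 Gbits/second of unused bandwidth.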