1. Field of the Invention
The present invention relates to routing in telecommunication systems, and, more particularly, to determining paths through nodes of a network for efficient and robust routing following node failure.
2. Description of the Related Art
In packet-based communications networks, such as the Internet, each stream of data packets, called a packet flow, is transferred through the network over a network path from a source to a destination. Each network path is defined by a set of nodes, interconnected by a set of links. A node may include one or more routers, which are devices in the network that handle data transfer between computers.
A communications system may be structured such that different-sized networks are interconnected and may alternatively or additionally include one or more peer structures in which equivalent-sized networks are interconnected. A packet network may connect to another packet network through nodes referred to as the ingress and egress points. The terms ingress point and egress point may refer to a node of a packet network that connects to another packet network, or alternatively, these terms may refer to the connecting nodes of the other packet network. Packet networks with high capacity that transfer packets between two or more other packet networks are commonly referred to as “backbone” networks.
FIG. 1 shows a backbone network 100 of the prior art having nodes n1-n9 interconnected through links 101, which enable communication between packet networks 102-104. One of the ingress points of backbone network 100 is node n1, which receives packets from a source (i.e., packet network 102), and one of the backbone network's egress points is node n4, which transmits packets to a destination (i.e., packet network 104). Backbone network 100 may support an interior routing protocol to distribute network topology information and route packets between ingress and egress points based on best-effort routing (e.g., destination-based shortest-path routing) through nodes n1-n9. A centralized network management system 105 may be employed to (i) provision virtual circuits, or packet flows, through backbone network 100; (ii) monitor capacity and utilization of links 101; and (iii) coordinate calculation and installation of provisioned paths. Forwarding tables are used by each node to forward each received packet to the next node toward its destination. In addition, centralized network management system 105 may also be employed to collect and distribute network topology information.
An interior routing protocol is employed to determine forwarding of packets between a source and destination pair along a path through the nodes of the backbone network. Packets received by a node are forwarded to other nodes based on a forwarding table constructed in accordance with the interior routing protocol or routes installed with explicit route provisioning. An interior routing protocol may also specify the exchange of network topology and link-state information (“network topology information”) among nodes to allow a node to construct the corresponding forwarding table. In addition, some routing protocols associate a link “cost” with each link between nodes. This link cost may be associated with, for example, average link utilization or revenue generated by the link, as well as link importance in the network. When link-state information or link-bandwidth (e.g., connectivity or available bandwidth) is exchanged between routers, each node in the network has a complete description of the network's topology. An example of a widely used, interior routing protocol for “best-effort” routing is the Open Shortest Path First (OSPF) protocol.
Routing protocols, in addition to providing connectivity, may also enable traffic management. The Multi-Protocol Label Switched (MPLS) standard, for example, allows such routing protocols in backbone networks. The MPLS standard may be employed for networks having virtual circuits (packet flows) with provisioned service levels (also known as guaranteed quality-of-service (QoS)).
A provisioned service level may be, for example, a guaranteed minimum bandwidth for the path of a packet flow through the backbone network. This path having a guaranteed level of service between ingress and egress points may be referred to as a Network Tunnel Path (NTP). As would be apparent to one skilled in the art, specific implementations of NTPs exist for different types of networks. As examples of NTPs, virtual circuits may be established for packet flows in TCP/IP networks, virtual circuits may be established for cells in Asynchronous Transfer Mode (ATM) networks, and label-switched paths (LSPs) may be established for packets in MPLS networks. Packets of a signaling protocol, such as RSVP (Reservation Protocol for IP and MPLS networks) or LDP (Label Distribution Protocol for MPLS networks), may be used to reserve link bandwidth and establish an NTP, once routing for the NTP is calculated. An NTP may be provisioned as an explicit route along a specific path between nodes of the backbone network, i.e., when an NTP is provisioned for a packet flow, all intermediate nodes between the ingress and egress points of the NTP may be specified through which each packet of the flow passes.
In MPLS networks, packets are encapsulated by appending to the packet, or forming from the packet, additional information when the packet is received at an ingress point. The additional information, called a label, is used by routers of the backbone network to forward the packets. FIG. 2 shows such an encapsulated packet 200 having a label 201 appended to packet 202. The label summarizes information in the packet header. The summary may be based on the header field and include an origination (source) address field (o) 210 identifying the address of the ingress point and a termination (destination) address field (t) 211 identifying the address of the egress point(s). In some cases, the label may simply be a pointer that identifies or is otherwise related to specific origination and termination address fields in the header of the received packet. The label also includes one or more service-level fields (bd) 212. Service-level field 212 may identify a desired service level for the virtual circuit (called a “demand”), such as minimum bandwidth required. In some networks, the service-level field is implied from the label itself. Other fields 213 may be included in label 201, such as MPLS standard version, interior routing protocol version, maximum delay, or other types of service-level parameters. Label 201 may alternatively be inserted into packet header (PH) 214 of packet 202, so the order of fields shown in FIG. 2 is exemplary only. Backbone networks may employ labels to group encapsulated packets having similar LSPs into classes (equivalence classes), and methods for forwarding equivalence classes may be employed to simplify calculation of routing for LSPs.
To generate a forwarding table, a set of preferred paths through the network nodes is computed, and weights may be used to calculate the set of preferred paths. Each preferred path has a minimum total weight between nodes (the total weight of a path being the summation of the weights of all links in the path), which is employed in a technique known in the art as shortest-path routing. The resulting set of preferred paths may be defined with a shortest-path tree (SPT). The forwarding table with routing information (e.g., source-destination pair, source ports, and destination ports) is generated from the SPT. The routing information is then used to forward a received packet to its destination along the shortest path of the SPT. The SPT may be calculated using an algorithm such as Dijkstra's algorithm, described in E. Dijkstra, “A Note: Two Problems In Connection With Graphs,” Numerical Mathematics, vol.1, 1959, pp. 269-271, the teachings of which are incorporated herein by reference.
A common shortest-path routing algorithm employed by routers to generate routing of an LSP is the min-hop algorithm. In the min-hop algorithm, each router calculates a path through the backbone network for the stream of packets (packet flow) between the ingress and egress points. Each router constructs a path for routing the packet flow from the ingress point to the egress point with the least number (“min”) of feasible links (“hops”) (a feasible link is a link that has sufficient capacity to route the packet flow). Routing schemes of the prior art, such as shortest-path routing, forward packets based only on destination addresses and use only static and traffic-characteristic-independent link weights to calculate paths for routing tables. Some links on the shortest path between certain pairs of ingress and egress points may be congested, while other links on alternative paths are under-utilized.
A signaling mechanism, such as RSVP or LDP, may be employed to both reserve and establish a connection through the network for a packet flow. The signaling mechanism may specify quality-of-service attributes for the LSP traversing the backbone network. Link congestion caused by shortest-path routing of multiple LSPs may cause rejection of reservation requests by signaling mechanisms, even though sufficient levels of service (quality-of-service guarantees) for the LSP may exist in alternative, under-utilized paths that are only slightly longer. Available network resources are not always efficiently utilized when shortest-path routing is employed.
The Border Gateway Protocol (BGP) is an interautonomous system routing protocol. An autonomous system is a network or group of networks under a common administration and with common routing policies. An interautonomous system routing protocol is used to route data between autonomous systems. BGP is used to exchange routing information for the Internet and is the protocol used between Internet service providers (ISPs). Customer networks, such as universities and corporations, usually employ an Interior Gateway Protocol (IGP), such as Routing Information Protocol (RIP) or Open Shortest Path First (OSPF), for the exchange of routing information within their networks. Customers connect to ISPs, and ISPs use BGP to exchange customer and ISP routes. BGP can be used between autonomous systems, or a service provider can use BGP to exchange routes within an autonomous system.
A major problem in networks is BGP-induced traffic variation. Extreme network traffic fluctuations can happen for a variety of reasons. For example, in the case of a large Internet service provider exchanging traffic with several other providers, the traffic exchange between carriers is typically specified by total traffic volumes over long time periods and possibly a peak rate limit (usually just determined by physical link capacities). The actual distribution of traffic entering at an ingress point to the various network egress points might not be known a priori and can change over time. This is because the distribution is determined by many factors, such as intrinsic changes in traffic to different destination prefixes, and by routing changes either made locally by the carrier or due to changes made in other autonomous systems over which the carrier has no control. Intrinsic changes in traffic distribution can be caused by many factors, such as the sudden appearance of flash crowds responding to special events. An example of local routing changes that can affect the traffic distribution is IGP weight changes combined with “hot-potato” routing, which can change the network egress point that traffic destined to a set of prefixes would otherwise choose. “Hot-potato” routing is a form of routing in which the nodes of a network have no buffers to store packets in before they are moved on to their final predetermined destination, such that each packet that is routed is constantly transferred until it reaches its final destination. Thus, the packet is bounced around like a “hot potato,” sometimes moving further away from its destination because it has to keep moving through the network. Another example is the change in BGP when a Multi-Exit Discriminator (MED) is employed. An MED, also referred to as the “external metric” of a route, is a suggestion to external neighbors about the preferred path into an autonomous system that has multiple entry points. While local routing changes are under a carrier's control and hence change traffic patterns only at planned instances, unpredictable traffic shifts can happen when routing changes in other autonomous systems affect downstream autonomous systems. Due to widespread use of hot-potato routing, IGP weight changes (which can be due to new links being added, maintenance, traffic engineering, etc.) in an autonomous system can cause significant shifts in traffic patterns. Changes in IGP costs can affect the BGP route for a significant percentage of the prefixes, and the affected prefixes can account for a significant percentage of the traffic. Thus, significant shifts in traffic may happen at a carrier due to changes elsewhere in the network.
Another reason that high traffic variability should be considered is that users or carriers entering peering agreements might not be able to characterize their traffic to various sites well. It is much easier to estimate only the total aggregate bandwidth that is either received or sent. Hence, it is preferable to avoid having to rely on knowing the exact traffic matrix and instead use only a partial specification of the traffic matrix. Also, even when the traffic matrix is known, it is often difficult to detect changes in the traffic distribution.
Network congestion typically occurs either due to loss of capacity (upon router or link failures) or due to increased capacity demand (caused by large increases in traffic). In response to these uncontrollable events, carriers should and repeatedly adapt their intra-domain routing to avoid network congestion or have sufficient capacity set aside a priori to accommodate the different traffic and failure patterns that can occur without resorting to routing changes. It is preferable to avoid frequent intra-domain routing changes due to operational complexity and costs, and due to the risk of network instability if changes are not implemented correctly. Moreover, as discussed above, changes in one autonomous system may cause cascading traffic changes in other autonomous systems, thereby affecting the overall stability of many Internet paths. The trade-off in avoiding large routing changes is the significant capacity overprovisioning that must be done to accommodate failures or changing traffic patterns. Ideally, providers would prefer to use an almost-fixed routing scheme that (i) does not require traffic-dependant dynamic adaptation of configuration parameters, (ii) minimizes dynamic capacity re-allocation after failures, and (iii) is minimal in its overprovisioning needs.
Another application where the traffic matrix is unknown a priori is the provision of network-based virtual private network (VPN) services to enterprise customers. Here, a service-level agreement with each customer specifies the amount of traffic that can be sent or received by each site belonging to a VPN. In this scenario, users do not know their traffic matrices and specify to the carrier only the total traffic volume and the peak rate. It is the carrier's task to transport all of the offered VPN traffic to the network and carry that traffic without introducing too much delay. The actual traffic distribution from each site to the other sites is typically unknown and could vary by time-of-day. The carrier network is tasked to carry all of the offered VPN traffic without experiencing network congestion upon traffic-pattern changes or upon node or link failures.
Networks for grid computing provide a further scenario in which traffic variations can be extreme, and the traffic matrix is not known a priori. In grid computing, a complex computational task is partitioned amongst different computing nodes that can be geographically distributed and are connected by a network. The communication patterns amongst grid-computing nodes are highly unpredictable and also can experience high burst rates. Since the traffic matrix is not known a priori, one option is to dynamically reserve capacity over an underlying network, but this approach will be too slow for many grid-computing applications. Because of the high variability in destinations and the bursty nature of the traffic, overprovisioning the network leads to very poor capacity usage most of the time.
To provide good service when traffic patterns can change uncontrollably, carriers should either quickly and repeatedly adapt their intra-domain routing to avoid network congestion or have sufficient capacity set aside a priori to accommodate the different traffic patterns that can occur without resorting to routing changes. Service providers prefer to avoid frequent intra-domain routing changes due to (i) operational complexity and costs and (ii) the risk of network instability if link metric changes are not implemented correctly. Moreover, changes in one autonomous system in a BGP application may cause cascading traffic changes in other autonomous systems, thereby affecting the overall stability of many Internet paths. The trade-off in avoiding routing changes is the significant capacity overprovisioning that can be done to accommodate changing traffic patterns while keeping the routing fixed. Ideally, providers would like to use a fixed routing scheme that does not require traffic-dependent dynamic adaptation of configuration parameters and is parsimonious in its capacity needs.
Moreover, in IP-over-Optical Transport Networks (OTN), routers are connected through a reconfigurable switched optical backbone, or OTN, consisting of optical cross-connects (OXCs) that are typically less expensive than IP router ports. The OXCs are interconnected in a mesh topology using wave-division multiplexing (WDM) links. The core optical backbone consisting of such OXCs takes over the functions of switching, grooming, and restoration at the optical layer. Since the IP traffic flow is carried on an optical-layer circuit (called a “lightpath”), the bypass of router ports for transit traffic creates a basis for huge economies of scale to be reaped by interconnecting IP routers over an optical backbone in IP-over-OTN. By moving transit traffic from the routers to the optical switches, the requirement to upgrade router Point -of-Presence (PoP) configurations with increasing traffic is minimized, since optical switches are more scalable due to their typically increased port count over that of routers. In an IP-over-OTN architecture, a router line card is typically more expensive than an optical switch card, and thus, network cost is typically reduced by keeping traffic mostly in the optical layer. Also, since optical switches are typically much more reliable than routers, their architecture is typically more robust and reliable. Because routers are interconnected over a switched optical backbone, the routing process compromises between keeping traffic at the optical layer and using intermediate routers for packet grooming in order to achieve efficient statistical multiplexing of data traffic.
Dynamic provisioning of bandwidth-guaranteed paths with fast restoration capability is a desirable network service feature for many networks, such as Multi-Protocol Label Switched (MPLS) networks and optical mesh networks. In optical networks, fast restoration is also desirable, since optical transport networks carry a variety of traffic types, each with different, stringent reliability requirements. Similar fast restoration capabilities may be used in MPLS networks in order to provide the needed reliability for services such as packetized voice, critical virtual private network (VPN) traffic, or other quality-of-service (QoS) guarantees.
A connection in a network might be protected at the path level or at the link level. For link restoration (also referred to as local restoration or as fast restoration), each link of the connection is protected by a set of one or more pre-provisioned detour paths that exclude the link being protected. Upon failure of the link, traffic on the failed link is switched to the detour paths. Thus, link restoration provides a local mechanism to route around a link failure. In path restoration, the primary, or working, path of the connection is protected by a “diverse” backup path from source to destination. Upon failure of any of the resources on the working path, traffic is switched to the backup path by the source node. Link restoration might typically restore service much faster than path restoration because restoration is locally activated and, unlike path restoration, failure information need not propagate back through the network to the source.
Service restoration is an important requirement of optical networks. If a network element fails, such as a node (optical switch) or link (optical fiber), the failure causes one or more particular wavelength paths to fail, and affected traffic flow(s) must be restored using an alternative path within a very short interval (e.g., 50 ms). To accomplish relatively rapid restoration times, provisioning identifies, for each wavelength path, two paths through the network: a primary (active) path and a secondary (backup) path. The backup path is link disjoint (active and backup paths do not share links) or node disjoint (active and backup paths do not share either nodes or links) with the primary path. The capacity of links in the backup path may be exclusively assigned to a corresponding primary path (e.g., wavelength), or, for network bandwidth usage efficiency, the capacity may be shared between links of backup paths for different primary paths, depending on the type of restoration desired. Optical network capacity design typically accounts for restoration needs to route disjoint secondary paths with possible sharing.
A problem that frequently arises in networks where the traffic matrix is unknown a priori is trying to achieve the fast restoration of network services after a router or link failure. In this scenario, service providers desire for their networks to be self-managed and self-healing by being able to (i) automatically restore equivalent service to all the traffic that is affected by a router or link failure, (ii) achieve fast restoration by pre-provisioning of capacity so as to minimize dynamic capacity reallocation after failure, (iii) achieve bandwidth efficiencies to avoid excessive overprovisioning, (iv) achieve operational simplicity by use of simple, almost static but fault-tolerant routing schemes, (v) accommodate highly varying traffic without requiring frequent changes to network configuration, (vi) handle any traffic pattern permissible within the constraints imposed by the network's edge-link capacities, (vii) avoid network congestion under high or unpredictable traffic variability without requiring dynamic routing-policy adjustments, and (viii) have capacity requirements close to those needed to accommodate a single traffic matrix while being able to handle all possible traffic matrices subject to ingress-egress capacity constraints.