A computer network is a geographically distributed collection of interconnected subnetworks, such as local area networks (LAN) that transport data between network nodes. As used herein, a network node is any device adapted to send and/or receive data in the computer network. Thus, in this context, “node” and “device” may be used inter-changeably. The network topology is defined by an arrangement of network nodes that communicate with one another, typically through one or more intermediate nodes, such as routers and switches. In addition to intra-network communications, data also may be exchanged between neighboring (i.e., adjacent) networks. To that end, “edge devices” located at the logical outer-bound of the computer network may be adapted to send and receive inter-network communications. Both inter-network and intra-network communications are typically effected by exchanging discrete packets of data according to predefined protocols. In this context, a protocol consists of a set of rules defining how network nodes interact with each other.
Each data packet typically comprises “payload” data prepended (“encapsulated”) by at least one network header formatted in accordance with a network communication protocol. The network headers include information that enables network nodes to efficiently route the packet through the computer network. Often, a packet's network headers include a data-link (layer 2) header, an internetwork (layer 3) header and a transport (layer 4) header as defined by the Transmission Control Protocol/ Internet Protocol (TCP/IP) Reference Model. The TCP/IP Reference Model is generally described in more detail in Section 1.4.2 of the reference book entitled Computer Networks, Fourth Edition, by Andrew Tanenbaum, published 2003, which is hereby incorporated by reference as though fully set forth herein.
A data packet may originate at a source node and subsequently “hop” from node to node along a logical data path until it reaches its addressed destination node. The network addresses defining the logical data path of a data flow are most often stored as Internet Protocol (IP) addresses in the packet's internetwork header. IP addresses are typically formatted in accordance with the IP Version 4 (IPv4) protocol, in which network nodes are addressed using 32 bit (four byte) values. Specifically, the IPv4 addresses are denoted by four numbers between 0 and 255, each number usually delineated by a “dot.” A subnetwork may be assigned to an IP address space containing a predetermined range of IPv4 addresses. For example, an exemplary subnetwork may be allocated the address space 128.0.10.*, where the asterisk is a wildcard that can differentiate up to 254 individual nodes in the subnetwork (0 and 255 are reserved values). For instance, a first node in the subnetwork may be assigned to the IP address 128.0.10.1, whereas a second node may be assigned to the IP address 128.0.10.2.
A subnetwork is associated with a subnet mask that may be used to select a set of contiguous high-order bits from IP addresses within the subnetwork's allotted address space. A subnet mask length indicates the number of contiguous high-order bits selected by the subnet mask, and a subnet mask length of N bits is hereinafter represented as /N. The subnet mask length for a given subnetwork is typically selected based on the number of bits required to distinctly address nodes in that subnetwork. Subnet masks and their uses are more generally described in Chapter 9 of the reference book entitled Interconnections Second Edition, by Radia Perlman, published January 2000, which is hereby incorporated by reference as though fully set forth herein.
By way of example, assume an exemplary subnetwork is assigned the IP address space 128.0.10.4, and the subnetwork contains two addressable (reachable) network nodes. In this case, 30 address bits are needed to identify the subnetwork 128.0.10.4, and is the remaining two address bits are required to distinctly address either of the two nodes in the subnetwork. Thus, the subnetwork may be associated with a subnet mask length of /30 since only the first 30 most-significant bits of an IP address are required to uniquely address this subnetwork. As used herein, an “address prefix” is defined as the result of applying a subnet mask to a network address. For example, consider the address prefix 128.0.10.1 /24. In this case, the network portion of the prefix contains the 24 most-significant bits of the IP address 128.0.10.1, i.e., the network is 128.0.10.0, and the last 8 bits are used to identify hosts on that network. An IP address and an address prefix are said to “match” when the prefix's network portion equals the IP address's most-significant bits.
Interior Gateway Protocols
A computer network may contain smaller groups of one or more subnetworks which may be managed as separate routing domains. As used herein, a routing domain is broadly construed as a collection of interconnected network nodes under a common administration. Often, a routing domain is managed by a single administrative entity, such as a company, an academic institution or a branch of government. Such a centrally-managed routing domain is sometimes referred to as an “autonomous system.” In general, a routing domain may operate as an enterprise network, a service provider or any other type of network or subnetwork. Further, the routing domain may contain one or more edge devices having “peer” connections to edge devices in adjacent routing domains.
Network nodes in a routing domain are typically configured to forward data using predetermined paths from “interior gateway” routing protocols, such as conventional link-state protocols and distance-vector protocols. These interior gateway protocols (IGP) define the manner with which routing information and network-topology information is exchanged and processed in the routing domain. For instance, IGP protocols typically provide a mechanism for distributing a set of reachable IP subnetworks among the intermediate nodes in the routing domain. As such, each intermediate node receives a consistent “view” of the domain's topology. Examples of link-state and distance-vectors protocols known in the art, such as the Open Shortest Path First (OSPF) protocol and Routing Information Protocol (RIP), are described in Sections 12.1-12.3 of the reference book entitled Interconnections, Second Edition, by Radia Perlman, published January 2000, which is hereby incorporated by reference as though fully set forth herein.
The Border Gateway Protocol (BGP) is usually employed as an “external gateway” routing protocol for routing data between autonomous systems. The BGP protocol is well known and generally described in Request for Comments (RFC) 1771, entitled A Border Gateway Protocol 4 (BGP-4), by Y. Rekhter et al., published March 1995, which is publicly available through the Internet Engineering Task Force (IETF) and is hereby incorporated by reference in its entirety. A variation of the BGP protocol, known as internal BGP (iBGP), is often used to distribute inter-network reachability information (address prefixes) among BGP-enabled edge devices in a routing domain. To implement iBGP, the edge devices must be “fully meshed,” i.e., such that every device is coupled to every other device by way of a TCP connection. In practice, conventional route reflectors are used to logically couple devices into a full mesh. The BGP protocol also may be extended for compatibility with other services other than standard Internet connectivity. For instance, Multi-Protocol BGP (MP-BGP) supports various address family identifier (AFI) fields that permit BGP messages to transport multi-protocol information, such as is the case with RFC 2547 services.
A network node in a routing domain may detect a change in the domain's topology. For example, the node may become unable to communicate with one of its neighboring nodes, e.g., due to a link failure between the nodes or the neighboring node failing, such as going “off line” for repairs. If the detected node or link failure occurred within the routing domain, the detecting node may advertise the intra-domain topology change to other nodes in the domain using an interior gateway protocol, such as OSPF. Similarly, if an edge device detects a node or link failure that prevents communications with a neighboring routing domain, the edge device may disseminate the inter-domain topology change to its other fully-meshed edge devices, e.g., using the iBGP protocol. In either case, there is an inherent latency of propagating the network-topology change within the routing domain and having nodes in the domain converge on a consistent view of the new network topology, i.e., without the failed node or link.
Multi-Protocol Label Switching/Virtual Private Network Architecture
A virtual private network (VPN) is a collection of network nodes that establish private communications over a shared backbone network. Previously, VPNs were implemented by embedding private leased lines in the shared network. The leased lines (i.e., communication links) were reserved only for network traffic among those network nodes participating in the VPN. Today, the above-described VPN implementation has been mostly replaced by private “virtual circuits” deployed in public networks. Specifically, each virtual circuit defines a logical end-to-end data path between a pair of network nodes participating in the VPN. When the pair of nodes is located in different routing domains, edge devices in a plurality of interconnected routing domains may have to co-operate to establish the nodes'virtual circuit.
A virtual circuit may be established using, for example, conventional Layer-2 Frame Relay (FR) or Asynchronous Transfer Mode (ATM) networks. Alternatively, the virtual circuit may “tunnel” data between its logical end points using known Layer-2 and/or layer-3 tunneling protocols, such as the Layer-2 Tunneling Protocol (L2TP) and the Generic Routing Encapsulation (GRE) protocol. In this case, one or more tunnel headers are prepended to a data packet to appropriately route the packet along the virtual circuit. The Multi-Protocol Label Switching (MPLS) protocol may be used as a tunneling mechanism for establishing Layer-2 virtual circuits or layer-3 network-based VPNs through an IP network.
MPLS enables network nodes to forward packets along predetermined “label switched paths” (LSP). Each LSP defines a logical data path, or virtual circuit, between a pair of source and destination nodes; the set of network nodes situated along the LSP may be determined using reachability information provided by conventional interior gateway protocols, such as OSPF. Unlike traditional IP routing, where node-to-node (“next hop”) forwarding decisions are performed based on destination IP addresses, MPLS-configured nodes instead forward data packets based on “label” values (or “tag” values) added to the IP packets. As such, a MPLS-configured node can perform a label-lookup operation to determine a packet's next-hop destination. MPLS traffic engineering provides additional advantages over IP-based routing, such as enabling MPLS-configured nodes to reserve network resources, such as bandwidth, to ensure a desired quality of service (QoS).
Each destination represented via a LSP is associated with a locally allocated label value at each hop of the LSP, such that the locally allocated label value is carried by data packets forwarded over its associated hop. The MPLS label values are typically distributed among the LSP's nodes using, e.g., the Label Distribution Protocol (LDP), Resource Reservation Protocol (RSVP) or MP-BGP protocol. Operationally, when a data packet is received at a MPLS-configured node, the node extracts the packet's transported label value, e.g., stored at a known location in the packet's encapsulating headers. The extracted label value is used to identify the next network node to forward the packet. Typically, an IGP label determines the packet's next hop within a routing domain, and a VPN label determines the packet's next hop across routing domains. More generally, the IGP label may be a MPLS label or any other encapsulation header used to identify the packet's next hop in the routing domain.
The packet may contain a “stack” of labels such that the stack's top-most label determines the packet's next-hop destination. After receiving the packet, the MPLS-configured node “pops” (removes) the packet's top-most label from the label stack and performs a label-lookup operation to determine the packet's next-hop destination. Then, the node “pushes” (inserts) a new label value associated with the packet's next hop onto the top of the stack and forwards the packet to its next destination. This process is repeated for every logical hop along the LSP until the packet reaches its destination node. The above-described MPLS operation is described in more detail in Chapter 7 of the reference book entitled IP Switching and Routing Essentials, by Stephen Thomas, published 2002, which is hereby incorporated by reference as though fully set forth herein.
Layer-3 network-based VPN services that utilize MPLS technology are often deployed by network service providers for one or more customer sites. These networks are typically said to provide “MPLS/VPN” services. As used herein, a customer site is broadly defined as a routing domain containing at least one customer edge (CE) device coupled to a provider edge (PE) device in the service provider's network (“provider network”). The customer site may be multi-homed to the provider network, i.e., wherein one or more of the customer's CE devices is coupled to a plurality of PE devices. The PE and CE devices are generally intermediate network nodes, such as routers or switches, located at the edge of their respective networks. The PE-CE data links may be established over various physical mediums, such as conventional wire links, optical links, wireless links, etc., and may communicate data formatted using various network communication protocols including ATM, Frame Relay, Ethernet, Fibre Distributed Data Interface (FDDI), etc. In addition, the PE and CE devices may be configured to exchange routing information over their respective PE-CE links in accordance with various interior and exterior gateway protocols, such as BGP, OSPF, RIP, etc.
In the traditional MPLS/VPN network architecture, each customer site may participate in one or more different VPNs. Most often, each customer site is associated with a single VPN, and hereinafter the illustrative embodiments will assume a one-to-one correspondence between customer sites and VPNs. For example, customer sites owned or managed by a common administrative entity, such as a corporate enterprise, may be statically assigned to the enterprise's VPN. As such, network nodes situated in the enterprise's various customer sites participate in the same VPN and are therefore permitted to securely communicate with one another via the provider network. In other words, the provider network establishes the necessary LSPs to interconnect the customer sites participating in the enterprise's VPN. Likewise, the provider network also may establish LSPs that interconnect customer sites participating in other VPNs. This widely-deployed MPLS/VPN architecture is generally described in more detail in Chapters 8-9 of the reference book entitled MPLS and VPNArchitecture, Volume 1, by I. Pepelnjak et al., published 2001 and in the IETF publication RFC 2547, entitled BGP/MPLS VPNs, by E. Rosen et al., published March 1999, each of which is hereby incorporated by reference as though fully set forth herein.
FIG. 1 illustrates an exemplary MPLS/VPN network 100 containing a provider network 110 coupled to neighboring customer sites 120, 130 and 140. The provider network includes a plurality of PE devices 700, including devices PE1 700a, PE2 700b and PE3 700c. The PE devices are fully meshed at the BGP level. That is, each PE device in the provider network can communicate with every other PE device (either directly or by means of BGP route reflectors). The network 110 also contains “core” provider (P) devices 195a-d, such as routers, which are respectively labeled P1, P2, P3 and P4. These P devices may be used to establish label switched paths between pairs of PE devices. For example, the provider devices P1 and P2 may be used to establish a first LSP1 between PE3 and PE1, and the devices P3 and P4 may be used to establish a second LSP2 between PE3 and PE2.
Each neighboring customer site 120-140 contains one or more CE devices attached to PE devices in the provider network 110. For instance, the customer site 120 contains CE devices 160 and 165 (labeled CE1 and CE2) which are respectively coupled to PE1 and PE2. Similarly, the customer site 130 includes a CE device 135 (labeled CE4) attached to PE2 and the customer site 140 includes a CE device 185 (labeled CE3) attached to PE3. The customer sites 120-140 are assigned to respective VPNs. For purposes of illustration, the customer sites 120 and 140 are assigned to the VPN1 and the customer site 130 is assigned to the VPN2. In this arrangement, network nodes in the customer sites 120 and 140 (VPN1) may not establish communications with nodes in the customer site 130 (VPN2) and vice versa since they participate in different VPNs. However, network nodes in the customer site 120 may communicate with nodes in the customer site 140, and vice versa, since the customer sites 120 and 140 both participate in VPN1. Notably, VPN1 and VPN2 may contain overlapping IP address spaces.
As noted, communications may be established through the MPLS/VPN network 100 between remote customer sites participating in the same VPN, e.g., VPN1. The provider network 110 may create a MPLS tunnel, such as LSP1 or LSP2, to provide a logical data path between the remote customer sites of VPN1. Suppose a source node (S) 150 in the customer site 140 addresses a data packet 105 to a destination node (D) 155 in the customer site 120. The source node forwards the packet to its local customer edge device CE3, which in turn transfers the packet across domain boundaries to the provider edge device PE3. PE3 then determines an appropriate LSP over which to forward the packet through the provider network 110 to the customer site 120 containing the packet's addressed destination node 155.
The provider edge device PE3 may associate the received packet 105 with a LSP based on the packet's contained destination IP address. For purposes of discussion, assume the packet 105 is routed from PE3 to PE1 via LSP 1, as shown in bold. The packet is received by the provider edge device PE1 at the tail-end of the LSP1 and the packet is then forwarded over the PE1-CE1 link to CE1 in the customer site 120. CE1 receives the packet and forwards it to the destination node 155.
Problems arise in the conventional MPLSIVPN architecture when a node or link failure prevents data communications over a PE-CE data link. For example, suppose that the PEI-CEL link fails as denoted by a dotted “X.” After identifying the failure, the provider edge device PE1 may advertise, within the provider network 110, that it has lost reachability to the IP addresses previously advertised by CE devices in the customer site 120. Accordingly, PEI may propagate the identified routing change by disseminating iBGP update messages to its fully-meshed PE devices. Eventually, the routing change is distributed throughout the provider network 110 and each PE device updates its local routing information to converge on the new network topology, i.e., without the failed PE1-CE1 link.
The conventional latency required for the PE devices to converge on the new network topology, i.e., without the PE1-CE1 link, is often overly time consuming, e.g., on the order of seconds, and causes a number of significant problems. For instance, data packets are often “dropped” (i.e., discarded) at the edge of the provider network while the network is in the process of converging. For example, in response to the PE1-CE1 link failing, data packets 105 addressed to the destination node 155 will be dropped by PE1 (at the tail-end of LSP1) until the network converges on an alternate data path LSP2 for those packets. For many data flows, such as voice-over-IP (VoIP) and video data flows, this temporary loss of data at PE1 may significantly degrade the utility of the overall data transfer or may cause the data flow to time-out and stop completely.
It is therefore generally desirable for MPLSJVPN networks to achieve faster convergence times, e.g., sub-second convergence times, in response to CE node or link failures over PE-CE links. The MPLS/VPN networks should quickly converge on the new network topology with minimal data loss at the edge of the network.