1. Field of the Invention
The present invention relates to computer networks and more particularly to Internet Protocol (IP) traffic routing between customer edge devices (CEs) across a provider network in a computer network.
2. Background Information
A computer network is a geographically distributed collection of interconnected subnetworks, such as local area networks (LANs), that transport data between network nodes. As used herein, a network node is any device adapted to send and/or receive data in the computer network. Thus, in this context, “node” and “device” may be used interchangeably. The network topology is defined by an arrangement of network nodes that communicate with one another, typically through one or more intermediate nodes, such as routers and switches. In addition to intra-network communications, data also may be exchanged between neighboring (i.e., adjacent) networks. To that end, “edge devices” located at the logical outer boundary of the computer network may be adapted to send and receive inter-network communications. Both inter-network and intra-network communications are typically effected by exchanging discrete packets of data according to predefined protocols. In this context, a protocol consists of a set of rules defining how network nodes interact with each other.
Each data packet typically comprises “payload” data prepended (“encapsulated”) by at least one network header formatted in accordance with a network communication protocol. The network headers include information that enables network nodes to efficiently route the packet through the computer network. Often, a packet's network headers include a data-link (layer 2) header, an internetwork (layer 3) header and a transport (layer 4) header as defined by the Transmission Control Protocol/Internet Protocol (TCP/IP) Reference Model. The TCP/IP Reference Model is generally described in more detail in Section 1.4.2 of the reference book entitled Computer Networks, Fourth Edition, by Andrew Tanenbaum, published 2003, which is hereby incorporated by reference as though fully set forth herein. A data packet may originate at a source node and subsequently “hop” from node to node along a logical data path until it reaches its addressed destination node. The network addresses defining the logical data path of a data flow are most often stored as Internet Protocol (IP) addresses in the packet's internetwork header.
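The layered encapsulation described above can be illustrated with a toy model. The sketch below is a minimal illustration under stated assumptions: headers are modeled as (layer, fields) pairs rather than real wire formats, and the Packet class, the hop helper and all addresses are hypothetical. It shows headers being prepended in front of the payload, and a forwarding hop that rewrites the data-link (layer 2) header while leaving the internetwork (layer 3) header, which carries the destination IP address, unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    """Toy packet: a payload plus a list of (layer, fields) headers, outermost first."""
    payload: bytes
    headers: list = field(default_factory=list)

    def prepend(self, layer: int, **fields) -> "Packet":
        """Encapsulate: place a new header in front of the existing headers."""
        self.headers.insert(0, (layer, fields))
        return self

    def header(self, layer: int) -> dict:
        """Return the fields of the first header found at the given layer."""
        for hdr_layer, hdr_fields in self.headers:
            if hdr_layer == layer:
                return hdr_fields
        raise KeyError(f"no layer-{layer} header")


def hop(pkt: Packet, next_mac: str) -> Packet:
    """Forward one hop: replace the layer-2 header, keep layers 3 and 4 intact."""
    pkt.headers = [(layer, f) for layer, f in pkt.headers if layer != 2]
    return pkt.prepend(2, dst_mac=next_mac)


# Build the packet from the inside out: transport, then internetwork, then data-link.
pkt = (Packet(b"hello")
       .prepend(4, src_port=49152, dst_port=80)
       .prepend(3, src_ip="192.0.2.1", dst_ip="198.51.100.7")
       .prepend(2, dst_mac="02:00:00:00:00:01"))

hop(pkt, "02:00:00:00:00:02")
print([layer for layer, _ in pkt.headers])  # [2, 3, 4]
print(pkt.header(3)["dst_ip"])              # 198.51.100.7
```

The point of the model is only the ordering: the outermost header is the one replaced at each hop, while the IP destination that defines the logical data path survives end to end.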
A computer network may contain smaller groups of one or more subnetworks which may be managed as separate routing domains. As used herein, a routing domain is broadly construed as a collection of interconnected network nodes under a common administration. Often, a routing domain is managed by a single administrative entity, such as a company, an academic institution or a branch of government. Such a centrally-managed routing domain is sometimes referred to as an “autonomous system” or AS. In general, a routing domain may operate as an enterprise network, a service provider or any other type of network or subnetwork. Further, the routing domain may contain one or more edge devices having “peer” connections to edge devices in adjacent routing domains.
Network nodes within a routing domain are typically configured to forward data using predetermined paths from “interior gateway” routing protocols, such as conventional link-state protocols and distance-vector protocols. These interior gateway protocols (IGPs) define the manner in which routing information and network-topology information are exchanged and processed in the routing domain. The routing information exchanged (e.g., by IGP messages) typically includes destination address prefixes, i.e., the portions of destination addresses used by the routing protocol to render routing (“next hop”) decisions. Examples of such destination addresses include IP version 4 (IPv4) and version 6 (IPv6) addresses. Through this exchange, each intermediate node receives a consistent “view” of the domain's topology. Link-state and distance-vector protocols known in the art, such as the Open Shortest Path First (OSPF) protocol and the Routing Information Protocol (RIP), respectively, are described in Sections 12.1-12.3 of the reference book entitled Interconnections, Second Edition, by Radia Perlman, published January 2000, which is hereby incorporated by reference as though fully set forth herein.
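How a destination address prefix drives a next-hop decision can be sketched with Python's standard ipaddress module. The sketch assumes the conventional longest-prefix-match rule for IP forwarding (the paragraph above does not name a specific rule); the routing table, interface names and addresses are hypothetical.

```python
import ipaddress
from typing import Optional

# Hypothetical routing table: destination prefix -> next-hop interface.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "eth0",
    ipaddress.ip_network("10.0.0.0/8"): "eth1",
    ipaddress.ip_network("10.1.0.0/16"): "eth2",
    ipaddress.ip_network("2001:db8::/32"): "eth3",
}


def next_hop(dest: str, routes: dict = ROUTES) -> Optional[str]:
    """Longest-prefix match: the most specific prefix containing dest wins."""
    addr = ipaddress.ip_address(dest)
    # Membership is always False across address families (IPv4 vs IPv6).
    matches = [net for net in routes if addr in net]
    if not matches:
        return None
    return routes[max(matches, key=lambda net: net.prefixlen)]


print(next_hop("10.1.2.3"))     # eth2 (matches /0, /8 and /16; /16 is longest)
print(next_hop("10.200.0.1"))   # eth1
print(next_hop("192.0.2.1"))    # eth0 (default route)
print(next_hop("2001:db8::1"))  # eth3
```

A linear scan keeps the example readable; production forwarding tables use tries or similar structures for the same lookup.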
In practice, each IGP node typically generates and disseminates an IGP message (“advertisement”) whose routing information includes a list of the intermediate node's neighboring network nodes and one or more “cost” values associated with each neighbor. As used herein, a cost value associated with a neighboring node is an arbitrary metric used to determine the relative ease/burden of communicating with that node. For instance, the cost value may be measured in terms of the number of hops required to reach the neighboring node, the average time for a packet to reach the neighboring node, the amount of network traffic or available bandwidth over a communication link coupled to the neighboring node, etc.
IGP messages are usually flooded until each IGP node has received an IGP message from each of the other interconnected intermediate nodes. Then, each of the IGP nodes (e.g., in a link-state protocol) can construct the same “view” of the network topology by aggregating the received lists of neighboring nodes and cost values. To that end, each IGP node may input this received routing information to a “shortest path first” (SPF) calculation that determines the lowest-cost network paths that couple the intermediate node with each of the other network nodes. For example, the Dijkstra algorithm is a conventional technique for performing such an SPF calculation, as described in more detail in Section 12.2.4 of the above-referenced book entitled Interconnections, Second Edition, by Radia Perlman. Each IGP node updates the routing information stored in its local routing table based on the results of its SPF calculation. More specifically, a routing information base (RIB) updates the routing table to correlate destination nodes with next-hop interfaces associated with the lowest-cost paths to reach those nodes, as determined by the SPF calculation.
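The aggregation and SPF steps described above can be sketched as follows. This is a minimal illustration, assuming each flooded advertisement is represented as a mapping from a node to its neighbors and cost values; the node names, costs and helper names (build_topology, spf) are hypothetical. The SPF is a standard Dijkstra calculation that records, for each destination, the lowest path cost and the first hop, which is the information a RIB would install as the next hop.

```python
import heapq


def build_topology(advertisements):
    """Aggregate flooded advertisements {node: {neighbor: cost}} into one graph."""
    graph = {}
    for node, neighbors in advertisements.items():
        graph.setdefault(node, {}).update(neighbors)
        for nbr in neighbors:
            graph.setdefault(nbr, {})
    return graph


def spf(graph, root):
    """Dijkstra SPF from root: return {destination: (path_cost, first_hop)}."""
    dist = {root: 0}
    first_hop = {root: None}
    heap = [(0, root)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry; a cheaper path was already settled
        done.add(node)
        for nbr, link_cost in graph[node].items():
            new_cost = cost + link_cost
            if new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                # A direct neighbor of the root is its own first hop; otherwise
                # inherit the first hop used to reach the current node.
                first_hop[nbr] = nbr if node == root else first_hop[node]
                heapq.heappush(heap, (new_cost, nbr))
    return {d: (dist[d], first_hop[d]) for d in dist if d != root}


# Hypothetical advertisements: each node lists its neighbors and link costs.
ADVERTS = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
routes = spf(build_topology(ADVERTS), "A")
print(routes)  # {'B': (1, 'B'), 'C': (3, 'B'), 'D': (4, 'B')}

# After the B-C link fails and the change is flooded, node A reruns SPF.
after_failure = {n: {m: c for m, c in nbrs.items() if {n, m} != {"B", "C"}}
                 for n, nbrs in ADVERTS.items()}
print(spf(build_topology(after_failure), "A"))
# {'B': (1, 'B'), 'C': (4, 'C'), 'D': (5, 'C')}
```

The final lines show the same calculation rerun on the reduced topology, which is how every node in a link-state domain converges on a consistent view after a failure.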
The Border Gateway Protocol (BGP) is usually employed as an “external gateway” routing protocol for routing data between autonomous systems. BGP is well known and generally described in Request for Comments (RFC) 1771, entitled A Border Gateway Protocol 4 (BGP-4), by Y. Rekhter et al., published March 1995, which is publicly available through the Internet Engineering Task Force (IETF) and is hereby incorporated by reference in its entirety. External (or exterior) BGP (eBGP) is often used to exchange routing information across routing domain boundaries. Internal BGP (iBGP) is a variation of the eBGP protocol and is often used to distribute inter-network reachability information (address prefixes) among BGP-enabled edge devices situated within the same routing domain. BGP generally operates over a reliable transport protocol, such as TCP, to establish a TCP connection/BGP session. BGP also may be extended for compatibility with services other than standard Internet connectivity. For instance, Multi-Protocol BGP (MP-BGP) supports various address family identifier (AFI) fields that permit BGP messages to transport multi-protocol information, such as is the case with RFC 2547 services, discussed below.
A network node within a routing domain may detect a change in the domain's topology. For example, the node may become unable to communicate with one of its neighboring nodes because the link between the nodes fails or because the neighboring node itself fails (e.g., goes “off line”). If the detected node or link failure occurred within the routing domain, the detecting node may advertise the intra-domain topology change to other nodes in the domain using IGP messages. Similarly, if an edge device detects a node or link failure that prevents communications with a neighboring routing domain, the edge device may disseminate the inter-domain topology change to other edge devices within its routing domain (e.g., using the iBGP protocol). In either case, the network-topology change is propagated within the routing domain, and nodes in the domain thus converge on a consistent view of the new network topology, i.e., without the failed node or link.
A virtual private network (VPN) is a collection of network nodes that establish private communications over a shared backbone network. Previously, VPNs were implemented by embedding private leased lines in the shared network. The leased lines (i.e., communication links) were reserved only for network traffic among those network nodes participating in the VPN. Today, the above-described VPN implementation has been mostly replaced by private “virtual circuits” deployed in public networks. Specifically, each virtual circuit defines a logical end-to-end data path between a pair of network nodes participating in the VPN. When the pair of nodes is located in different routing domains, edge devices in a plurality of interconnected routing domains may have to cooperate to establish the nodes' virtual circuit.
A virtual circuit may be established using, for example, conventional layer-2 Frame Relay (FR) or Asynchronous Transfer Mode (ATM) networks. Alternatively, the virtual circuit may “tunnel” data between its logical end points using known layer-2 and/or layer-3 tunneling protocols, such as the Layer-2 Tunneling Protocol (L2TP) and the Generic Routing Encapsulation (GRE) protocol. In this case, one or more tunnel headers are prepended to a data packet to appropriately route the packet along the virtual circuit. The Multi-Protocol Label Switching (MPLS) protocol may be used as a tunneling mechanism for establishing layer-2 virtual circuits or layer-3 network-based VPNs through an IP network.
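To make the tunneling idea concrete, the sketch below prepends a minimal GRE header in the basic form defined in RFC 2784: a 16-bit word holding the checksum-present bit, reserved bits and version, followed by a 16-bit protocol type that uses EtherType values (0x0800 for IPv4). It is a sketch under stated assumptions: the outer delivery IP header (IP protocol number 47) that would carry the packet to the tunnel end point is omitted, checksums are not supported, and the function names are hypothetical.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # GRE protocol type for an encapsulated IPv4 packet


def gre_encapsulate(inner: bytes, protocol_type: int = ETHERTYPE_IPV4) -> bytes:
    """Prepend a minimal 4-byte GRE header (no checksum, version 0)."""
    flags_and_version = 0  # C bit = 0, Reserved0 = 0, Ver = 0
    return struct.pack("!HH", flags_and_version, protocol_type) + inner


def gre_decapsulate(packet: bytes) -> tuple:
    """Strip the GRE header and return (protocol_type, inner_packet)."""
    flags_and_version, protocol_type = struct.unpack("!HH", packet[:4])
    if flags_and_version & 0x8000:
        raise ValueError("checksum-present GRE packets are not handled in this sketch")
    if flags_and_version & 0x0007:
        raise ValueError("unsupported GRE version")
    return protocol_type, packet[4:]


framed = gre_encapsulate(b"\x45\x00")  # first bytes of a hypothetical inner IPv4 packet
print(framed.hex())                    # 000008004500
print(gre_decapsulate(framed))         # (2048, b'E\x00')
```

At the far end of the virtual circuit, the tunnel end point removes the header and uses the protocol type to hand the inner packet to the right protocol handler.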
Generally, label switching techniques may be used to build end-to-end tunnels through an IP/MPLS network of label switched routers (LSRs). These tunnels are a type of label switched path (LSP) and thus are generally referred to as MPLS LSPs. Establishment of an LSP from a head-end LSR to a tail-end LSR involves computation of a path through a network of LSRs. Optimally, the computed path is the “shortest” path, as measured in some metric (e.g., cost). Notably, MPLS Traffic Engineering (TE) techniques may be used to ensure that the LSP (a “TE-LSP”) satisfies all relevant LSP Traffic Engineering constraints, such as required bandwidth, “affinities” (administrative constraints to avoid or include certain links), etc. Moreover, a Label Distribution Protocol (LDP) may be used to share the particular labels used among network nodes, as will be understood by those skilled in the art.
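Label switching along an LSP can be illustrated at a single transit LSR. The sketch below packs and unpacks the 32-bit MPLS label stack entry laid out in RFC 3032 (20-bit label, 3-bit traffic class, formerly called EXP, 1-bit bottom-of-stack flag, 8-bit TTL) and performs a label swap using a hypothetical label forwarding table of the kind a protocol such as LDP might populate. The table contents and helper names are illustrative only.

```python
import struct


def encode_label_entry(label: int, tc: int = 0, bottom: bool = True, ttl: int = 64) -> bytes:
    """Pack one 32-bit label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    if not 0 <= label < 2 ** 20:
        raise ValueError("label must fit in 20 bits")
    word = (label << 12) | ((tc & 0x7) << 9) | (int(bottom) << 8) | (ttl & 0xFF)
    return struct.pack("!I", word)


def decode_label_entry(entry: bytes) -> tuple:
    """Unpack the top label stack entry into (label, tc, bottom, ttl)."""
    (word,) = struct.unpack("!I", entry[:4])
    return word >> 12, (word >> 9) & 0x7, bool((word >> 8) & 0x1), word & 0xFF


# Hypothetical label forwarding table on one transit LSR:
# incoming label -> (outgoing label, next-hop LSR).
LFIB = {100: (200, "LSR-2"), 101: (300, "LSR-3")}


def swap(packet: bytes, lfib: dict = LFIB) -> tuple:
    """Swap the top label and decrement its TTL, as a transit LSR would."""
    label, tc, bottom, ttl = decode_label_entry(packet)
    if ttl <= 1:
        raise ValueError("TTL expired")
    out_label, next_hop = lfib[label]
    return encode_label_entry(out_label, tc, bottom, ttl - 1) + packet[4:], next_hop


pkt = encode_label_entry(100, ttl=64) + b"ip-packet"
out, nh = swap(pkt)
print(decode_label_entry(out), nh)  # (200, 0, True, 63) LSR-2
```

Each LSR only inspects and rewrites the top label, which is why the path through the network can be fixed in advance by whoever distributes the labels.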
Path computation can either be performed by the head-end LSR or by some other entity operating as a path computation element (PCE) not co-located on the head-end LSR. The head-end LSR (or a PCE) exploits its knowledge of network topology and resources available on each link to perform the path computation (e.g., according to the LSP Traffic Engineering constraints). Various path computation methodologies are available including SPF, or CSPF (constrained shortest path first) for TE-LSPs. MPLS LSPs can be configured within a single domain, e.g., area, level, or AS, or may also span multiple domains, e.g., areas, levels, or ASes.
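A CSPF calculation of the kind mentioned above can be sketched as "prune, then SPF": links that violate the TE constraints are removed first, and an ordinary Dijkstra search runs on the TE metric over what remains. The sketch assumes a hypothetical TE database keyed by directed link, with unreserved bandwidth and administrative colors as the only constraints; the link helper, attribute names and topology are illustrative, not a specific router's data model.

```python
import heapq


def link(te_metric, unreserved_bw, colors=()):
    """Hypothetical TE attributes advertised for one directed link."""
    return {"te_metric": te_metric, "unreserved_bw": unreserved_bw,
            "colors": frozenset(colors)}


def cspf(links, head, tail, min_bw, exclude_colors=frozenset()):
    """Constrained SPF: prune links that violate the constraints, then run
    Dijkstra on the TE metric. Returns (cost, path) or None if no path fits."""
    graph = {}
    for (u, v), attrs in links.items():
        if attrs["unreserved_bw"] >= min_bw and not (attrs["colors"] & exclude_colors):
            graph.setdefault(u, []).append((v, attrs["te_metric"]))
    heap = [(0, head, [head])]
    done = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == tail:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nbr, metric in graph.get(node, []):
            if nbr not in done:
                heapq.heappush(heap, (cost + metric, nbr, path + [nbr]))
    return None


LINKS = {
    ("A", "B"): link(1, 100),
    ("B", "D"): link(1, 10),
    ("A", "C"): link(2, 100),
    ("C", "D"): link(2, 100, {"red"}),
}

print(cspf(LINKS, "A", "D", min_bw=5))    # (2, ['A', 'B', 'D'])
print(cspf(LINKS, "A", "D", min_bw=50))   # (4, ['A', 'C', 'D'])  B-D lacks bandwidth
print(cspf(LINKS, "A", "D", min_bw=50, exclude_colors={"red"}))  # None
```

Pruning before the search keeps the shortest-path step unchanged, which is why CSPF is usually described as SPF over a constraint-filtered topology.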
The PCE is an entity having the capability to compute paths between any nodes of which the PCE is aware in an AS or area. PCEs are especially useful in that they are more cognizant of network traffic and path selection within their AS or area, and thus may be used for more optimal path computation. A head-end LSR may further operate as a path computation client (PCC) configured to send a path computation request to the PCE and receive a response with the computed path, which potentially takes into consideration other path computation requests from other PCCs. It is important to note that when one PCE sends a request to another PCE, it acts as a PCC. PCEs conventionally have limited or no visibility outside of their surrounding area(s), level(s), or AS. A PCC can be informed of a PCE either by pre-configuration by an administrator or by a PCE Discovery (PCED) message (“advertisement”), which is sent from the PCE within its area or level or across the entire AS to advertise its services. An example IGP-based PCED is described in LeRoux, et al., IGP Protocol Extensions for Path Computation Element (PCE) Discovery <draft-ietf-pce-disco-proto-igp-01.txt>, Internet Draft, March 2006, the contents of which are hereby incorporated by reference in their entirety.
Layer-3 network-based VPN services that utilize MPLS technology are often deployed by network service providers for one or more customer sites. These networks are typically said to provide “MPLS/VPN” services. As used herein, a customer site is broadly defined as a routing domain containing at least one customer edge device (CE) coupled to a provider edge device (PE) in the service provider's network (“provider network”). The customer site may be multi-homed to the provider network, i.e., wherein one or more of the customer's CEs is coupled to a plurality of PEs, thus providing a redundant connection. The PEs and CEs are generally intermediate network nodes, such as routers or switches, located at the edges of their respective networks. PE-CE links may be established over various physical media, such as conventional wire links, optical links, wireless links, etc., and may communicate data formatted using various network communication protocols including ATM, Frame Relay, Ethernet, Fibre Distributed Data Interface (FDDI), etc. In addition, the PEs and CEs may be configured to exchange routing information over their respective PE-CE links in accordance with various interior and exterior gateway protocols, such as BGP, OSPF, IS-IS, RIP, etc.
In the traditional MPLS/VPN network architecture, each customer site may participate in one or more different VPNs. Most often, each customer site is associated with a single VPN, and hereinafter the illustrative embodiments will assume a one-to-one correspondence between customer sites and VPNs. For example, customer sites owned or managed by a common administrative entity, such as a corporate enterprise, may be statically assigned to the enterprise's VPN. As such, network nodes situated in the enterprise's various customer sites participate in the same VPN and are therefore permitted to securely communicate with one another via the provider network. In other words, the provider network establishes the necessary LSPs to interconnect the customer sites participating in the enterprise's VPN. Likewise, the provider network also may establish LSPs that interconnect customer sites participating in other VPNs. This widely-deployed MPLS/VPN architecture is generally described in more detail in Chapters 8-9 of the reference book entitled MPLS and VPN Architecture, Volume 1, by I. Pepelnjak et al., published 2001 and in the IETF publication RFC 4364, entitled BGP/MPLS IP Virtual Private Networks (VPNs), by E. Rosen et al., published February 2006, each of which is hereby incorporated by reference as though fully set forth herein.
One problem associated with MPLS/VPN networks is their current inability to distribute TE information regarding PE-CE links across the provider network to other PEs. Traffic Engineering (TE), generally, refers to utilizing TE information to engineer (compute, determine, detect, etc.) traffic, such as for computing paths, creating TE-LSPs (e.g., MPLS TE-LSPs), load-balancing IP traffic, etc., as will be understood by those skilled in the art. Examples of TE information include, inter alia, the dynamically measured IP bandwidth, reservable MPLS bandwidth, unreserved bandwidth, administrative group (color), TE metric, or other conventional metrics that may be used for TE, e.g., cost. Notably, TE information may be used not only for MPLS but also for IP, as will be understood by those skilled in the art.
One solution for distributing static link bandwidth of PE-CE links (or, more generally, an AS exit link) has been described in the document entitled BGP Link Bandwidth, published by Cisco Systems, Inc., March 2005, which is hereby incorporated by reference as though fully set forth herein. Here, the static link bandwidth of the PE-CE link (i.e., the maximum link capacity) may be advertised to BGP neighbors (e.g., other PEs). However, this solution does not provide TE information of the PE-CE links, such as the dynamically measured IP bandwidth, reserved MPLS bandwidth, color, etc. of the PE-CE links.
Another solution to distribute TE information of PE-CE links is to leak the information into the provider network (the “core”), such as through IGP messages. This solution suffers numerous problems, however, such as VPN private addressing constraints, i.e., where CEs of different VPNs may share the same address, which may cause route confusion at receiving devices. Also, a lack of scalability may exist considering the possible number of PE-CE links (e.g., hundreds of thousands), which may surpass the limitations of internal route leaking (e.g., of IGP messages), thus possibly causing fragmented messages, error messages, etc., as will be understood by those skilled in the art. This lack of scalability may also apply to attempts to manually configure TE information, which would be overly cumbersome given the dynamic nature of TE information.
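The private-addressing problem can be seen in a few lines. The sketch below, with hypothetical VPN names, addresses and bandwidth values, contrasts a core table keyed only by CE address (roughly what naive leaking into the IGP would produce) with one keyed by (VPN, address). The per-VPN keying is shown only to illustrate why overlapping addresses collide, not as a proposed distribution mechanism.

```python
# Two VPNs reuse the same private CE address, each with different TE
# information for its PE-CE link (all values are illustrative).
LEAKED = [
    ("vpn-blue", "10.0.0.1", {"unreserved_bw_mbps": 40}),
    ("vpn-red", "10.0.0.1", {"unreserved_bw_mbps": 900}),
]


def flat_table(entries):
    """Core view keyed only by CE address."""
    table = {}
    for _vpn, addr, te in entries:
        table[addr] = te  # a later entry silently overwrites an earlier one
    return table


def per_vpn_table(entries):
    """Keying by (VPN, address) keeps overlapping private addresses distinct."""
    return {(vpn, addr): te for vpn, addr, te in entries}


flat = flat_table(LEAKED)
print(len(flat), flat["10.0.0.1"])  # 1 {'unreserved_bw_mbps': 900}  (blue's data is lost)
print(len(per_vpn_table(LEAKED)))   # 2
```

A receiving device that consults the flat view would apply the red VPN's bandwidth figure to the blue VPN's link, which is the route confusion described above.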
As a result of the inability to efficiently distribute dynamic TE information, various TE techniques may not be applied to the PE-CE links from other PEs not attached to those links. In particular, TE techniques may not be applied to paths from one CE to another CE across the provider network (“CE-CE paths”). For example, CEs are sometimes multi-homed to a provider network to provide multiple paths into the provider network for, e.g., redundancy or route flexibility (best path selection options). For IP routing, in particular, the CE may be unaware of the metrics used by the provider network beyond any locally attached PEs. Accordingly, the CE is unable to efficiently route traffic (IP traffic) over its multi-homed links. For instance, the known metrics of a first CE-PE link into the provider network (e.g., at a first PE) may not adequately represent the metrics beyond the first PE. In other words, a CE-PE link that appears to the CE to be the best selection based on CE-PE link metrics may, in fact, not be the best, such as where the path metrics within the provider network and beyond are greater from the first PE than from a second PE attached to the CE. Likewise, any attempt to load balance traffic over the multi-homed CE-PE links would be equally inefficient. Currently, IP traffic load balancing over multi-homed CE-PE links may be symmetric (i.e., half of the traffic over a first link, the other half over a second link, etc.) or asymmetric based on the CE-PE link (e.g., traffic may be distributed proportionally to the link metrics of the CE-PE links). Again, however, neither load balancing solution offers efficient selection based on metrics beyond the CE-PE links. Without path metrics of the provider network, and more particularly, without end-to-end (CE-CE) path metrics, therefore, the CE may route IP traffic inefficiently.
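The gap between a local CE-PE decision and an end-to-end CE-CE decision can be shown with two uplinks. The sketch assumes additive path metrics and a hypothetical beyond_pe_metric input (the metric from each PE to the remote CE), which is exactly the information the CE normally lacks; the Uplink class, helper names and numbers are illustrative. For the asymmetric cases it assumes traffic is weighted by the inverse of a metric, one plausible choice among several.

```python
from dataclasses import dataclass


@dataclass
class Uplink:
    """One multi-homed CE-PE link, with hypothetical metric inputs."""
    pe: str
    ce_pe_metric: int      # metric of the local CE-PE link (known to the CE)
    beyond_pe_metric: int  # metric from this PE to the remote CE (normally unknown)

    @property
    def ce_ce_metric(self) -> int:
        return self.ce_pe_metric + self.beyond_pe_metric


def pick_local(uplinks):
    """Choose on the CE-PE link metric alone."""
    return min(uplinks, key=lambda u: u.ce_pe_metric).pe


def pick_end_to_end(uplinks):
    """Choose on the full CE-CE path metric."""
    return min(uplinks, key=lambda u: u.ce_ce_metric).pe


def split(uplinks, weight):
    """Share traffic across uplinks in proportion to weight(uplink)."""
    weights = [weight(u) for u in uplinks]
    total = sum(weights)
    return {u.pe: w / total for u, w in zip(uplinks, weights)}


uplinks = [Uplink("PE1", ce_pe_metric=1, beyond_pe_metric=20),
           Uplink("PE2", ce_pe_metric=3, beyond_pe_metric=5)]

print(pick_local(uplinks))          # PE1 (CE-PE metric 1 versus 3)
print(pick_end_to_end(uplinks))     # PE2 (CE-CE metric 8 versus 21)
print(split(uplinks, lambda u: 1))  # symmetric: {'PE1': 0.5, 'PE2': 0.5}
local_share = split(uplinks, lambda u: 1 / u.ce_pe_metric)  # about 75% / 25%
e2e_share = split(uplinks, lambda u: 1 / u.ce_ce_metric)    # about 28% / 72%
```

With only local metrics, the CE both selects PE1 and sends most of its load there, even though the path through PE2 is far cheaper end to end.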
There remains a need, therefore, for a technique that expands the TE topology of a provider/customer network (e.g., an MPLS/VPN network) to include the TE information of PE-CE links, such that various TE techniques may be applied to the network. There also remains a need for a technique for efficiently routing IP traffic over multi-homed CE-PE links (e.g., and based on complete CE-CE paths), and for applying PCE techniques to IP traffic, generally.