1. Field of the Invention
The present invention relates to computer networks and more particularly to distinguishing between link and node failure using bidirectional forwarding detection (BFD) in a computer network.
2. Background Information
A computer network is a geographically distributed collection of interconnected subnetworks, such as local area networks (LANs), that transport data between network nodes. As used herein, a network node is any device adapted to send and/or receive data in the computer network. Thus, in this context, “node” and “device” may be used interchangeably. The network topology is defined by an arrangement of network nodes that communicate with one another, typically through one or more intermediate nodes, such as routers and switches. In addition to intra-network communications, data also may be exchanged between neighboring (i.e., adjacent) networks. To that end, “edge devices” located at the logical outer bound of the computer network may be adapted to send and receive inter-network communications. Both inter-network and intra-network communications are typically effected by exchanging discrete packets of data according to predefined protocols. In this context, a protocol consists of a set of rules defining how network nodes interact with each other.
Each data packet typically comprises “payload” data prepended (“encapsulated”) by at least one network header formatted in accordance with a network communication protocol. The network headers include information that enables network nodes to efficiently route the packet through the computer network. Often, a packet's network headers include a data-link (layer 2) header, an internetwork (layer 3) header and a transport (layer 4) header as defined by the Transmission Control Protocol/Internet Protocol (TCP/IP) Reference Model. The TCP/IP Reference Model is generally described in more detail in Section 1.4.2 of the reference book entitled Computer Networks, Fourth Edition, by Andrew Tanenbaum, published 2003, which is hereby incorporated by reference as though fully set forth herein. A data packet may originate at a source node and subsequently “hop” from node to node along a logical data path until it reaches its addressed destination node. The network addresses defining the logical data path of a data flow are most often stored as Internet Protocol (IP) addresses in the packet's internetwork header.
A computer network may contain smaller groups of one or more subnetworks which may be managed as separate routing domains. As used herein, a routing domain is broadly construed as a collection of interconnected network nodes under a common administration. Often, a routing domain is managed by a single administrative entity, such as a company, an academic institution or a branch of government. Such a centrally-managed routing domain is sometimes referred to as an “autonomous system.” In general, a routing domain may operate as an enterprise network, a service provider or any other type of network or subnetwork. Further, the routing domain may contain one or more edge devices having “peer” connections to edge devices in adjacent routing domains.
Network nodes within a routing domain are typically configured to forward data using predetermined paths from “interior gateway” routing protocols, such as conventional link-state protocols and distance-vector protocols. These interior gateway protocols (IGPs) define the manner in which routing information and network-topology information are exchanged and processed in the routing domain. The routing information exchanged (e.g., by IGP messages) typically includes destination address prefixes, i.e., the portions of destination addresses used by the routing protocol to render routing (“next hop”) decisions. Examples of such destination addresses include IP version 4 (IPv4) and version 6 (IPv6) addresses. By exchanging this information, each intermediate node obtains a consistent “view” of the domain's topology. Examples of link-state and distance-vector protocols known in the art, such as the Open Shortest Path First (OSPF) protocol and the Routing Information Protocol (RIP), are described in Sections 12.1-12.3 of the reference book entitled Interconnections, Second Edition, by Radia Perlman, published January 2000, which is hereby incorporated by reference as though fully set forth herein.
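The prefix-based “next hop” decision can be illustrated with a minimal longest-prefix-match sketch using Python's standard `ipaddress` module. The routing table contents and the `next_hop` helper are hypothetical; production routers use tries or hardware lookup tables rather than a linear scan.

```python
import ipaddress

# Hypothetical routing table: destination prefix -> next-hop address.
ROUTES = {
    "10.0.0.0/8": "192.0.2.1",
    "10.1.0.0/16": "192.0.2.2",
    "2001:db8::/32": "2001:db8:ffff::1",
}


def next_hop(destination, routes=ROUTES):
    """Return the next hop of the longest prefix containing `destination`, or None."""
    addr = ipaddress.ip_address(destination)
    best = None  # (network, next_hop) of the longest match seen so far
    for prefix, hop in routes.items():
        net = ipaddress.ip_network(prefix)
        # `in` simply returns False when the IPv4/IPv6 versions differ.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None
```

For example, `next_hop("10.1.2.3")` matches both 10.0.0.0/8 and 10.1.0.0/16 and selects the more specific /16 entry.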
The Border Gateway Protocol (BGP) is usually employed as an “external gateway” routing protocol for routing data between autonomous systems. BGP is well known and generally described in Request for Comments (RFC) 1771, entitled A Border Gateway Protocol 4 (BGP-4), by Y. Rekhter et al., published March 1995, which is publicly available through the Internet Engineering Task Force (IETF) and is hereby incorporated by reference in its entirety. External (or exterior) BGP (eBGP) is often used to exchange routing information across routing domain boundaries. Internal BGP (iBGP) is a variation of the eBGP protocol and is often used to distribute inter-network reachability information (address prefixes) among BGP-enabled edge devices situated within the same routing domain. BGP generally operates over a reliable transport protocol, such as TCP, to establish a TCP connection/BGP session. BGP also may be extended for compatibility with services other than standard Internet connectivity. For instance, Multi-Protocol BGP (MP-BGP) supports various address family identifier (AFI) fields that permit BGP messages to transport multi-protocol information, such as is the case with RFC 2547 services, discussed below.
A network node within a routing domain may detect a change in the domain's topology. For example, the node may become unable to communicate with one of its neighboring nodes, e.g., due to a link failure between the nodes or the neighboring node failing, such as going “off line,” etc. If the detected node or link failure occurred within the routing domain, the detecting node may advertise the intra-domain topology change to other nodes in the domain using IGP messages. Similarly, if an edge device detects a node or link failure that prevents communications with a neighboring routing domain, the edge device may disseminate the inter-domain topology change to other edge devices within its routing domain (e.g., using the iBGP protocol). In either case, propagation of the network-topology change occurs within the routing domain and nodes in the domain thus converge on a consistent view of the new network topology, i.e., without the failed node or link.
A virtual private network (VPN) is a collection of network nodes that establish private communications over a shared backbone network. Previously, VPNs were implemented by embedding private leased lines in the shared network. The leased lines (i.e., communication links) were reserved only for network traffic among those network nodes participating in the VPN. Today, the above-described VPN implementation has been mostly replaced by private “virtual circuits” deployed in public networks. Specifically, each virtual circuit defines a logical end-to-end data path between a pair of network nodes participating in the VPN. When the pair of nodes is located in different routing domains, edge devices in a plurality of interconnected routing domains may have to cooperate to establish the nodes' virtual circuit.
A virtual circuit may be established using, for example, conventional layer-2 Frame Relay (FR) or Asynchronous Transfer Mode (ATM) networks. Alternatively, the virtual circuit may “tunnel” data between its logical end points using known layer-2 and/or layer-3 tunneling protocols, such as the Layer-2 Tunneling Protocol (L2TP) and the Generic Routing Encapsulation (GRE) protocol. In this case, one or more tunnel headers are prepended to a data packet to appropriately route the packet along the virtual circuit. The Multi-Protocol Label Switching (MPLS) protocol may be used as a tunneling mechanism for establishing layer-2 virtual circuits or layer-3 network-based VPNs through an IP network.
Layer-3 network-based VPN services that utilize MPLS technology are often deployed by network service providers for one or more customer sites. These networks are typically said to provide “MPLS/VPN” services. As used herein, a customer site is broadly defined as a routing domain containing at least one customer edge (CE) device coupled to a provider edge (PE) device in the service provider's network (“provider network”). The customer site (e.g., a Voice over IP (VoIP) gateway) may be multi-homed to the provider network, i.e., one or more of the customer's CE devices may be coupled to a plurality of PE devices, thus providing a redundant connection. The PE and CE devices are generally intermediate network nodes, such as routers or switches, located at the edge of their respective networks. PE-CE data links may be established over various physical media, such as conventional wire links, optical links, wireless links, etc., and may communicate data formatted using various network communication protocols including ATM, Frame Relay, Ethernet, Fibre Distributed Data Interface (FDDI), etc. In addition, the PE and CE devices may be configured to exchange routing information over their respective PE-CE links in accordance with various interior and exterior gateway protocols, such as BGP, OSPF, IS-IS, RIP, etc. The MPLS/VPN architecture is generally described in more detail in Chapters 8-9 of the reference book entitled MPLS and VPN Architecture, Volume 1, by I. Pepelnjak et al., published 2001 and in the IETF publication RFC 2547, entitled BGP/MPLS VPNs, by E. Rosen et al., published March 1999, each of which is hereby incorporated by reference as though fully set forth herein.
As those skilled in the art will understand, it is desirable to quickly detect the failure of a PE-CE link (or other links) so that minimal traffic is lost. Conventionally, since a BGP session is often employed between the two inter-domain devices (e.g., a PE device and a CE device), BGP KEEPALIVE messages may be used to determine whether the peers are reachable (e.g., to detect link or node failure). For instance, BGP may specify a Hold Time interval (a nonzero Hold Time must be at least three seconds), the expiration of which indicates that an error has occurred within the BGP session. Each BGP message received at a device resets the Hold Timer, and a BGP KEEPALIVE message may be exchanged between the devices of the BGP session to reset the Hold Timer when no other messages are being sent. KEEPALIVE messages must therefore be sent often enough that the Hold Timer does not expire; conventionally, a reasonable maximum time between KEEPALIVE messages is one third of the Hold Time interval. However, according to the BGP standard set forth in RFC 1771, KEEPALIVE messages must not be sent more frequently than once per second, e.g., in order to minimize traffic between the BGP devices. Notably, in the event the Hold Time has expired, the devices may “break” (i.e., tear down or close) the BGP session.
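The Hold Time arithmetic can be made concrete with a short sketch. The function names are hypothetical. Beyond what is stated above, it relies on two further rules from RFC 1771: a Hold Time of zero is also permitted and disables KEEPALIVEs, and each speaker uses the smaller of its own and its peer's Hold Time.

```python
def negotiated_hold_time(local_s: int, peer_s: int) -> int:
    """Each BGP speaker uses the smaller of its own and its peer's Hold Time."""
    for value in (local_s, peer_s):
        if value != 0 and value < 3:
            raise ValueError("a Hold Time must be zero or at least 3 seconds")
    return min(local_s, peer_s)


def keepalive_interval(hold_time_s: int):
    """Seconds between KEEPALIVEs, or None when the Hold Time is zero."""
    if hold_time_s == 0:
        return None  # zero disables both the Hold Timer and KEEPALIVEs
    # One third of the Hold Time, but never more often than once per second.
    return max(1, hold_time_s // 3)


def hold_timer_expired(last_rx_s: float, now_s: float, hold_time_s: int) -> bool:
    """True once a full Hold Time passes without any BGP message being received."""
    return hold_time_s != 0 and now_s - last_rx_s >= hold_time_s
```

With the minimum nonzero Hold Time of three seconds, KEEPALIVEs go out once per second and a silent peer is detected after about three seconds. That interval is the on-the-order-of-seconds figure that BFD improves upon.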
Because of the increasing need for faster network response time and convergence, administrators often require the ability of individual network devices to quickly detect failures. Bidirectional Forwarding Detection (BFD) provides rapid failure detection times between devices, while maintaining low overhead. For instance, BFD failure detection may be as fast as 50 milliseconds (ms), while the BGP method described above is on the order of seconds (e.g., three seconds). BFD verifies connectivity between two devices based on the rapid transmission of BFD control packets between the two devices (e.g., little to no BFD holdtime, as will be understood by those skilled in the art). Notably, BFD also provides a single, standardized method of link/device/protocol failure detection at any protocol layer and over any media. A secondary benefit of BFD, in addition to fast failure detection, is that it provides network administrators with a consistent method of detecting failures. Thus, one availability methodology could be used, regardless of the protocol (e.g., IGP, BGP, etc.) or the topology of the network. BFD is further described in Katz, et al. Bidirectional Forwarding Detection <draft-ietf-bfd-base-04.txt>, Internet Draft, October, 2005, the contents of which are hereby incorporated by reference as though fully set forth herein.
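As a rough illustration of why BFD detects failures faster than a BGP Hold Time, consider the asynchronous-mode detection time in the BFD base specification (later published as RFC 5880). It is the remote system's Detect Mult multiplied by the agreed transmit interval, where the agreed interval is the slower of the remote system's desired transmit interval and the local system's required receive interval. The function name and the parameter values below are illustrative assumptions.

```python
def bfd_detection_time_ms(remote_detect_mult: int,
                          remote_desired_min_tx_ms: float,
                          local_required_min_rx_ms: float) -> float:
    """Asynchronous-mode BFD detection time, in milliseconds."""
    # The sender may not transmit faster than the receiver is willing to accept.
    agreed_interval_ms = max(remote_desired_min_tx_ms, local_required_min_rx_ms)
    # The session is declared down after Detect Mult intervals without a packet.
    return remote_detect_mult * agreed_interval_ms
```

For example, with a 50 ms interval and a Detect Mult of 3, a failure is declared after 150 ms, compared with roughly 3000 ms for a three-second BGP Hold Time.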
One problem with using BFD is that it is difficult to determine whether a monitored node has failed, or if simply the link over which the BFD messages traverse has failed. All a monitoring node can definitively determine is that it has stopped receiving returned BFD messages from the monitored node. While this is still very useful for many applications, it is often important not to declare a node as failed (“down”) when in fact the node is operational (“up”). For instance, two nodes (e.g., a CE device and a PE device) may be connected via multiple links for redundancy, as will be understood by those skilled in the art. If a BFD session is established over one of those links and that link fails, it is important to determine that only the BFD link is down if the other redundant links are still functioning. Declaring a node down when it is not produces undesirable results in the network, such as improper traffic forwarding, etc., as will be understood by those skilled in the art.
One solution to this problem involves the use of a “not-via” address. Not-via addresses are described in detail in Bryant, et al., IP Fast Reroute Using Not-via Addresses <draft-bryant-shand-IPFRR-notvia-addresses-01.txt>, Internet Draft, October 2005, and in commonly-owned U.S. patent application Ser. No. 11/064,275, entitled METHOD AND APPARATUS FOR CONSTRUCTING A REPAIR PATH AROUND A NON-AVAILABLE COMPONENT IN A DATA COMMUNICATIONS NETWORK, filed by Bryant et al. on Feb. 22, 2005, now published as U.S. Publication No. US2006/0187819 on Aug. 24, 2006, the contents of both of which are hereby incorporated by reference in their entirety. The semantics of a not-via address are that a packet addressed to a not-via address must be delivered to the router/node advertising that address, not via the particular component, e.g., a link, node, shared risk link group (SRLG), etc., with which that address is associated.
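One way to picture the not-via semantics is that the path toward a not-via address is computed over the topology with the associated component removed. The sketch below assumes a simple undirected, weighted link-state topology and runs Dijkstra's algorithm while skipping one excluded link. The `shortest_path` helper and the three-node topology are hypothetical, and excluding nodes or SRLGs is not modeled.

```python
import heapq

# Hypothetical topology: node -> {neighbor: link cost}.
TOPO = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 1, "C": 1},
    "C": {"A": 1, "B": 1},
}


def shortest_path(graph, src, dst, excluded_link=None):
    """Dijkstra from src to dst, never using the (u, v) link in excluded_link.

    Returns (cost, path) or None if dst is unreachable without that link.
    """
    banned = frozenset(excluded_link) if excluded_link else None
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            # Walk the predecessor chain back to src to rebuild the path.
            path = [u]
            while u in prev:
                u = prev[u]
                path.append(u)
            return d, path[::-1]
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in graph[u].items():
            if banned is not None and frozenset((u, v)) == banned:
                continue  # the component this not-via path must avoid
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None
```

Here `shortest_path(TOPO, "A", "B")` uses the direct A-B link. Passing `excluded_link=("A", "B")` yields the path A, C, B, which is how traffic to B's address "not via" the A-B link would travel.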
The solution available using a not-via address generally consists of a “dual BFD” arrangement. For instance, a first BFD session is established along a first link between two nodes, and a second BFD session is established along a second link, not-via (excluding) the first link, between the two nodes. If the first BFD session fails, but the second remains operational, then it is determined that the first link has failed. If both BFD sessions fail, it is determined that the opposing node has failed.
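The decision rule of the dual BFD arrangement can be written as a small classifier over the state of the two sessions. This is a minimal sketch: the `diagnose` function and the `Diagnosis` names are hypothetical. The text above does not address the case in which only the second (not-via) session fails, so mapping that case to a failure of the alternate path is an assumption.

```python
from enum import Enum


class Diagnosis(Enum):
    NO_FAILURE = "no failure"
    FIRST_LINK_FAILED = "first link failed"
    ALTERNATE_PATH_FAILED = "alternate (not-via) path failed"  # assumed case
    NODE_FAILED = "opposing node failed"


def diagnose(first_session_up: bool, second_session_up: bool) -> Diagnosis:
    """Classify a failure from the states of the two BFD sessions."""
    if first_session_up and second_session_up:
        return Diagnosis.NO_FAILURE
    if not first_session_up and second_session_up:
        # The node still answers over the not-via path, so only the link is down.
        return Diagnosis.FIRST_LINK_FAILED
    if first_session_up and not second_session_up:
        return Diagnosis.ALTERNATE_PATH_FAILED
    # Neither path reaches the node, so the node itself is presumed down.
    return Diagnosis.NODE_FAILED
```

The final branch is where the dual BFD arrangement infers node failure from the loss of both sessions.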
A similar solution has been proposed using MPLS tunnels, as described in commonly-owned copending U.S. patent application Ser. No. 10/171,395, entitled DISTINGUISHING BETWEEN LINK AND NODE FAILURE TO FACILITATE FAST REROUTE, filed by Charny et al. on Jun. 12, 2002, currently published as U.S. Patent Application Publication No. 2003/0233595 on Dec. 18, 2003, the contents of which are hereby incorporated by reference in their entirety. In particular, a first MPLS tunnel may be created between two nodes over a first path, and a second (alternate) MPLS tunnel may be created between the two nodes over a second path that excludes the first path. If the first tunnel fails, the second tunnel is used to determine whether the first path has failed, or whether the opposing node has failed.
A problem with both proposed solutions above is that, while they provide better determination of link versus node failure, both may lead to false indications when only a portion of a node fails. Specifically, many not-via addresses and alternate tunnels utilize the same resources of a node, such as a particular line card. For instance, for redundancy, many high-volume nodes share multiple links between each other. These multiple links, however, are often located within an SRLG, such as shared fibers, conduits, cables, etc., which often originate/terminate at a single line card of the node. The result is that many not-via addresses and/or alternate tunnels, while not utilizing the same link or interface, may utilize the same line card or other resources of the node. For this reason, if the shared resource fails at the node, the above solutions may produce a false indication of node failure. There remains a need, therefore, for a technique that rapidly and deterministically concludes that a node has in fact failed, as opposed to merely a link or shared resource (e.g., a line card) of the node.
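The false indication described above can be reproduced with a toy model in which each link terminates on a line card at the monitored node. This sketch only illustrates the problem; it is not the technique of the present invention. The link and line-card names, the `LINE_CARD_OF` mapping, and the helper functions are all hypothetical.

```python
# Hypothetical monitored node: which of its line cards each link terminates on.
LINE_CARD_OF = {"link1": "lc0", "link2": "lc0", "link3": "lc1"}


def session_up(link, failed_cards):
    """A BFD session over `link` survives only if its line card is working."""
    return LINE_CARD_OF[link] not in failed_cards


def dual_bfd_says_node_down(primary, alternate, failed_cards):
    """Naive dual-BFD rule: both sessions down means the node is declared down."""
    return (not session_up(primary, failed_cards)
            and not session_up(alternate, failed_cards))


def verdict_is_reliable(primary, alternate):
    """A "node down" verdict is trustworthy only if the two sessions do not
    share a line card (or other resource) at the monitored node."""
    return LINE_CARD_OF[primary] != LINE_CARD_OF[alternate]
```

If only line card `lc0` fails, the node is still operational through `lc1`, yet `dual_bfd_says_node_down("link1", "link2", {"lc0"})` returns True because both sessions share `lc0`. Choosing sessions on different line cards (`link1` and `link3`) avoids this particular false indication, but the shared resource is often not visible to the monitoring node.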