1. Field of the Invention
The present invention relates to computer networks and more particularly to maintaining bidirectional forwarding detection (BFD) on a bundle of links in a computer network.
2. Background Information
A computer network is a geographically distributed collection of interconnected subnetworks, such as local area networks (LANs), that transport data between network nodes. As used herein, a network node is any device adapted to send and/or receive data in the computer network. Thus, in this context, “node” and “device” may be used interchangeably. The network topology is defined by an arrangement of network nodes that communicate with one another, typically through one or more intermediate nodes, such as routers and switches. In addition to intra-network communications, data also may be exchanged between neighboring (i.e., adjacent) networks. To that end, “edge devices” located at the logical outer-bound of the computer network may be adapted to send and receive inter-network communications. Both inter-network and intra-network communications are typically effected by exchanging discrete packets of data according to predefined protocols. In this context, a protocol consists of a set of rules defining how network nodes interact with each other.
Each data packet typically comprises “payload” data prepended (“encapsulated”) by at least one network header formatted in accordance with a network communication protocol. The network headers include information that enables network nodes to efficiently route the packet through the computer network. Often, a packet's network headers include a data-link (layer 2) header, an internetwork (layer 3) header and a transport (layer 4) header as defined by the Transmission Control Protocol/Internet Protocol (TCP/IP) Reference Model. The TCP/IP Reference Model is generally described in more detail in Section 1.4.2 of the reference book entitled Computer Networks, Fourth Edition, by Andrew Tanenbaum, published 2003, which is hereby incorporated by reference as though fully set forth herein. A data packet may originate at a source node and subsequently “hop” from node to node along a logical data path until it reaches its addressed destination node. The network addresses defining the logical data path of a data flow are most often stored as Internet Protocol (IP) addresses in the packet's internetwork header.
A computer network may contain smaller groups of one or more subnetworks which may be managed as separate routing domains. As used herein, a routing domain is broadly construed as a collection of interconnected network nodes under a common administration. Often, a routing domain is managed by a single administrative entity, such as a company, an academic institution or a branch of government. Such a centrally-managed routing domain is sometimes referred to as an “autonomous system.” In general, a routing domain may operate as an enterprise network, a service provider or any other type of network or subnetwork. Further, the routing domain may contain one or more edge devices having “peer” connections to edge devices in adjacent routing domains.
Network nodes within a routing domain are typically configured to forward data using predetermined paths from “interior gateway” routing protocols, such as conventional link-state protocols and distance-vector protocols. These interior gateway protocols (IGPs) define the manner in which routing information and network-topology information are exchanged and processed in the routing domain. The routing information exchanged (e.g., by IGP messages) typically includes destination address prefixes, i.e., the portions of destination addresses used by the routing protocol to render routing (“next hop”) decisions. Examples of such destination addresses include IP version 4 (IPv4) and version 6 (IPv6) addresses. As such, each intermediate node receives a consistent “view” of the domain's topology. Examples of link-state and distance-vector protocols known in the art, such as the Open Shortest Path First (OSPF) protocol and Routing Information Protocol (RIP), are described in Sections 12.1-12.3 of the reference book entitled Interconnections, Second Edition, by Radia Perlman, published January 2000, which is hereby incorporated by reference as though fully set forth herein.
The Border Gateway Protocol (BGP) is usually employed as an “external gateway” routing protocol for routing data between autonomous systems. BGP is well known and generally described in Request for Comments (RFC) 1771, entitled A Border Gateway Protocol 4 (BGP-4), by Y. Rekhter et al., published March 1995, which is publicly available through the Internet Engineering Task Force (IETF) and is hereby incorporated by reference in its entirety. BGP generally operates over a reliable transport protocol, such as TCP, to establish a TCP connection/BGP session. BGP also may be extended for compatibility with services other than standard Internet connectivity. For instance, Multi-Protocol BGP (MP-BGP) supports various address family identifier (AFI) fields that permit BGP messages to transport multi-protocol information.
A network node within a routing domain may detect a change in the domain's topology. For example, the node may become unable to communicate with one of its neighboring nodes, e.g., due to a link failure between the nodes or the neighboring node failing, such as going “off line,” etc. If the detected node or link failure occurred within the routing domain, the detecting node may advertise the intra-domain topology change to other nodes in the domain using IGP messages. Similarly, if an edge device detects a node or link failure that prevents communications with a neighboring routing domain, the edge device may disseminate the inter-domain topology change to other edge devices within its routing domain (e.g., using BGP messages). In either case, propagation of the network-topology change occurs within the routing domain and nodes in the domain thus converge on a consistent view of the new network topology, i.e., without the failed node or link.
As those skilled in the art will understand, it is desirable to quickly detect the failure of a node or link so that minimal traffic is lost. Conventionally, since a BGP session is often employed between the two inter-domain devices, BGP KEEPALIVE messages may be used to determine whether the peers are reachable (e.g., to detect a link or node failure). For instance, BGP may specify a Hold Time interval (e.g., at least three seconds), the expiration of which indicates that an error has occurred within the BGP session. Each BGP message received at a device resets the Hold Timer. A BGP KEEPALIVE message may be exchanged between the devices of the BGP session to reset the Hold Timer. As such, KEEPALIVE messages must be exchanged often enough that the Hold Timer does not expire. Conventionally, a reasonable maximum time between KEEPALIVE messages would be one third of the Hold Time interval. However, according to the BGP standard set forth in RFC 1771, the KEEPALIVE messages must not be sent more frequently than one per second, e.g., in order to minimize traffic between the BGP devices. Notably, in the event the Hold Timer has expired, the devices may “break” (i.e., tear down or close) the BGP session. Similarly, as those skilled in the art will understand, IGP nodes within a network may exchange IGP HELLO messages to determine whether internal peers (intradomain nodes) are reachable.
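By way of illustration only, the timer arithmetic described above may be sketched as follows. The function names and constants are hypothetical and chosen for this example. The sketch assumes a non-zero Hold Time of at least three seconds and simply applies the one-third guideline together with the one-per-second limit. It is not a description of any particular BGP implementation.

```python
MIN_HOLD_TIME_S = 3.0          # smallest Hold Time considered in this sketch
MIN_KEEPALIVE_SPACING_S = 1.0  # KEEPALIVEs no more often than one per second


def keepalive_interval(hold_time_s):
    """Suggested maximum spacing between KEEPALIVE messages, in seconds."""
    if hold_time_s < MIN_HOLD_TIME_S:
        raise ValueError("this sketch assumes a Hold Time of at least 3 seconds")
    # One third of the Hold Time, but never faster than one per second.
    return max(hold_time_s / 3.0, MIN_KEEPALIVE_SPACING_S)


def hold_timer_expired(last_message_s, now_s, hold_time_s):
    """True if no BGP message has reset the Hold Timer within the Hold Time."""
    return now_s - last_message_s >= hold_time_s


print(keepalive_interval(90.0))            # 30.0
print(hold_timer_expired(0.0, 2.9, 3.0))   # False
print(hold_timer_expired(0.0, 3.0, 3.0))   # True
```

Even at the shortest Hold Time considered here, a failure is only declared after three seconds without a message, which is the detection latency that motivates a faster mechanism.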
Because of the increasing need for faster network response time and convergence, administrators often require the ability of individual network devices to quickly detect failures. Bidirectional Forwarding Detection (BFD) provides rapid failure detection times between devices, while maintaining low overhead. For instance, BFD failure detection may be as fast as 50 milliseconds (ms), while the BGP (and IGP) method described above is on the order of seconds (e.g., three seconds). BFD verifies connectivity between two devices based on the rapid transmission of BFD control packets between the two devices (e.g., little to no BFD holdtime, as will be understood by those skilled in the art). Notably, BFD also provides a single, standardized method of link/device/protocol failure detection at any protocol layer and over any media. A secondary benefit of BFD, in addition to fast failure detection, is that it provides network administrators with a consistent method of detecting failures. Thus, one availability methodology could be used, regardless of the protocol (e.g., IGP, BGP, etc.) or the topology of the network. BFD is further described in Katz, et al., Bidirectional Forwarding Detection <draft-ietf-bfd-base-04.txt>, Internet Draft, October 2005, the contents of which are hereby incorporated by reference as though fully set forth herein. Generally, BFD sessions may be established between network nodes (e.g., routers) in order to monitor connectivity of the nodes over a particular link between the nodes.
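As a rough sketch of how such a detection time may arise, the example below assumes that a BFD session is declared failed when no control packet has been received for a period equal to the transmit interval multiplied by a detection multiplier. The function names and the particular values (a 10 ms interval and a multiplier of 5) are assumptions made for illustration and are not taken from the text above.

```python
def detection_time_ms(tx_interval_ms, detect_mult):
    """Time without a received control packet after which failure is declared."""
    return tx_interval_ms * detect_mult


def session_failed(last_rx_ms, now_ms, tx_interval_ms, detect_mult):
    """True once the silence on the session exceeds the detection time."""
    return now_ms - last_rx_ms > detection_time_ms(tx_interval_ms, detect_mult)


bfd_ms = detection_time_ms(10, 5)   # assumed BFD parameters
bgp_ms = 3000                       # three-second Hold Time, for comparison
print(bfd_ms)                       # 50
print(bgp_ms // bfd_ms)             # 60
```

Under these assumed parameters, the BFD session would detect a failure about sixty times sooner than a three-second Hold Timer.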
Often, users organize links as a group or “bundle” for a variety of reasons known to those skilled in the art. For example, link groups may be used for redundancy, load balancing, and/or increasing the available bandwidth between two points in the network, e.g., by combining multiple smaller/slower links into a single group of links that produces a greater bandwidth value than the smaller/slower links individually. Various forms of link bundles include link bonding/aggregation, EtherChannel, and multilink point-to-point protocol (PPP), which is described in RFC 1990, entitled The PPP Multilink Protocol (MP), published August 1996, etc. Other examples of link bundles will be understood by those skilled in the art, and those mentioned herein are merely examples. For instance, links of VLANs (Virtual LANs, or a group of network devices/elements on different physical LAN segments operating as though they were on the same physical LAN segment) may also be bundled.
Just as BFD may be used to very rapidly determine connectivity between two nodes, it may also be desirable to rapidly determine the maintained operation (i.e., connectivity) of a link bundle between two nodes (or more, as will be understood by those skilled in the art). A BFD session on a link bundle should remain active while the bundle is active, i.e., still able to transmit traffic. In other words, so long as one or more (configurable) links of the bundle (“bundle links”) are operational, the bundle, and hence the BFD session, should remain active. Accordingly, none of the following events should cause the BFD session on the bundle to fail if other links of the bundle are still operational: failure of a bundle link; online insertion and removal (OIR) of a line card (LC) that hosts one or more bundle links; addition of a link to the bundle; removal of a bundle link; shutdown of a bundle link; failure of a centralized route processor (RP) or control card of a node (RP failover); etc.
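The desired behavior may be illustrated with a minimal sketch in which the bundle, and hence the BFD session, is treated as active so long as a configurable minimum number of bundle links remain operational. The class name, method names and `min_links` parameter are hypothetical and serve only to make the rule concrete.

```python
class LinkBundle:
    """Tracks bundle links and reports whether the bundle is still active."""

    def __init__(self, min_links=1):
        self.min_links = min_links   # configurable threshold (assumed name)
        self.links = {}              # link name -> True if operational

    def add_link(self, name, up=True):
        self.links[name] = up

    def remove_link(self, name):
        self.links.pop(name, None)

    def set_link_state(self, name, up):
        # Models failure, shutdown or recovery of a bundle link.
        if name in self.links:
            self.links[name] = up

    def remove_line_card(self, hosted_links):
        # OIR of a line card removes every bundle link it hosts.
        for name in hosted_links:
            self.remove_link(name)

    def is_active(self):
        operational = sum(1 for up in self.links.values() if up)
        return operational >= self.min_links


bundle = LinkBundle(min_links=1)
for name in ("link1", "link2", "link3"):
    bundle.add_link(name)

bundle.set_link_state("link1", False)   # failure of a bundle link
bundle.remove_line_card(["link2"])      # OIR of the line card hosting link2
print(bundle.is_active())               # True: link3 is still operational

bundle.set_link_state("link3", False)
print(bundle.is_active())               # False: no operational links remain
```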
One solution that offers BFD for link bundles is a centralized BFD session, such as an RP-based BFD session. In a centralized BFD session, the RP (or control card) monitors the status of the BFD session and each of the bundle links. BFD messages may be sent on each bundle link so that while at least one bundle link is operational, so is the bundle, and consequently so is the centralized BFD session. In many situations, however, the large number of BFD messages sent between the nodes may violate various BFD parameters, such as the maximum number of packet transmissions or the time between packets, as agreed upon during BFD session initialization (negotiation). Also, two nodes at either end of a link bundle, e.g., routers (layer 3), may be separated by one or more intermediate switches (layer 2). The one or more switches may receive a plurality of BFD messages from one router, but based on routing/forwarding decisions at the switch (e.g., layer 2 hashing algorithms), the switch may only continue the transmission of BFD messages over a single bundle link, leaving the other bundle links unmonitored. Furthermore, the two routers may be connected to the switch by a different number of links (e.g., a link bundle from each router terminates at the switch), so a one-to-one mapping of BFD messages on individual links is not possible.
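The effect of hashing at an intermediate switch may be illustrated as follows. The sketch assumes a hypothetical switch that selects an outgoing bundle link by hashing the source and destination addresses of each frame; actual switches use vendor-specific algorithms, and the addresses and the `pick_link` function are invented for this example.

```python
import zlib


def pick_link(src_addr, dst_addr, num_links):
    """Hypothetical layer 2 hash: map an address pair to one bundle link."""
    key = f"{src_addr}->{dst_addr}".encode()
    return zlib.crc32(key) % num_links


num_links = 4
# Every BFD message of the session carries the same address pair, so the
# hash always selects the same outgoing link.
chosen = {
    pick_link("00:11:22:33:44:55", "66:77:88:99:aa:bb", num_links)
    for _ in range(100)
}
unmonitored = set(range(num_links)) - chosen
print(len(chosen), len(unmonitored))   # 1 3
```

Because the hash input does not vary between BFD messages, sending more messages does not extend coverage to the remaining three links.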
Alternatively, to address some of the limitations of sending BFD messages over all bundle links, the centralized BFD session may instead send and receive BFD messages over a single link. While the BFD messages are received by that single link, the link bundle is still operational. When BFD messages are no longer received on that link, the centralized BFD session must select a different link of the bundle to send and receive BFD messages.
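A minimal sketch of this single-link approach is given below. It assumes the centralized session keeps a per-link indication of whether BFD messages are currently being received, and it picks the replacement link in name order. The function name, the dictionary layout and the selection order are all assumptions made for illustration.

```python
def select_bfd_link(current, receiving):
    """Return the link that should carry BFD messages, or None if none can.

    `receiving` maps each bundle link name to True while BFD messages are
    (or could be) received on that link.
    """
    # Keep the current link for as long as BFD messages still arrive on it.
    if current is not None and receiving.get(current, False):
        return current
    # Otherwise select a different link of the bundle, in name order.
    for name in sorted(receiving):
        if receiving[name]:
            return name
    return None   # no usable link remains: the bundle is down


links = {"link1": True, "link2": True, "link3": True}
active = select_bfd_link(None, links)
print(active)                            # link1

links["link1"] = False                   # BFD messages stop arriving on link1
active = select_bfd_link(active, links)
print(active)                            # link2
```

Each reselection takes time at the central processor, during which no link of the bundle is being verified.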
A major disadvantage of operating a centralized BFD session for link bundles is that if the RP (control card) fails, the centralized BFD session also fails, resulting in a false indication of link bundle failure even though at least one LC hosting the link bundle is still operational. Clearly, for many applications, the centralized BFD session approach is unreliable. Further disadvantages include poor scalability for multiple link bundles, and decreased performance. For instance, the RP is often tasked with many different operations of the node, such as updating routing tables, generating advertisement messages, etc., as will be understood by those skilled in the art. As such, the time dedicated to the BFD session may be less than necessary to maintain the high-demand reaction time of the BFD protocol (i.e., sub-second failure detection). To compensate for the centralized BFD session's potentially slower reaction time (e.g., for selecting new links during single-link BFD sessions), it may be necessary to use longer timeout values (hold timers) to allow the RP to adjust to changes in the network. There remains a need, therefore, for a technique that efficiently maintains BFD on a bundle of links to address the problems mentioned above.