A computer network is a geographically distributed collection of interconnected subnetworks for transporting data between nodes, such as intermediate nodes and end nodes. A local area network (LAN) is an example of such a subnetwork; a plurality of LANs may be further interconnected by an intermediate network node, such as a router or switch, to extend the effective “size” of the computer network and increase the number of communicating nodes. Examples of the end nodes may include servers and personal computers. The nodes typically communicate by exchanging discrete frames or packets of data according to predefined protocols. In this context, a protocol consists of a set of rules defining how the nodes interact with each other.
In modern network technology, nodes such as routers and switches are commonly utilized to forward data packets toward their destinations. (In the context of the present application, the terms “node”, “router”, and “switch” are used synonymously and interchangeably.) Routers basically keep a local copy of the network topology in a link state database. For example, U.S. Patent Application No. 2004/0252707 teaches a system and method for maintaining network system information. U.S. Patent Application No. 2005/0041676 describes link state routing algorithms, such as the well-known Open Shortest Path First (OSPF) algorithm, which permit the construction of a network topology such that any given node may make packet-forwarding decisions.
Each router in a given network typically has several data paths, each servicing different forms of traffic. For instance, transit traffic is usually forwarded from the ingress line card (LC) to the fabric, and then to the egress LC for next hop delivery to a neighboring router. Other data paths for router traffic may include one or more paths for traffic requiring local processing. Each LC typically has its own central processing unit (CPU). Most often, each router includes a route processor for local processing of data received from the line cards, handling routing protocols, running applications, managing traffic, etc.
In routed networks, it is important to detect when a link or node failure occurs. In the past, routers periodically sent “hello” messages over all active interfaces to determine the state of the neighboring routers and to detect failures. According to this scheme, when a message from a neighbor is not received for a time exceeding some predetermined period, the adjacent nodes conclude that a failure occurred so that appropriate procedures may be initiated. By way of further background, U.S. Pat. No. 6,530,032 describes a network fault recovery mechanism.
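By way of illustration only, the hello-based scheme described above may be sketched as follows. The class name `HelloMonitor`, its methods, and the three-second timeout used in the usage lines are hypothetical and are not taken from any particular routing protocol implementation.

```python
class HelloMonitor:
    """Tracks when each neighbor was last heard from (hypothetical sketch)."""

    def __init__(self, dead_interval_s):
        # Predetermined period after which a silent neighbor is presumed failed.
        self.dead_interval_s = dead_interval_s
        self.last_heard = {}

    def record_hello(self, neighbor, now):
        """Note the arrival time of a hello message from a neighbor."""
        self.last_heard[neighbor] = now

    def failed_neighbors(self, now):
        """Return neighbors whose silence exceeds the predetermined period."""
        return sorted(
            name
            for name, heard_at in self.last_heard.items()
            if now - heard_at > self.dead_interval_s
        )


# Usage: neighbor "B" goes silent after time 0, while "A" keeps sending hellos.
monitor = HelloMonitor(dead_interval_s=3.0)
monitor.record_hello("A", now=0.0)
monitor.record_hello("B", now=0.0)
monitor.record_hello("A", now=2.0)
failed = monitor.failed_neighbors(now=3.5)
```

In this sketch a failure is only noticed once the full period has elapsed, so the detection delay is bounded below by the timeout chosen.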
The chief drawback with the use of hello messages as a network failure detection mechanism is that the detection periods tend to be long, resulting in significant lost data traffic. Fast convergence, by contrast, requires that a failure be detected as rapidly as possible. For example, in OSPF the minimum interval for hello packets is one second, and the link is considered down after three hello packets are lost. Furthermore, neighboring nodes running different protocols, each with its own version of hello messages, often lack the ability to negotiate compatible hello intervals.
Bidirectional forwarding detection (BFD) is a liveness testing protocol described in draft-ietf-bfd-base-00.txt that overcomes many of the problems with past hello messaging approaches. BFD operates independently of media, data protocols, and routing protocols to detect faults in the bidirectional path between two forwarding engines, including interfaces and data links. In routers, BFD is typically implemented in the forwarding plane to keep it independent from the control plane functions.
The BFD protocol works by establishing a session in which neighboring devices first negotiate a set of configuration parameters that includes a BFD packet interval rate. For example, FIG. 1 illustrates a conventional BFD control packet format, which includes various desired/required minimum transmit (Tx) and receive (Rx) intervals. Other BFD packet fields include a diagnostic (“Diag”) field for indicating detection time expired, echo failed, etc.; an “H” (I hear you) bit, set when receiving packets from a remote device; a demand mode (“D”) bit, set when operating in Demand mode; a “P” (poll) bit, set when requesting a parameter change; an “F” (final) bit, set when responding to a received packet with a P bit; and a “Detect Mult” bit field, used to calculate a detection time.
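As a minimal sketch of how the “Detect Mult” field and the negotiated intervals may combine into a detection time, the following assumes the asynchronous-mode rule published in later versions of the BFD specification (RFC 5880), which may differ in detail from the draft cited above. The function and parameter names are hypothetical.

```python
def detection_time_us(remote_detect_mult, local_required_min_rx_us,
                      remote_desired_min_tx_us):
    """Detection time in microseconds under the assumed asynchronous-mode rule.

    The remote system transmits no faster than the slower of what it wishes
    to send and what the local system is willing to receive, so the agreed
    interval is the larger of the two values.
    """
    agreed_interval_us = max(local_required_min_rx_us, remote_desired_min_tx_us)
    return remote_detect_mult * agreed_interval_us


# Usage: both sides configured for 50 ms intervals with a multiplier of 3.
example_us = detection_time_us(
    remote_detect_mult=3,
    local_required_min_rx_us=50_000,
    remote_desired_min_tx_us=50_000,
)
```

Under these assumed values the session would be declared down after 150 ms without a received packet, compared with the multi-second detection periods of hello messages.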
BFD can work in asynchronous mode, in which the rate at which different nodes send BFD packets may differ, or in demand mode, in which BFD packets are sent only when it is desired to test the data path. BFD also has an echo mode, in which a node sends a stream of BFD packets that gets looped back, basically testing the capability of the other node to switch packets back to the sender.
Although BFD has been useful in reducing the time it takes to detect a link or node failure in a network, there are still certain problems that can arise with the use of BFD, leading to undetected faults or false positive alarms. For example, in a Denial-of-Service (DoS) attack, an unscrupulous hacker typically floods a router with a high volume of data traffic that can overwhelm its processing capabilities. To protect the router's processor against such attacks, a packet rate-limiting device known as a hardware policer is usually implemented in the forwarding engine. The purpose of the hardware policer is to limit the rate of incoming data packets to protect the LC CPU against a DoS attack.
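The rate-limiting behavior of such a policer may be pictured with a token-bucket model, sketched below. This is an assumption made for illustration only: the text above does not state which algorithm a hardware policer uses, and the class name, parameters, and numeric values are hypothetical.

```python
class TokenBucketPolicer:
    """Software model of a packet rate limiter (illustrative assumption)."""

    def __init__(self, rate_pps, burst):
        self.rate_pps = rate_pps    # sustained packets per second admitted
        self.burst = burst          # maximum number of tokens held
        self.tokens = float(burst)  # bucket starts full
        self.last_time = 0.0

    def allow(self, now):
        """Return True if a packet arriving at time `now` (seconds) is admitted."""
        elapsed = now - self.last_time
        self.last_time = now
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_pps)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # packet is dropped before it reaches the LC CPU


# Usage: 20 packets arrive at the same instant at a 10 pps, burst-of-5 policer.
policer = TokenBucketPolicer(rate_pps=10, burst=5)
admitted = sum(policer.allow(now=0.0) for _ in range(20))
```

A model of this kind does not distinguish legitimate packets from attack traffic; any packet arriving after the bucket is empty is dropped.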
One problem is that incoming BFD packets may be sent to the LC CPU at a rate that exceeds the maximum rate, e.g., 7,000 packets-per-second (pps), of the hardware policer. There are cases, for instance, where BFD sessions are over-configured such that the data packet requirements are greater than 7,000 pps on one LC. By way of example, 400 Virtual Local Area Networks (VLANs) may be added in an OSPF area that has BFD enabled globally at 50 ms intervals (i.e., 20 pps per VLAN) and already has 300 existing VLANs. This would require an additional 8,000 pps on top of the 6,000 pps (for the existing 300 VLANs) for a total of 14,000 pps. As a result of the creation of the 400 additional VLANs, 50% of the BFD packets would be dropped. This could cause existing BFD sessions to “flap” (i.e., produce a false positive), resulting in unnecessary network traffic or “churn”. On the other hand, if the 400 new BFD sessions are not created, a network operator would be required to manually “activate” these sessions at a later time when resources are free, e.g., if the 300 VLANs are removed. Neither outcome is desirable.
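The arithmetic of this example can be checked with the short calculation below. It assumes one BFD session per VLAN and that packets in excess of the policer limit are dropped uniformly across sessions; the function and variable names are hypothetical.

```python
POLICER_LIMIT_PPS = 7000  # maximum rate of the hardware policer in the example


def bfd_pps(sessions, interval_ms):
    """Aggregate packet rate for `sessions` sessions sending every `interval_ms`."""
    return sessions * 1000 / interval_ms


existing_pps = bfd_pps(300, 50)   # existing 300 VLANs
added_pps = bfd_pps(400, 50)      # 400 newly added VLANs
total_pps = existing_pps + added_pps

# Fraction of BFD packets dropped once the policer limit is exceeded.
drop_fraction = max(0.0, 1 - POLICER_LIMIT_PPS / total_pps)

print(existing_pps, added_pps, total_pps, drop_fraction)
```

With the figures given, the existing sessions alone (6,000 pps) fit under the limit, and it is only the addition of the new sessions that pushes the total to twice the policer rate.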
Thus, what is needed is an improved mechanism for BFD rate-limiting and BFD session activation which overcomes the problems inherent in the prior art.