§ 1.1 Field of the Invention
The invention concerns detecting failures in communications systems. In particular, the invention concerns detecting failures, such as forwarding engine failures, interface failures, and/or link failures, of a data forwarding path between and including two data forwarding devices, such as routers for example.
§ 1.2 Related Art
The description of art in this section is not an admission that such art is prior art to the invention.
An increasingly important feature of networking equipment is the rapid detection of communication failures between adjacent systems, in order to more quickly establish, or switch over to, alternative paths once an error occurs. Currently, failures can be detected fairly quickly in certain circumstances if data link hardware (such as SONET alarms for example) supports such detection.
However, some media (such as Ethernet) do not provide this kind of signaling, and some media may not detect certain kinds of failures in the path, for example, failing interfaces or forwarding engine components. Moreover, failure detection is often much slower in many communications network devices, especially if there is no hardware signaling to facilitate such detection. Routing protocols sometimes include some form of liveness detection. For example, the intermediate system-intermediate system protocol (IS-IS) and the open shortest path first protocol (OSPF) include a “hello” mechanism that lets a router running IS-IS or OSPF know whether nodes sharing a communications link with it (e.g., its neighbors or peers) are still up. Some protocols, such as the border gateway protocol (BGP) for example, use the underlying transport to determine the liveness of their neighbors; in the case of BGP, TCP keepalives are used. Other protocols, such as the routing information protocol (RIP) for example, have intrinsic liveness mechanisms. In most cases, once an adjacency (e.g., with a neighbor node running the same protocol) is established with an initial hello message, subsequent hello messages need not carry much information.
In most, if not all, of these existing protocol-based liveness detection mechanisms, the time needed to conclude that a neighbor is down ranges from several seconds to tens or even hundreds of seconds. For example, with IS-IS, hellos are normally sent every nine (9) seconds, and a node determines a neighbor to be down only after three (3) consecutive hellos have gone unanswered. Accordingly, a node running IS-IS normally needs at least 27 seconds before it can determine that a neighbor node is down. Similarly, with the point-to-point protocol (PPP), hellos are normally sent every ten (10) seconds, and a node determines a neighbor to be down only after three (3) consecutive hellos have gone unanswered. Accordingly, a node running PPP normally needs at least 30 seconds before it can determine that a neighbor node is down.
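The detection times quoted above follow from multiplying the hello interval by the number of consecutive unanswered hellos required before a neighbor is declared down. The Python sketch below is offered only as an illustration of that arithmetic; the function name `detection_time` and the `PROTOCOLS` table are hypothetical and simply restate the nominal values given in the text, not the configuration of any particular protocol implementation.

```python
# Illustrative sketch only: minimum time before a hello-based liveness
# mechanism can declare a neighbor down, assuming the neighbor is declared
# down after `missed` consecutive hellos go unanswered.
# `detection_time` and `PROTOCOLS` are hypothetical names.


def detection_time(hello_interval_s: float, missed: int) -> float:
    """Return the minimum detection time in seconds (interval x missed hellos)."""
    if hello_interval_s <= 0 or missed < 1:
        raise ValueError("interval must be positive and missed must be >= 1")
    return hello_interval_s * missed


# Nominal (hello interval in seconds, missed-hello count) pairs from the text.
PROTOCOLS = {
    "IS-IS": (9.0, 3),
    "PPP": (10.0, 3),
}

if __name__ == "__main__":
    for name, (interval, missed) in PROTOCOLS.items():
        print(f"{name}: {detection_time(interval, missed):.0f} s")
```

Running it prints `IS-IS: 27 s` and `PPP: 30 s`. Under the same assumed relation, detecting a failure within a few tenths of a second while keeping a multiplier of three would require hellos roughly every 100 ms or less.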
Historically, since routers and other nodes on the Internet have been used predominantly to communicate data on a best-effort basis for applications (such as e-mail for example) that tolerate some delay or packets received out of sequence, the aforementioned delays in detecting liveness were acceptable. However, as alluded to above, as it becomes desirable for more demanding applications (such as voice over IP for example) to use the Internet or some other packet-switched network, there are many instances where a neighbor that is down must be detected within a few tenths, or even hundredths, of a second. Such fast liveness detection is needed where failover must occur quickly so that an end user does not perceive, or at least is not unduly annoyed by, the failure of an adjacency (e.g., due to any one of a node failure, a link failure, or a protocol failure). As another example of the need for fast liveness detection and failover, a one-second detection time may represent a great deal of lost data at gigabit rates.
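To give a rough sense of scale for the gigabit-rate claim, the following hypothetical Python helper estimates how much data a link carries during a detection window. It assumes, purely for illustration, that the link is fully loaded and that all traffic sent during the window is lost; both are simplifications, not figures from the source, so the result is an upper bound on the data at risk.

```python
# Illustrative sketch only: bytes carried by a link during a failure-detection
# window. `bytes_during_window` is a hypothetical helper; it assumes a fully
# loaded link, so the result is an upper bound on the data at risk.


def bytes_during_window(link_rate_bps: float, window_s: float) -> float:
    """Return bytes carried at `link_rate_bps` (bits/s) over `window_s` seconds."""
    if link_rate_bps < 0 or window_s < 0:
        raise ValueError("rate and window must be non-negative")
    return link_rate_bps * window_s / 8  # 8 bits per byte


if __name__ == "__main__":
    one_gbps = 1e9
    for window in (1.0, 0.1, 0.01):
        megabytes = bytes_during_window(one_gbps, window) / 1e6
        print(f"{window:>5} s at 1 Gbit/s: {megabytes:g} MB")
```

At 1 Gbit/s, a one-second window corresponds to 125,000,000 bytes (125 MB); the figure scales linearly with both the link rate and the detection time.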
Furthermore, routing protocol hellos are of no help when those routing protocols are not in use. Moreover, the semantics of failure detection using routing protocols differ subtly from those of failure detection using data link hardware: routing protocol failure detection techniques detect a failure in the path between the two routing protocol engines.
In view of the foregoing, there is a need to quickly detect failures in a data forwarding path, such as interface failures, link failures and/or forwarding engine failures, between and including two forwarding engines.