Internet Protocol (IP) networks are characterized by packet delays, jitter, losses, and occasional hardware and software failures, including those caused by viruses and attacks. Yet applications such as Voice over IP (VoIP) require the underlying infrastructure to be reliable and to provide acceptable quality of service (QoS). Consequently, much ongoing work is geared toward improving the reliability, performance, and QoS characteristics of IP networks. Much of this work involves monitoring network conditions and detecting network failures.
In the prior art there are two broad categories of network monitoring schemes. The first, distributed monitoring, typically assumes that the measurement instrumentation can be either distributed at different points in the network or placed at the endpoints of the end-to-end path. The second, centralized monitoring, requires only one observation point, which makes installation easier, but it takes longer to detect failures.
With respect to the detection of a network failure, there are two well-known methods in the prior art: (i) active injection of test probes into the network, and (ii) passive sniffing of existing network traffic. Many network measurement tools send periodic round-trip echo probes to a distant host, which then responds to the sender. These probes are usually in the form of Internet Control Message Protocol (ICMP) Echo packets and are well known in the prior art by such names as NetDyn probes, Netnow probes and Fping probes in Imeter.
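The active-probing approach described above can be illustrated with a minimal sketch. It assumes a detection rule that the source does not specify: a failure is declared after a fixed number of consecutive unanswered probes. The names `detect_failure`, `send_probe`, and `loss_threshold` are hypothetical, and the probe itself is passed in as a callable because sending real ICMP Echo packets normally requires raw-socket privileges.

```python
from typing import Callable, Optional


def detect_failure(send_probe: Callable[[], Optional[float]],
                   max_probes: int,
                   loss_threshold: int = 3) -> Optional[int]:
    """Send up to max_probes probes and report when a failure is declared.

    send_probe is a hypothetical callable that returns the round-trip time
    in seconds, or None when no reply arrives before its timeout.
    Returns the 1-based index of the probe at which the failure is declared,
    or None if the path never misses loss_threshold probes in a row.
    """
    consecutive_losses = 0
    for index in range(1, max_probes + 1):
        rtt = send_probe()
        if rtt is None:
            consecutive_losses += 1
            if consecutive_losses >= loss_threshold:
                return index
        else:
            # Any reply resets the count, so isolated losses are tolerated.
            consecutive_losses = 0
    return None


# Usage with simulated replies instead of real ICMP traffic.
replies = iter([0.02, None, 0.03, None, None, None, 0.02])
failed_at = detect_failure(lambda: next(replies), max_probes=7)
```

Under this rule the detection time is roughly the probe interval multiplied by the threshold, so a shorter interval gives faster detection at the cost of more probe traffic.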
In many cases, an application may be able to detect a network failure (or, in some cases, receive an explicit network notification regarding the failure) much faster than the time it takes to repair the failure (e.g., by restoration schemes). Current failure-detection methods typically notify only the network manager, primarily for maintenance purposes. Users do not normally receive alerts of network problems.
In the prior art, it is possible to provide voice quality feedback to a user participating in a VoIP session. Various metrics related to RTP (Real-Time Transport Protocol) traffic can be computed by an application and presented to the user. However, current network devices do not automatically inform applications or users about network problems. Instead, applications mostly infer such problems indirectly, for example through timeouts. The emphasis in VoIP and other applications has been to deliberately hide the network and its problems from the user.
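As an illustration of one RTP-related metric an application might compute, the sketch below uses the interarrival jitter estimator defined in RFC 3550, which smooths the change in packet transit time with a gain of 1/16. The source does not say which metrics are meant, so this choice is an assumption, and the function name and the `(send_ts, recv_ts)` input format are hypothetical.

```python
from typing import Iterable, Tuple


def interarrival_jitter(packets: Iterable[Tuple[float, float]]) -> float:
    """Return the running jitter estimate after processing all packets.

    Each packet is a (send_ts, recv_ts) pair expressed in the same time
    units. The update rule J += (|D| - J) / 16 follows RFC 3550, where D is
    the difference in transit time between consecutive packets.
    """
    jitter = 0.0
    prev_transit = None
    for send_ts, recv_ts in packets:
        transit = recv_ts - send_ts
        if prev_transit is not None:
            delta = abs(transit - prev_transit)
            jitter += (delta - jitter) / 16.0
        prev_transit = transit
    return jitter


# Usage: transit times of 10, 10 and 26 time units.
estimate = interarrival_jitter([(0, 10), (20, 30), (40, 66)])
```

Because only differences in transit time are used, the sender and receiver clocks do not need to be synchronized; a constant offset between them cancels out.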
The present invention is directed to overcoming these shortcomings in the converged network. In particular, it is directed to (i) rapid failure detection and (ii) user notification.
Rapid detection and notification of failures, long before the network is actually restored or repaired, can be exploited by users and network managers in various advantageous ways. For instance, users can decide whether to continue their calls (“on hold”) or, instead, wait until the network is restored and make another call later. Network managers, in turn, might begin recovery efforts earlier, for example by redirecting existing and future calls to help prevent a bigger network outage. As the jitter or loss increases to unacceptable levels on an IP path, calls might be rerouted to a different IP path or perhaps even redirected to a non-IP network (e.g., the PSTN). The point is that once the user or manager is made aware of the network situation, they can decide what to do (or what not to do) long before the network recovers.
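The rerouting decision described above might be sketched as follows. This is only an illustration under stated assumptions: the threshold values, the `PathStats` type, and the `choose_route` function are hypothetical and do not come from the source, which leaves open what counts as an unacceptable level.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PathStats:
    """Hypothetical per-path measurements."""
    name: str
    jitter_ms: float
    loss_pct: float


def choose_route(paths: Sequence[PathStats],
                 max_jitter_ms: float = 30.0,   # assumed threshold
                 max_loss_pct: float = 1.0,     # assumed threshold
                 fallback: str = "PSTN") -> str:
    """Return the first IP path within both thresholds, else the fallback."""
    for path in paths:
        if path.jitter_ms <= max_jitter_ms and path.loss_pct <= max_loss_pct:
            return path.name
    # No IP path is acceptable, so redirect to the non-IP network.
    return fallback


# Usage: the primary path is degraded, so the call moves to the backup path.
route = choose_route([
    PathStats("ip-primary", jitter_ms=80.0, loss_pct=4.0),
    PathStats("ip-backup", jitter_ms=12.0, loss_pct=0.2),
])
```

The paths are tried in the order given, so the caller expresses its preference simply by ordering the list.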