Service providers who offer services over a network need to monitor that network for any operational condition that could degrade the quality of service of a customer application. Internet Protocol Television (IPTV) is an example of such a customer application. Accordingly, service providers emphasize the need for testing procedures for customer applications. This need is particularly acute when such customer applications rely on new technologies in the network layer, such as Virtual Private LAN Services (VPLS).
When new network technologies such as VPLS or new services such as IPTV are integrated into a network, service providers face the challenge of not fully understanding the impact that such technologies and/or services will have on the network. In their first iterations, new network technologies usually include management and OAM (Operations, Administration and Maintenance) tools for basic troubleshooting (i.e., basic diagnostic tools). These basic diagnostic tools are typically of an out-of-band type (e.g., Management Information Base (MIB) queries, etc.) and/or an in-band type (e.g., fault detection with connectivity checking mechanisms, fault verification with ping-like mechanisms, fault localization with Traceroute-like mechanisms, etc.). After a new network technology has been introduced, more advanced diagnostic tools need to be defined, and these are of great value to network administrators. These more advanced diagnostic tools are useful both for problem solving and for verification of operational networks. Furthermore, due to their advanced capabilities, they remain useful even after usage of the new network technology becomes well established in networks.
Diagnostic tools provide for verification of node configuration, traffic mapping, data paths, virtual circuits and the like. In an ideal world, there would be no need for verification tools. But errors do happen (e.g., human error, protocol error, software bug, hardware defect, etc.), and tools to identify, locate, prevent, and catch these errors are always needed. Immediately after the initial activation or a reset of a network, a network operator may want to verify essential network configuration settings. During normal operations, the network operator will also be interested in more subtle operational conditions, which do not result in obvious service interruption, but rather in service degradation or, potentially, future service interruption. In the specific case of VPLS, diagnostic verifications encompass multiple nodes (e.g., possibly all the edge nodes of a VPLS instance) and multiple layers (e.g., VPLS, Ethernet, pseudo-wires, tunnelling mechanisms). The problem of VPLS-wide verification is one application-specific example of where such diagnostic tools are needed. However, the same problem exists for most, if not all, network technologies.
Various approaches are known for providing diagnostic tools that are useful for problem solving and verification in operational networks. Out-of-band queries are one such known approach. As shown in FIG. 1 (prior art), out-of-band queries are issued from an NMS (Network Management System) via, for example, Simple Network Management Protocol (SNMP) or Common Management Information Protocol, Common Management Information Service Element (CMIP/CMISE) for reception by a plurality of Provider Edge nodes (PE). Examples of such Provider Edge nodes include, but are not limited to, routers, bridges, servers and the like. These out-of-band queries are efficient in their ability to retrieve configuration information from a node, provided the corresponding MIB contains the desired information. However, this approach can become tedious if answering a particular question requires many parameters from many nodes. To reduce the tedium, out-of-band queries can be scripted to automate the retrieval process, but such scripting does not address other drawbacks of MIB queries. Examples of such other drawbacks include, but are not limited to, the fact that the datapath itself is not tested and the fact that some data may not be present in the MIBs (e.g., entries in forwarding tables used in the datapath). Furthermore, while out-of-band queries are on-demand by nature, it is disclosed herein that a pro-active solution can be based on them. Another shortcoming of out-of-band queries is that the network management systems for various operator-owned nodes are not necessarily integrated, which means that troubleshooting a certain network path, route or circuit may involve several nodes that are not managed by the same NMS. Therefore, a network administrator performing such troubleshooting must have access to all of these network management systems, which is often not convenient, efficient or practical.
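The scripted retrieval described above can be pictured with a minimal sketch. It assumes that the per-node parameters have already been retrieved by out-of-band queries into plain dictionaries; the node names, the parameter names and the `find_mismatches` helper are hypothetical and are not part of any management protocol or product.

```python
from collections import defaultdict

def find_mismatches(node_params):
    """Return {parameter: {value: [nodes]}} for parameters whose values differ across nodes."""
    values = defaultdict(lambda: defaultdict(list))
    for node, params in node_params.items():
        for name, value in params.items():
            values[name][value].append(node)
    # Keep only the parameters for which more than one distinct value was seen.
    return {
        name: {value: sorted(nodes) for value, nodes in by_value.items()}
        for name, by_value in values.items()
        if len(by_value) > 1
    }

# Hypothetical data standing in for the results of scripted MIB queries.
retrieved = {
    "PE1": {"vpls_mtu": 1500, "vpls_id": 100},
    "PE2": {"vpls_mtu": 1500, "vpls_id": 100},
    "PE3": {"vpls_mtu": 9000, "vpls_id": 100},
}
mismatches = find_mismatches(retrieved)
```

A check of this kind only covers what the MIBs expose. It does not exercise the datapath, which is one of the drawbacks noted above.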
FIG. 2 (prior art) shows an alternate approach for implementing out-of-band diagnostics. This approach relies upon reservation of a portion of the network bandwidth for control data. A shortcoming of this approach is that it does not test the actual datapath between the Provider Edge nodes (PE).
An in-band OAM approach to network diagnostics relies on OAM fault management messages, which are sent on the same channels as user data without reservation of a separate control channel. As shown in Table 1 (prior art), each network layer has its own in-band fault management mechanism (e.g., a set of OAM messages), which normally follows the same formal diagnostic protocol (e.g., pro-active detection, on-demand verification, on-demand localization, etc.). In this diagnostic protocol, connectivity is tested pro-actively by sending a periodic stream of Hello messages. Nodes expect to receive Hello messages from their neighbors, and the absence of these messages is interpreted as a loss of connectivity. In an Ethernet context, these Hello messages are referred to as Connectivity Checking messages. A layer may include its own mechanism for issuing Hello messages or may use a separate mechanism such as, for example, Bi-directional Forwarding Detection (BFD). To verify a problem reported by a Hello mechanism, a network operator uses an on-demand ping-like tool. In an Ethernet context, such a tool is referred to as LoopBack. In a PW (Pseudo Wire) context, it is referred to as VCCV (Virtual Circuit Connection Verification). A ping message is sent along a datapath, and a reply is sent along the return datapath. If the problem is confirmed by the Ping mechanism, the network operator attempts to locate the exact node or link at fault using a Traceroute-like mechanism (called LinkTrace in an Ethernet context).
TABLE 1: Existing in-band fault management mechanisms

Layer          Connectivity (Detection)                      Ping              Traceroute
               data path             control path            (Verification)    (Localization)
-------------  --------------------  ----------------------  ----------------  ----------------
IP             BFD                   IGP/BGP Hello           ICMP Ping         ICMP Traceroute
Eth last mile  802.3ah                                       —                 —
Eth Provider   802.1ag/Y.1731 CC                             LoopBack          LinkTrace
VPLS           —                     —                       (VPLS ping/TR, Eth level, .1ag-like,
                                                             proprietary implementations (MAC ping))
PW             BFD, Y.1711           LDP Hello               VCCV              —
MPLS tunnel    BFD, Y.1711, Y.1713   LDP/RSVP Hello          LSP Ping          LSP Traceroute
data link      (liveliness, keepalive, local management      —                 —
               interfaces, . . . )
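The Hello-based detection described above can be sketched as a simple timeout check. The `HelloMonitor` class, the one-second interval and the tolerance of three missed messages are illustrative assumptions, not values taken from any of the mechanisms listed in Table 1.

```python
class HelloMonitor:
    """Tracks the last Hello seen from each neighbor and flags silent ones."""

    def __init__(self, interval, multiplier=3):
        # A neighbor is considered lost after `multiplier` intervals of silence.
        self.timeout = interval * multiplier
        self.last_seen = {}

    def on_hello(self, neighbor, now):
        """Record the arrival time of a Hello message from a neighbor."""
        self.last_seen[neighbor] = now

    def lost_neighbors(self, now):
        """Return the neighbors whose Hello messages have stopped arriving."""
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)

monitor = HelloMonitor(interval=1.0, multiplier=3)
monitor.on_hello("PE2", now=0.0)
monitor.on_hello("PE3", now=0.0)
monitor.on_hello("PE2", now=3.0)   # PE2 keeps sending, PE3 goes silent
lost = monitor.lost_neighbors(now=4.0)
```

In this sketch a silent neighbor is only reported. Verification and localization remain on-demand steps taken by the operator.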
These known fault management mechanisms are efficient at reporting and locating a connectivity problem, even along paths and nodes not managed by a single NMS. However, they do not help in detecting other problems or in finding their root cause. They can be extended for other uses such as, for example, piggybacking timestamps for delay measurements. Also, they are normally used in a point-to-point scenario, except for Ethernet CC, which is natively broadcast. As shown in FIG. 3 (prior art), a network operator usually has to launch a succession of point-to-point tests to test multipoint data paths between the Provider Edge nodes (PE).
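To give a sense of how quickly such a succession of point-to-point tests grows, the following sketch enumerates them for a full mesh. It assumes that one ping per unordered pair of nodes is sufficient, because the reply travels along the return datapath; the `point_to_point_tests` helper and the node names are hypothetical.

```python
from itertools import combinations

def point_to_point_tests(provider_edges):
    """List every unordered pair of PEs that needs its own point-to-point test."""
    return list(combinations(provider_edges, 2))

tests = point_to_point_tests(["PE1", "PE2", "PE3", "PE4"])
test_count = len(tests)  # n * (n - 1) / 2 pairs for n nodes
```

Under this assumption the number of tests grows quadratically with the number of Provider Edge nodes.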
FIG. 4 (prior art) shows a modified approach for implementing OAM and on-demand diagnostics. This approach includes piggy-backing other requests and replies on a Ping mechanism, and forces a parallel broadcast instead of a sequential series of point-to-point queries between the provider edge nodes (PE). With respect to piggy-backing, most Ping mechanisms allow for such functionality via extensions (e.g., TLVs: Type Length Value extensions). This modified approach is referred to herein as VPLS In-band Configuration Verification (VICV). VICV is an on-demand tool, and does not provide any desired pro-active diagnostics service.
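The TLV extensions mentioned above can be illustrated with a minimal encoder and decoder. The field widths (a 2-byte type and a 2-byte length in network byte order) and the type codes are assumptions chosen for illustration and do not reproduce the wire format of any particular Ping mechanism.

```python
import struct

def encode_tlv(tlv_type, value):
    """Pack one TLV: 2-byte type, 2-byte length, then the value bytes."""
    return struct.pack("!HH", tlv_type, len(value)) + value

def decode_tlvs(data):
    """Split a byte string into a list of (type, value) pairs."""
    tlvs = []
    offset = 0
    while offset < len(data):
        tlv_type, length = struct.unpack_from("!HH", data, offset)
        offset += 4
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated TLV")
        tlvs.append((tlv_type, value))
        offset += length
    return tlvs

# Hypothetical type codes for a piggy-backed request and an empty marker.
payload = encode_tlv(0x0001, b"mtu?") + encode_tlv(0x0002, b"")
decoded = decode_tlvs(payload)
```

Because each TLV carries its own length, a receiver that does not recognize a type can skip that entry and continue with the next one.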
As shown in FIG. 5 (prior art), multipoint in-band OAM such as VICV can be made recursive so that a Provider Edge node (PE) or a Multi-Tenant Unit (MTU) propagates the piggy-backed requests as long as it is not an edge toward the customers. In this manner, multipoint in-band OAM is compatible with Hierarchical VPLS (H-VPLS).
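The recursive propagation described above can be sketched as a depth-first walk that stops at nodes facing the customers. The topology, the node names and the `propagate` helper are hypothetical and only illustrate the stopping rule.

```python
def propagate(node, downstream, customer_edges, reached=None):
    """Forward a request from `node` until a customer-facing edge is reached."""
    if reached is None:
        reached = []
    reached.append(node)
    if node in customer_edges:
        # An edge toward the customers answers but does not propagate further.
        return reached
    for neighbor in downstream.get(node, []):
        if neighbor not in reached:
            propagate(neighbor, downstream, customer_edges, reached)
    return reached

# Hypothetical H-VPLS hierarchy: hub PEs with Multi-Tenant Units as spokes.
downstream = {
    "PE1": ["PE2", "PE3"],
    "PE2": ["MTU1", "MTU2"],
    "PE3": ["MTU3"],
}
reached = propagate("PE1", downstream, customer_edges={"MTU1", "MTU2", "MTU3"})
```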
Discovery protocols in the IP layer are another approach for implementing network diagnostics. Examples of these discovery protocols include, but are not limited to, Link Layer Discovery Protocol (LLDP, IEEE 802.1AB), Neighbor Discovery Protocol (NDP), or Border Gateway Protocol (BGP) Hello. These discovery protocols are normally used for signalling and routing (e.g., recomputing and advertising forwarding tables, establishing correspondence between physical and logical addresses, etc.). They are sometimes used indirectly for some configuration verification and diagnostics. However, by nature, they are limited to one-hop neighbors and do not perform all the configuration verifications (even locally) that are generally needed for a complete diagnostics solution.
Concepts for more sophisticated approaches to network management and diagnostics have been discussed for some time. More specifically, sophisticated methods include correlation analysis, whether deterministic (e.g., intelligent agents, expert systems, rule-based systems, etc.) or probabilistic (e.g., heuristics, Bayesian inference, fuzzy logic, etc.). The input data for such correlation analysis usually includes sets of MIB queries, regular active monitoring (i.e., connectivity) and/or active performance measurement (e.g., delay, delay variation, losses). Active monitoring of configuration has been proposed in other contexts (e.g., robotics, automated systems in factories, camera-based security systems, power networks, etc.), but not specifically in telecommunication network management. The problem with these more sophisticated approaches is that they focus on performance-related analysis and do not look into misconfiguration-related problems such as, for example, potential problems with a traffic class that has not crossed the network yet or with a path that has not been used yet, suboptimal forwarding of current traffic, etc. When they rely on proactive measurement, the measurement is limited to performance metrics, and the sampling does not take into account the constraint of complete coverage, which would ensure that all the nodes, paths, and criteria are eventually measured and tested.
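The constraint of complete coverage mentioned above can be made concrete with a small sketch. It assumes that a check is identified by a (node, criterion) pair and that a fixed number of checks is run per round; the `coverage_schedule` helper and the names used are hypothetical and do not describe any existing tool.

```python
from itertools import product

def coverage_schedule(nodes, criteria, per_round):
    """Yield batches of (node, criterion) checks so that every pair appears exactly once."""
    pending = list(product(nodes, criteria))
    for start in range(0, len(pending), per_round):
        yield pending[start:start + per_round]

rounds = list(coverage_schedule(["PE1", "PE2", "PE3"], ["mtu", "vlan"], per_round=4))
covered = {check for batch in rounds for check in batch}
```

Unlike purely random sampling, a schedule of this kind is guaranteed to visit every combination within a bounded number of rounds.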
Therefore, network diagnostics functionality that is useful for problem solving and verification of operational networks and that overcomes shortcomings associated with conventional approaches for facilitating network management and diagnostics would be advantageous, desirable and useful.