Generally speaking, detecting and isolating connectivity loss is much more difficult than detecting corruption of transmission protocol data units (PDUs). This is because a corrupted PDU is typically available for inspection and analysis, whereas detecting a datapath interruption requires analysis of the PDU stream rather than of individual PDUs.
Thus, most transmission protocols associate a CRC (cyclic redundancy check) with each PDU; the CRC is computed by applying a predetermined function to the block of data to be transmitted. The receiver at the far end of the datapath recalculates the CRC using the same function as was used at transmission and compares the recalculated CRC with the one received in the PDU. Corrupted bits are thereby detected and, in some cases, may then be corrected using the CRC bits.
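The per-PDU check described above can be sketched as follows. This is only an illustration under stated assumptions: the PDU layout (payload followed by a 4-byte CRC-32 trailer) and the function names `attach_crc`, `check_crc` and `flip_bit` are hypothetical and are not taken from any particular transmission protocol.

```python
import zlib


def attach_crc(payload: bytes) -> bytes:
    """Transmit side: append a CRC-32 of the payload as a 4-byte trailer.

    The trailer layout is an assumption made for this sketch.
    """
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + crc.to_bytes(4, "big")


def check_crc(pdu: bytes) -> bool:
    """Receive side: recompute the CRC with the same function and compare
    it with the CRC carried in the PDU."""
    if len(pdu) < 4:
        return False  # too short to contain a trailer
    payload, received = pdu[:-4], int.from_bytes(pdu[-4:], "big")
    return (zlib.crc32(payload) & 0xFFFFFFFF) == received


def flip_bit(pdu: bytes, bit_index: int) -> bytes:
    """Simulate corruption in transit by inverting a single bit."""
    corrupted = bytearray(pdu)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


pdu = attach_crc(b"example payload")
intact_ok = check_crc(pdu)                  # PDU arrived unchanged
corrupt_ok = check_crc(flip_bit(pdu, 10))   # one bit inverted in transit
```

Note that this check only works on a PDU that actually arrives; a PDU that is dropped on the datapath never reaches `check_crc`, which is why connectivity loss has to be inferred from the PDU stream instead.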
It is known to use OAM-CC cells to monitor end-to-end datapath connectivity in ATM networks. An OAM (operation, administration and maintenance) cell is a specially tagged ATM cell specified to support ATM network maintenance features such as connectivity verification, alarm surveillance, continuity check (CC), and performance monitoring. However, OAM-CC is not supported by the equipment of all network equipment suppliers. In addition, the operational impact of configuring and monitoring large networks with thousands of connections becomes significant. Also, although OAM-CC can detect a datapath issue, it cannot isolate the cause (node, card) and therefore fails to reduce the fault isolation time and any related revenue loss. For these reasons, this solution is declined by many network customers.
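The idea behind a continuity check can be illustrated with a minimal sketch: the receiving end records when the last cell arrived on a connection and declares a loss of continuity if nothing arrives within a timeout. The class name `ContinuityMonitor`, its methods, and the 3-second timeout used in the example are assumptions chosen for illustration; they are not taken from the OAM specification or from any vendor implementation.

```python
class ContinuityMonitor:
    """Per-connection continuity check at the receiving end (sketch)."""

    def __init__(self, timeout_s: float, start: float) -> None:
        self.timeout_s = timeout_s  # assumed, configurable detection window
        self.last_seen = start      # monitoring begins at `start`

    def on_cell(self, now: float) -> None:
        """Record the arrival time of a user cell or a CC cell."""
        self.last_seen = now

    def loss_of_continuity(self, now: float) -> bool:
        """True if no cell has arrived within the detection window."""
        return (now - self.last_seen) > self.timeout_s


monitor = ContinuityMonitor(timeout_s=3.0, start=0.0)
monitor.on_cell(now=1.0)                              # last cell seen at t = 1 s
still_up = not monitor.loss_of_continuity(now=4.0)    # 3.0 s of silence
declared = monitor.loss_of_continuity(now=4.5)        # 3.5 s of silence
```

As the sketch makes visible, the monitor only knows that cells stopped arriving end to end; nothing in its state indicates which node or card along the datapath caused the interruption.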
Another conventional solution for datapath fault detection includes measuring the traffic at each node along the datapath using customized, off-line testers. These testers generally provide counters at the ingress and egress ends of the datapath for counting the number of PDUs traversing these ends. The values of the counters over a preset period of time are compared to determine whether cell loss or cell addition has occurred at the respective node. However, since such a tester is a stand-alone device, the traffic needs to be stopped during the measurement, thus adversely affecting subscriber services. Also, these measurements typically take place only after an end-user complains to the service provider about a failure. These limitations make this type of conventional solution essentially incompatible with real-time background diagnostic monitoring of a datapath.
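The counter comparison described above reduces to simple arithmetic, sketched below. The names `CounterSnapshot` and `classify_interval` are hypothetical, and the sketch assumes that the counters do not wrap during the measurement period and that no PDUs are in flight inside the node when the snapshots are taken.

```python
from dataclasses import dataclass


@dataclass
class CounterSnapshot:
    """Cumulative PDU counts read at one instant (hypothetical structure)."""
    ingress: int
    egress: int


def classify_interval(start: CounterSnapshot, end: CounterSnapshot) -> str:
    """Compare ingress and egress counts over a preset period.

    Assumes no counter wrap and no PDUs in flight at either snapshot.
    """
    entered = end.ingress - start.ingress  # PDUs that entered the node
    left = end.egress - start.egress       # PDUs that left the node
    if left < entered:
        return "loss"       # fewer PDUs left than entered
    if left > entered:
        return "addition"   # more PDUs left than entered
    return "ok"


before = CounterSnapshot(ingress=1_000, egress=990)
after = CounterSnapshot(ingress=6_000, egress=5_985)
result = classify_interval(before, after)  # 5000 entered, 4995 left
```

The in-flight assumption is the reason a stand-alone tester needs the traffic to be stopped: with live traffic, PDUs still inside the node at the moment of a snapshot would be counted at ingress but not yet at egress.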
On-line counters may also be used, as described in co-pending U.S. patent application Ser. No. 10/717,377, entitled “Method And Apparatus For Detection Of Transmission Unit Loss And/Or Replication”, by Steven Driediger et al., filed on 19 Nov. 2003 and assigned to Alcatel. According to the solution proposed in the Driediger et al. patent application, aggregate per-connection coherent counters are kept on the ingress and egress line cards and are periodically compared. This mechanism requires bounded latency through the switch fabric, but it also detects subtle traffic loss or replication. On the other hand, this method adds complexity to PDU latency measurements as the PDUs traverse the fabric, because it is difficult to accurately measure and track the PDUs while latency is continuously changing.
There is a need to provide a method and apparatus that enables fast datapath failure detection while leveraging the hardware infrastructure that most nodes already have.