The present invention relates to fault reporting in data communication networks, and more particularly to scalable and accurate methods and systems for suppression of alarms in Ethernet networks.
Connectivity fault management (CFM) refers to the ability to monitor the health of a network. CFM in Ethernet networks has historically been performed by a network management system (NMS) running an application layer protocol such as Simple Network Management Protocol (SNMP). In a typical SNMP-based NMS, faults in an Ethernet network are detected by SNMP agents running on managed data communication nodes (e.g. nodes supporting Ethernet bridging capability) and reported to a central SNMP manager. Fault reports are typically made in response to individual polling of the SNMP agents, which can be cumbersome and slow in networks with large numbers of managed nodes. While SNMP agents may be configured to make unsolicited fault reports, for example, to send a fault notification to the SNMP manager in direct response to fault detection, they cannot send fault notifications if their managed nodes have experienced a catastrophic failure.
The inadequacy of SNMP-based NMS alone to deliver CFM in large Ethernet networks that often span multiple customer, provider and operator networks has led to development of a more robust native Ethernet CFM solution. This native Ethernet CFM solution, which is being standardized in a document styled IEEE 802.1ag and is hereinafter called “Ethernet CFM,” provides proactive fault detection and reporting for bridged Ethernet networks through in-band transport of Ethernet management frames.
Operation of Ethernet CFM is shown by way of example in FIG. 1. A bridged Ethernet network 100 includes customer equipment CE1, CE2, CE3 in a customer network and provider equipment PE1, PE2 in a provider network. The customer and provider equipment include Ethernet bridging capability. Provider equipment PE1, PE2 is maintained by a service provider. Customer equipment CE1, CE2, CE3 is maintained by a customer of the service provider and CE1 communicates with CE2 and CE3 through provider equipment PE1 and PE2. The customer network further includes a customer network management system (CE NMS) 110 for monitoring faults in the customer network, while the provider network includes a provider network management system (PE NMS) 120 for monitoring faults in the provider network.
Maintenance associations (MA) are configured at different maintenance levels for performing CFM. In the example shown, a customer maintenance association (CMA) 115 is configured at a customer level to perform CFM in the customer network. CMA 115 includes maintenance endpoints (MEP) A, D, E and maintenance intermediate points (MIP) B, C. A provider maintenance association (PMA) 125 is configured at a provider level to perform CFM in the provider network. PMA 125 includes MEP F, G. MEP and MIP are software or hardware entities created on either a per-node or per-port basis. Generally speaking, MEP transmit and receive Ethernet management frames in their respective MA to detect faults, which are selectively reported to an NMS so that corrective action can be taken. When a MEP detects a fault and reports the fault to an NMS, the MEP is said to raise an alarm. When a MEP detects a fault but does not report the fault to an NMS, the MEP is said to suppress an alarm.
MEP infer faults from loss of continuity with other MEP. In the example shown in FIG. 1, MEP G detects a fault on PMA 125 as a result of failing to receive a continuity check (CC) frame from MEP F. A CC frame is, generally speaking, a heartbeat message transmitted between MEP in a MA to confirm connectivity with the sending MEP. Detection of the fault causes MEP G to transmit a fault notification via SNMP to PE NMS 120 reporting the fault. Detection of the fault also causes MEP G to transmit an alarm indication signal (AIS) frame in CMA 115 to notify MEP D and MEP E of a lower level fault and thereby cause suppression of an alarm in CMA 115. Were the AIS frame not transmitted in CMA 115, MEP D and MEP E would detect the same fault through failure to receive a CC frame from MEP A and would report the fault to CE NMS 110, even though CE NMS 110 has no operational control over the provider network where the fault exists. Since the AIS frame is transmitted in CMA 115, MEP D and MEP E suppress the alarm in CMA 115 and refrain from making a superfluous report to CE NMS 110.
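By way of illustration only, the loss-of-continuity detection and AIS-driven suppression described above might be sketched as follows. The class name `Mep`, its methods, the one-second CC interval and the 3.5x loss multiplier are assumptions made for this sketch and are not taken from the description; the sketch models only the decision to raise or suppress an alarm, not frame transport.

```python
from dataclasses import dataclass, field


@dataclass
class Mep:
    """Hypothetical model of a maintenance endpoint (illustrative only)."""
    name: str
    peers: set                      # peer MEP expected to send CC frames
    cc_interval: float = 1.0        # assumed CC transmission interval, seconds
    loss_multiplier: float = 3.5    # assumed number of intervals before loss is declared
    ais_active: bool = False        # set once an AIS frame has been received
    last_cc: dict = field(default_factory=dict)

    def on_cc(self, sender: str, now: float) -> None:
        """Record the arrival time of a CC (heartbeat) frame from a peer."""
        self.last_cc[sender] = now

    def on_ais(self) -> None:
        """Record that a lower level fault has been signalled by an AIS frame."""
        self.ais_active = True

    def poll(self, now: float):
        """Return (state, lost_peers): 'ok', 'raised' or 'suppressed'."""
        timeout = self.cc_interval * self.loss_multiplier
        lost = sorted(p for p in self.peers
                      if now - self.last_cc.get(p, 0.0) > timeout)
        if not lost:
            return ("ok", [])
        # With AIS received, the fault is detected but not reported to the NMS.
        return ("suppressed" if self.ais_active else "raised", lost)


mep_d = Mep("D", {"A", "E"})
mep_d.on_cc("A", 0.0)
mep_d.on_cc("E", 4.0)
print(mep_d.poll(5.0))   # ('raised', ['A'])
mep_d.on_ais()
print(mep_d.poll(5.0))   # ('suppressed', ['A'])
```

In this sketch, MEP D stops hearing from MEP A and would report the fault, but once an AIS frame arrives the same condition is classified as suppressed.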
One problem with Ethernet CFM alarm suppression as generally described above is its accuracy when distinct faults are detected in MA operating at different levels, such as PMA 125 and CMA 115. Consider the situation where a lower level fault is detected in PMA 125 between MEP F and MEP G and a higher level fault is then detected in CMA 115 between MEP D and MEP E. When that occurs, an AIS frame transmitted by MEP G in CMA 115 should ideally inhibit reporting to CE NMS 110 of the lower level fault (over which CE NMS 110 has no operational control) but should not inhibit reporting to CE NMS 110 of the higher level fault (over which CE NMS 110 has operational control). One possible solution to this problem resides in providing to MEP D and MEP E a reachability relationship from which they can discern that higher level MEP A becomes unreachable as a result of a fault involving lower level MEP F. With knowledge of such a reachability relationship, MEP D and MEP E can suppress an alarm resulting from failure to receive CC frames from MEP A while raising an alarm resulting from failure to receive CC frames from one another.
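The use of a reachability relationship to separate the two faults might be sketched as follows. This is a minimal sketch under stated assumptions: the function name `classify_lost_peers` and the representation of the reachability relationship as a mapping from a lower level MEP to the set of higher level MEP that become unreachable when it is involved in a fault are hypothetical and chosen for illustration.

```python
def classify_lost_peers(lost_peers, faulty_lower_meps, reachability):
    """Split lost higher level peers into (suppress, raise) lists.

    lost_peers:        higher level MEP from which CC frames are no longer received
    faulty_lower_meps: lower level MEP reported as involved in a fault (e.g. via AIS)
    reachability:      hypothetical mapping {lower MEP: set of higher level MEP
                       that become unreachable when that lower MEP is faulted}
    """
    explained = set()
    for lower in faulty_lower_meps:
        explained |= reachability.get(lower, set())
    suppress = sorted(set(lost_peers) & explained)      # caused by the lower level fault
    raise_alarm = sorted(set(lost_peers) - explained)   # distinct higher level fault
    return suppress, raise_alarm


# MEP D has lost CC frames from both MEP A and MEP E; the AIS frame implicates MEP F.
reachability = {"F": {"A"}}
print(classify_lost_peers({"A", "E"}, {"F"}, reachability))  # (['A'], ['E'])
```

Here the loss of MEP A is attributed to the lower level fault and suppressed, while the loss of MEP E is not explained by that fault and is raised to CE NMS 110.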
One known implementation of Ethernet CFM alarm suppression, called nonselective AIS, fails to provide reachability relationships and therefore does not address the problem of distinct faults on multiple levels. Instead, all alarms are suppressed at a higher level in response to an AIS frame received from a lower level.
Other known implementations of Ethernet CFM alarm suppression, called selective AIS, do not scale well. In one selective alarm suppression implementation, a lower level MEP snoops CC frames transmitted by higher level MEP to learn which higher level MEP will become unreachable to other higher level MEP in the event of a fault involving the lower level MEP. The lower level MEP transmits a complete list of conditionally unreachable higher level MEP in a CC frame sent to another lower level MEP. In the event of a fault involving the lower level MEP, the lower level MEP that received the list transmits the complete list of conditionally unreachable higher level MEP in an AIS frame to the other higher level MEP, so that they can suppress alarms resulting from failure to receive CC frames from the higher level MEP in the list. In networks with large numbers of MEP, this complete list of conditionally unreachable MEP transmitted in CC and AIS frames can have an extremely high bit count and cause such frames to exceed the maximum transmission unit (MTU) size for Ethernet.
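The scaling limit can be made concrete with a rough calculation. The sketch below assumes the conventional 1500-byte Ethernet payload limit; the per-entry size of 8 bytes and the fixed frame overhead of 100 bytes are hypothetical figures chosen only for illustration, since the description does not specify how list entries are encoded.

```python
ETHERNET_MTU_BYTES = 1500     # conventional payload limit for standard Ethernet frames
ENTRY_BYTES = 8               # hypothetical size of one list entry identifying a MEP
FIXED_OVERHEAD_BYTES = 100    # hypothetical size of the other CC/AIS frame fields


def list_payload_bytes(num_meps: int) -> int:
    """Payload size of a frame carrying a complete list of num_meps entries."""
    return FIXED_OVERHEAD_BYTES + num_meps * ENTRY_BYTES


def fits_in_mtu(num_meps: int) -> bool:
    """True if a frame carrying the complete list stays within the MTU."""
    return list_payload_bytes(num_meps) <= ETHERNET_MTU_BYTES


def max_entries() -> int:
    """Largest complete list that still fits in a single frame."""
    return (ETHERNET_MTU_BYTES - FIXED_OVERHEAD_BYTES) // ENTRY_BYTES


print(max_entries())             # 175
print(list_payload_bytes(1000))  # 8100
print(fits_in_mtu(1000))         # False
```

Under these assumed figures, a complete list of more than 175 conditionally unreachable MEP no longer fits in one frame, and a list of 1000 entries would need more than five times the available payload.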