It is known that a key part of almost any communications network is a service management system. It typically carries out functions such as monitoring network performance, monitoring component performance, keeping track of versions of components, configuring configurable components, recording and scheduling maintenance at remote sites, and reconfiguring the network to take account of changing traffic patterns or of planned or unforeseen outages of components, including cards, racks, equipment bays, fibers or other transmission lines.
Such service management functions can in principle be centralized or distributed amongst the nodes of the network. An example is shown in US patent application 20020019866 A1 (Linzy), published Feb. 14, 2002, entitled “System and method for monitoring and maintaining a communication network”, which describes monitoring and maintaining a communication network, including the capability of determining connection and configuration parameters of network elements and of monitoring performance characteristics of network elements to recognize faults within the communication network. Another typical configuration of service management functions for a long haul optical network is shown in FIG. 1. It is essentially centralized at a remote location and communicates with each of the components at the nodes of the network via management communication channels. These typically form a management network, for example using Ethernet or other well-known link or network layer protocols, carried over physical paths such as lines leased from the public telephone network or dedicated wavelengths on the fibers that carry traffic between nodes of the network.
The service management functions typically include a network management system (NMS) and a number of element management systems (EMS) coupled to the NMS. An operational support system (OSS) is also coupled to the NMS, and a billing system is shown coupled to the NMS via the OSS. The main function of each EMS is to perform detailed monitoring and management functions on the limited subset of equipment types that it understands. The main function of the OSS is to maintain operational and non-operational records regarding services (e.g. Service Level Agreements [SLAs]). The main function of the NMS is to provide an integrated view of network and service operation, as well as overall control functions.
The coupling to the billing system enables it to receive information from the network for automated billing according to criteria set out in the SLAs. These criteria can include, for example, the quantity of traffic transmitted, penalties for lost traffic, and penalties for transmission capacity being unavailable for longer than a given period.
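By way of illustration only, the billing criteria listed above might be combined as in the following Python sketch. The structure, the names (`SlaTerms`, `monthly_charge`) and the rate fields are hypothetical assumptions, not taken from any particular billing system or SLA. Penalties are treated here as credits deducted from the usage charge, and the net charge is clamped at zero, which is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class SlaTerms:
    # Hypothetical SLA parameters, for illustration only.
    price_per_gb: float            # charge per GB of traffic carried
    penalty_per_gb_lost: float     # credit owed per GB of traffic lost
    outage_threshold_min: float    # unavailability tolerated before a penalty applies
    penalty_per_outage_min: float  # credit owed per minute beyond the threshold


def monthly_charge(terms: SlaTerms, gb_carried: float, gb_lost: float,
                   unavailable_min: float) -> float:
    """Net charge: usage revenue minus SLA penalty credits (clamped at zero)."""
    usage = gb_carried * terms.price_per_gb
    loss_penalty = gb_lost * terms.penalty_per_gb_lost
    # Only unavailability beyond the agreed threshold attracts a penalty.
    excess_min = max(0.0, unavailable_min - terms.outage_threshold_min)
    outage_penalty = excess_min * terms.penalty_per_outage_min
    return max(0.0, usage - loss_penalty - outage_penalty)
```

For example, with a price of 0.5 per GB, a loss penalty of 2.0 per GB, a 30-minute outage allowance and a penalty of 10.0 per excess minute, carrying 1000 GB, losing 5 GB and being unavailable for 45 minutes gives a net charge of 500 - 10 - 150 = 340.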
The penalties for lost traffic, or for lacking the capacity to meet an agreed SLA, can be very significant. Normally, the network management system enables operators to determine how much capacity is available and to ensure that a proposed SLA can be met before it is agreed. To allow for unforeseen outages caused by equipment failures, redundant capacity is provided that can be switched in automatically.
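The check that a proposed SLA can be met before it is agreed might be sketched as follows. The sketch assumes, hypothetically, that each link has a known capacity and committed load, and that the SLA requires the requested bandwidth on both a working path and a link-disjoint, dedicated (not shared) protection path, reflecting the redundant capacity mentioned above. The function names and data layout are illustrative only.

```python
from typing import Dict, List


def has_headroom(capacity_gbps: Dict[str, float],
                 committed_gbps: Dict[str, float],
                 path: List[str], requested_gbps: float) -> bool:
    """True if every link on the path has at least the requested spare capacity."""
    return all(capacity_gbps[link] - committed_gbps.get(link, 0.0) >= requested_gbps
               for link in path)


def can_accept_sla(capacity_gbps: Dict[str, float],
                   committed_gbps: Dict[str, float],
                   working_path: List[str], protection_path: List[str],
                   requested_gbps: float) -> bool:
    """Hypothetical admission check: both paths must have headroom.

    Assumes the two paths share no link; shared links would need the
    requested bandwidth reserved twice, which this sketch does not model.
    """
    return (has_headroom(capacity_gbps, committed_gbps, working_path, requested_gbps)
            and has_headroom(capacity_gbps, committed_gbps, protection_path, requested_gbps))
```

Checking the protection path as well as the working path means an SLA is only accepted if the automatic switch-over to redundant capacity would itself have room for the traffic.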
One of the more significant sources of heavy penalties for breaches of SLAs is disruption caused by incorrect actions taken by operators of the network management system, or by craftspeople working on the hardware at remote locations under the direction of those operators.
A craftsperson who accidentally disconnects the wrong line card in a transport system can cause vast financial penalties and/or extra work for the operator, because the disruption to paying traffic violates the penalty clauses of service level agreements. Even if the traffic is automatically re-routed onto a protection path or a shared protection path, major disruption can still be caused if another failure or unavoidable maintenance activity then affects the traffic, which is now unprotected or less protected. More disturbingly, an operator of a remote network control/management system can send configuration commands with a similarly damaging effect, by accident or through misunderstanding the display in front of them. This is particularly unfortunate, as nobody but the management operators is checking for service interruptions. These problems are particularly acute where the network includes high capacity links over which many hundreds or thousands of connections are aggregated or multiplexed onto a single fiber, but they can occur in any network of some complexity.
Efforts to reduce these risks have included on-screen warning messages asking operators “are you sure?” before a reconfiguration action is carried out; however, these give no additional information to help the operator reconsider the decision, and so are often clicked through without further thought. For craftspeople, warning indications have been shown on each card or rack. For example, red LEDs on the front of failed circuit packs, showing that the packs are out of service and can therefore be pulled without fear of interrupting service, are a very effective standard industry technique. However, this does not cover the many circumstances in which the LED is not lit but the card is nonetheless ready to be pulled: where the wrong type of card has been plugged in, where the card is misconfigured, where all (or most) of the traffic has been re-routed so as not to pass through the card, or where the card has failed to detect its own failure, for example because it is still performing, but only marginally.
Even the red LED technique cannot help where maintenance, reconfiguration or provisioning is performed remotely. While the network management software would be expected to know whether a fault is present, craft terminals or other applications not directly integrated with the NMS may not display this to the operator, who may therefore still reconfigure the node or pack in error. Even where the indications are accurately displayed, their significance may not be clear to the operator, as described in the previous paragraph. Yellow ‘warning’ LEDs have been fitted in an attempt to provide more helpful information; however, there is substantial concern that these do not provide actionable information, i.e. “you can pull this card now”.
In co-pending U.S. application Ser. No. 10/109,949 (Nortel Networks ref 15027ID), filed Mar. 29, 2002, there is shown a method for determining the impact of a fault on a network, involving constructing a layered topological model of the network according to a predetermined protocol, receiving an indication of the fault, and modelling the resulting impact on the network and its services. The modelling step may include applying a priority weighting to the fault to determine a priority order in which to attend to a sequence of faults. The priority order can be based on the cost effectiveness of rectifying the faults.
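One simple way of turning a priority weighting into a cost-effectiveness ordering is sketched below. It assumes, hypothetically, that the impact modelling yields an estimated SLA penalty exposure per hour for each fault and that a positive repair cost can be estimated; faults are then ranked by weighted exposure per unit repair cost. The `Fault` fields, the `priority_order` function and the ranking rule are illustrative assumptions, not the method of the co-pending application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fault:
    name: str
    penalty_per_hour: float  # hypothetical: modelled SLA exposure while unrepaired
    repair_cost: float       # hypothetical: estimated cost to rectify (must be > 0)
    weight: float = 1.0      # hypothetical: operator-assigned priority weighting


def priority_order(faults: List[Fault]) -> List[str]:
    """Rank faults by weighted penalty exposure per unit repair cost, highest first."""
    def score(fault: Fault) -> float:
        return fault.weight * fault.penalty_per_hour / fault.repair_cost
    return [f.name for f in sorted(faults, key=score, reverse=True)]
```

With this rule a cheap repair that removes a moderate exposure can outrank an expensive repair that removes a larger one, while the weighting lets an operator raise the priority of, for example, a fault affecting a premium service.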