Network management systems use information indicating a probable cause of a network event in performance monitoring and in operation and maintenance of telecommunications networks. When a device in a network detects an event (such as a network element failure), it notifies the network management system by sending an alarm indication message. Information identifying the event is included in the alarm indication message in a field known as the “probable cause” field. The probable cause field is important because it enables a network operator to begin the process of diagnosis in order to fix any underlying problem. The alarm indication message also contains other useful fields, such as the object instance (which describes the precise entity where the condition was detected), a timestamp, a severity indication, and so on.
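The fields listed above can be pictured as a simple record. The following Python sketch is illustrative only: the class name, field names, and example values (`node-7/port-3`, `major`) are assumptions chosen for readability and are not taken from any of the standards discussed here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AlarmIndication:
    """Hypothetical in-memory view of an alarm indication message."""
    probable_cause: str    # standardised code identifying the event
    object_instance: str   # precise entity where the condition was detected
    timestamp: datetime    # when the condition was detected
    severity: str          # severity indication


# Example values are invented for illustration.
alarm = AlarmIndication(
    probable_cause="ExcessiveBER",
    object_instance="node-7/port-3",
    timestamp=datetime(2000, 1, 1, 12, 0, tzinfo=timezone.utc),
    severity="major",
)
```

A real alarm indication message carries further fields and is encoded according to the relevant management protocol; the record above only mirrors the fields named in the text.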
Use of a standardised list of probable cause codes is known. For example, a list of probable cause codes is defined by the ITU-T in CCITT Recommendations M.3100 (1995) Generic Network Information Model; M.3100 Amendment 2 (1999): 1999; X.721 (1992) ISO/IEC 10165-2: Structure of management information: Definition of management information; and X.733 (1992) ISO/IEC 10164-4: Systems Management: Alarm reporting function. Other standards bodies, such as the IETF, GSM and 3GPP, have also defined standard probable cause codes.
The probable cause codes defined in the above standards specifications are either numeric (for example, M.3100 code ‘12’, which indicates excessive bit error rate) or textual (for example, ‘ExcessiveBER’). Such codes offer a very concise representation of a probable cause. They were conceived at a time when bandwidth and processing power were limited, and were kept concise so that performance monitoring and operations and maintenance would not consume a significant proportion of the available bandwidth and processing capacity. Note that a single network event will typically cause a large number of alarms to be raised by the various devices around the network that are affected by the event. This is known as alarm flooding. Because of alarm flooding, a single event can trigger a large volume of alarm signalling to network management systems, taking up a correspondingly large proportion of bandwidth and processing capacity.
It is highly valuable to have a standardised set of probable cause codes for interoperability of equipment and software from multiple vendors. With the gradual convergence of different network technologies, for example wireline, wireless and optical networks, this becomes even more important.
However, technological advancement in telecommunications systems, equipment, protocols and software gives rise to an ever-increasing and changing set of possible network events. It is desirable to be able to report these events in a meaningful way to network management systems for performance monitoring and operations and maintenance. Unfortunately, this objective is incompatible with the need to maintain a standardised set of probable cause codes, because the procedures of standards bodies are simply unable to keep up with the rapid rate of technological advancement. Thus, in the past, amendments to standards specifications have been made relatively infrequently, and have typically included dramatic extensions to the list of probable cause codes.
One problem with the above is that, prior to the inclusion of new probable cause codes, vendors have tended to map new network events to existing probable cause codes in an imprecise or inaccurate manner. For example, the network event of the synchronisation status of a node being unstable might be mapped to “timingProblem” or “synchronizationSourceMismatch”. “synchronizationSourceMismatch” is not an accurate mapping, whereas “timingProblem” is very vague. Either way, this results in a loss of valuable information that might otherwise be reported to network management systems.
This loss of information also causes problems when it comes to clearing previously set alarms. The imprecise or inaccurate mapping results in a many-to-one, one-to-many or even many-to-many mapping between the network events that trigger the raising and setting of an alarm and the network events that trigger the clearing of an alarm.
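The clearing ambiguity can be illustrated with a short Python sketch. It assumes a hypothetical vendor table that maps internal events to standard codes; the internal event names (`sync_status_unstable`, `clock_drift_exceeded`, `ber_threshold_crossed`) and the helper function are invented for this example, while the probable cause strings are those mentioned in the text.

```python
# Hypothetical vendor mapping from internal network events to standard
# probable cause codes. Two distinct events share one vague code.
EVENT_TO_CAUSE = {
    "sync_status_unstable": "timingProblem",
    "clock_drift_exceeded": "timingProblem",
    "ber_threshold_crossed": "ExcessiveBER",
}


def clear_candidates(raised_events, cleared_cause):
    """Return every raised event that a clear for `cleared_cause` could refer to."""
    return [e for e in raised_events if EVENT_TO_CAUSE[e] == cleared_cause]


# Both events were raised on the same entity and reported as "timingProblem".
raised = ["sync_status_unstable", "clock_drift_exceeded"]

# A later clear carrying only "timingProblem" matches both of them.
candidates = clear_candidates(raised, "timingProblem")
```

Because the mapping is many-to-one, the management system in this sketch cannot tell from the probable cause alone which of the two underlying conditions has actually been resolved.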
Another problem is that legacy network management applications or equipment developed before a new probable cause code was introduced are unable to understand and process an alarm indication message having that probable cause code. Thus, evolution or replacement of transport network equipment or software often requires a radical overhaul of network management systems as well.
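A minimal Python sketch of this legacy limitation follows. It assumes a management application with a code table fixed at build time; the table contents, the function names, and the code value `9999` (standing in for a later-introduced code) are all hypothetical, apart from the numeric code 12 for excessive bit error rate mentioned earlier.

```python
# Code table frozen when the hypothetical legacy application was built.
LEGACY_CAUSE_TABLE = {12: "ExcessiveBER"}


def decode_probable_cause(code: int) -> str:
    """Translate a numeric probable cause code using the legacy table only."""
    if code not in LEGACY_CAUSE_TABLE:
        raise ValueError(f"unknown probable cause code: {code}")
    return LEGACY_CAUSE_TABLE[code]


def try_decode(code: int):
    """Return the decoded cause, or None when the legacy table lacks the code."""
    try:
        return decode_probable_cause(code)
    except ValueError:
        return None


known = try_decode(12)      # code present in the legacy table
unknown = try_decode(9999)  # stand-in for a code introduced after the build
```

In this sketch the alarm carrying the newer code cannot be interpreted at all, which is why adding codes tends to force changes on the management side as well as on the network equipment.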