1. Field of Invention
This invention relates to the management of systems and more particularly to methods, apparatus, computer-readable media and a user interface for annunciating problems in a system.
2. Description of Related Art
Tools exist for the management of system problems, such as those encountered in telecommunications networks. These system management tools typically operate on a PC or UNIX workstation and enable the maintenance, surveillance and administration of multiple telecommunications network elements making up the system. Such tools provide for management of the network, that is, monitoring alarms, monitoring performance, managing connections and testing for faults.
An objective of existing system management tools is to provide a centralized view of a system so as to enable the operator to identify system problems from multiple events or conditions, such as alarms and performance degradations. For example, an initial root cause, such as an alarm, can often cause a cascade or flood of subsequent events through the system. Many events, such as alarms and performance degradations, can therefore be symptomatic of a single system problem. When there are many such events, it becomes difficult to determine which ones are correlated to a root cause system problem.
Some existing system management tools provide a GUI (graphical user interface) to assist the operator. One example is HP OpenView Network Node Manager, provided by Hewlett-Packard Company of California, U.S.A. Such tools commonly represent a number of telecommunications network elements on a display in a topological configuration, but the display may become cluttered with iconic representations of the state of each network element. While such a display helps the operator to locate individual alarms or performance degradations in a system, it may not help the operator identify the relationships among these events and system problems, or the root causes of problems.
Root-cause analysis tools have been developed for telecommunications networks and may correlate alarm events into problem sets, each set consisting of a directly detected alarm event and a correlated set of symptomatic alarm events. This automated correlation greatly reduces the amount of time the operator would otherwise have to spend manually filtering the alarm events. Furthermore, such tools redirect the operator's attention from individual events to overall problem sets. Some tools are capable of providing a brief probable cause description of the problem set and of providing a reference that can be used to help correct the problem underlying the problem set.
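By way of illustration only, the grouping of alarm events into problem sets may be sketched as follows. This is a minimal sketch and not the method of any particular tool: the names Alarm, ProblemSet, find_root_element and correlate, and the upstream dependency map, are hypothetical, and the sketch assumes that an alarm on a network element is symptomatic whenever an element further upstream in an acyclic dependency chain is also in alarm.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Alarm:
    alarm_id: str
    element: str   # network element that raised the alarm
    kind: str      # alarm type, e.g. "link_down"


@dataclass
class ProblemSet:
    root: Alarm                                   # directly detected alarm event
    symptoms: List[Alarm] = field(default_factory=list)
    probable_cause: str = ""                      # brief description, if known


def find_root_element(element, alarmed, upstream):
    """Return the farthest-upstream alarmed element on this element's chain."""
    root = element
    current = element
    visited = {element}
    while current in upstream:
        current = upstream[current]
        if current in visited:        # guard against a malformed (cyclic) map
            break
        visited.add(current)
        if current in alarmed:
            root = current
    return root


def correlate(alarms: List[Alarm], upstream: Dict[str, str]) -> List[ProblemSet]:
    """Group alarms into problem sets keyed by their root element."""
    alarmed = {a.element for a in alarms}
    groups: Dict[str, List[Alarm]] = {}
    for alarm in alarms:
        root_el = find_root_element(alarm.element, alarmed, upstream)
        groups.setdefault(root_el, []).append(alarm)

    problem_sets = []
    for root_el, members in groups.items():
        # The first alarm raised on the root element is taken as the root alarm.
        root = next((a for a in members if a.element == root_el), members[0])
        symptoms = [a for a in members if a is not root]
        problem_sets.append(ProblemSet(root=root, symptoms=symptoms))
    return problem_sets
```

In this sketch, an operator facing four alarms on three dependent elements and one unrelated element would be presented with two problem sets rather than four separate events.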
Most root-cause analysis tools are limited to use with certain types of alarm events. From a flood of different types of alarm events, they select one type of alarm and perform an exhaustive search for alarms of that type only. This does not allow many different types of alarm events to be treated as symptomatic of a single system problem.
Other tools allow an operator to examine service violations associated with an event. Often, an operator is responsible for maintaining intended service levels across the telecommunications network. These intended service levels could relate to agreements with customers, for example. There may also be penalties or costs associated with failure of the system to comply with the intended service levels, such as those described by clauses in a service level agreement (SLA). Compliance of a particular telecommunications network element with a plurality of intended service levels may be crucial. Tools which provide this type of information allow the operator to examine intended service levels and observe service violations associated with a particular event or a particular telecommunications network element.
Generally, existing system management tools help the operator to diagnose system problems and synthesize a great deal of information through a centralized view of the system, such as the telecommunications network described above. However, they leave a large amount of information to be synthesized by the operator, unaided. The operator may have to examine details of performance degradations to determine the system problems to which they relate. The operator may have to separately examine details of service violations to determine the system problems to which they relate, and to determine the relative importance of the system problems. The operator may use these determinations to prioritize the system problems and to schedule and plan maintenance and repair of the system. However, existing tools do little to summarize such details into problem priority information that could assist the operator in quickly identifying and prioritizing system problems. Consequently, there is a need for system management tools which provide a better description of system problems to permit an operator to better identify and prioritize system problems.