1. Technical Field
The present invention relates in general to network management systems, and more particularly to a system and method which enable a user to define the process flow of a policy that defines management behavior for managing a communication network.
2. Background
The information-communication industry is an essential element of today's society, which is relied upon heavily by most companies, businesses, agencies, educational institutions, and other entities, including individuals. As a result, information service providers such as telephone, cable, and wireless carriers, Internet Service Providers (ISPs) and utility companies all have the need to deploy effective systems suitable for servicing such a demand. Accordingly, network management and operations have become crucial to the competitiveness of communication companies, utilities, banks and other companies operating Wide Area Networks (WANs) of computer devices and/or other network types and devices, including SONET, Wireline, Mobile, Internet Protocol (IP) devices, etcetera. For instance, many companies currently use customized “legacy” network management systems (NMSs) and operations support systems (OSSs). Various implementations of NMSs/OSSs are available in the prior art for managing networks and network elements.
Thus, management systems (“MSs,” which encompass both NMSs and OSSs) have been implemented in the prior art for managing communication networks and network elements. Given that it is often desirable to manage various network elements (e.g., various types of devices, including without limitation routers, switches, computer equipment, etcetera), various types of management systems have been developed for managing such elements.
One area of management involves fault management. Fault alarm incidents (or messages) are routinely generated for the various components of a network to allow the service provider (or system administrator) to monitor the operational state of the network. Fault management systems generally receive and process these alarm incidents in accordance with fault management objectives as defined by the service provider.
Customers often desire to configure the management system in a particular manner for managing their network and/or network elements. That is, customers often desire to configure the management system to implement a desired management behavior. For example, a customer may desire to configure the management system to generate an alert to an alert display in the event that a particular alarm is detected for a certain network element. As other examples, a customer may desire to configure the management system to implement such behaviors as alert suppression, correlation, thresholding, logging, and other management behaviors, as are well known in the art.
Traditionally, configuring the management system to implement a desired management behavior, such as a desired alert generation, required development of software code that is executable to perform the desired management behavior. Such software code may, for example, be written in a programming language such as C, C++, Pascal, BASIC, or another programming language known in the art. Because the customer generally does not have access to the source code of the management system, the customer may be required to develop independent code that is capable of interacting with the management system to implement the desired management behavior, or (more typically) to request that the provider of the management system develop code that implements the desired management behavior within the management system.
More recently, management systems have been developed that give a customer a limited ability to configure management behavior thereon. More specifically, management systems have been developed that include an interface program with which a customer may interact to configure, at least to a limited extent, the management behavior of the management system. For example, an interface program may be included that enables a user to input rules that are to govern the behavior of the management system. Such rules may, for example, be written by the user in the form of relatively simple “IF THEN” statements. The rules may be input by the user to govern such management behavior as alert generation, correlation, suppression, thresholding, and logging, as examples. Once the rules have been developed by the user, the MS may execute them to manage the network elements in the desired manner. For instance, events detected for various network elements may be correlated in some manner (as may be specified by a user-defined rule) to enable the MS to perform a desired behavior (or task) upon detecting the specified correlation of events. Also, alarms relating to certain events may be suppressed (as specified by a user-defined rule) because such events may be residual events resulting from another event that has already been reported by the MS to the system administrator.
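By way of illustration only, a user-defined “IF THEN” rule of the kind described above may be sketched as a condition paired with an action. The sketch below is not taken from any particular management system; the names `Rule`, `apply_rules`, the event fields `type` and `element`, and the element name `router-1` are all hypothetical and are assumed solely for purposes of the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# An event is modeled as a simple mapping of field names to values (assumption).
Event = Dict[str, str]


@dataclass
class Rule:
    """A hypothetical IF-THEN rule: `condition` is the IF part, `action` the THEN part."""
    name: str
    condition: Callable[[Event], bool]
    action: Callable[[Event], str]


def apply_rules(rules: List[Rule], event: Event) -> List[str]:
    """Run every rule whose condition matches the event and collect the results."""
    return [rule.action(event) for rule in rules if rule.condition(event)]


# IF a link_down event is detected for router-1 THEN generate an alert.
rules = [
    Rule(
        name="link-down alert",
        condition=lambda e: e.get("type") == "link_down" and e.get("element") == "router-1",
        action=lambda e: f"ALERT: {e['type']} on {e['element']}",
    ),
]

alerts = apply_rules(rules, {"type": "link_down", "element": "router-1"})
```

In this sketch, an event for any other network element matches no rule and therefore produces no alert. Behaviors such as suppression or correlation could be expressed in the same form by supplying different conditions and actions.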
A threshold number may be specified for certain events (within a user-defined rule) to avoid generating alerts for events that are not actually indicative of a problem. For instance, a process that is supposed to be running within the network may be polled periodically by the MS to ensure that it is operational and responsive. Upon initially being polled, the process may be too busy to immediately respond to the poll. Accordingly, the non-responsiveness of the process may not be indicative of a situation for which an alert should be generated, but instead may only be the result of the process being busy with other tasks at the time it was polled. Thus, for example, a threshold may be defined to specify that an alert is to be generated only if the process fails to respond to three consecutive polls in order to avoid unnecessary generation of alerts.
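The thresholding behavior described above may be sketched, purely as an example, as a counter of consecutive missed polls that is reset whenever the process responds. The class name `PollThreshold` and its members are hypothetical. The choice to signal the alert exactly once, at the moment the threshold is reached, is likewise an assumption of the sketch rather than a requirement of any particular management system.

```python
from dataclasses import dataclass


@dataclass
class PollThreshold:
    """Hypothetical helper: alert only after `limit` consecutive failed polls."""
    limit: int = 3   # three consecutive missed polls, as in the example above
    misses: int = 0  # current run of consecutive missed polls

    def record(self, responded: bool) -> bool:
        """Record one poll result; return True when an alert should be generated."""
        if responded:
            # Any successful response breaks the run of failures.
            self.misses = 0
            return False
        self.misses += 1
        # Assumption: signal once, at the moment the threshold is reached.
        return self.misses == self.limit


threshold = PollThreshold()
# One miss, then a response, then three misses in a row.
results = [threshold.record(r) for r in [False, True, False, False, False]]
```

Here the single missed poll followed by a response generates no alert, and only the third consecutive miss causes `record` to return `True`.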
As another example, a user-defined rule may specify that an alert having a non-critical severity is to be generated when a first set of conditions is encountered, and such rule may further specify that the alert is to have its severity escalated to critical upon a second set of conditions being encountered. For instance, a 75% CPU utilization rate on a particular network element may, according to a user-defined rule, generate an alert of relatively minor severity, but upon the network element's CPU utilization rate increasing to 95% or greater, the rule may specify that the alert is to be escalated to critical severity.
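The escalation example above may be sketched as a mapping from a CPU utilization reading to an alert severity. The function name `cpu_alert_severity` and the severity labels are hypothetical. The sketch further assumes that the minor alert applies at or above 75% utilization, whereas the text above states the "or greater" condition only for the 95% level.

```python
from typing import Optional

MINOR_CPU = 75.0     # assumed: minor alert at or above this utilization (percent)
CRITICAL_CPU = 95.0  # critical alert at 95% utilization or greater


def cpu_alert_severity(cpu_percent: float) -> Optional[str]:
    """Return the alert severity for a CPU reading, or None if no alert applies."""
    # Check the higher threshold first so that escalation takes precedence.
    if cpu_percent >= CRITICAL_CPU:
        return "critical"
    if cpu_percent >= MINOR_CPU:
        return "minor"
    return None


# A reading that climbs over time: no alert, then minor, then escalated to critical.
severities = [cpu_alert_severity(p) for p in (50.0, 80.0, 97.0)]
```

Testing the higher threshold first ensures that a reading satisfying both sets of conditions is reported at the escalated severity.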
As yet another example of management behavior that may be defined by a rule, event logging may be performed. That is, events detected by the MS for network elements may be logged to a file (e.g., to a database or other data structure for storing data). Those of ordinary skill in the art will recognize other management tasks in addition to the exemplary tasks described briefly above that may be defined in rules implemented on the MS to control the management of network elements by the MS. That is, user-defined rules may be implemented to configure the management behavior of the MS in various ways.