1. Technical Field
The present invention relates in general to network management systems, and more particularly to a system and method which enable distribution of policies that define management behavior for managing a communication network.
2. Background
The information-communication industry is an essential element of today's society and is relied upon heavily by most companies, businesses, agencies, educational institutions, and other entities, including individuals. As a result, information service providers such as telephone, cable, and wireless carriers, Internet Service Providers (ISPs) and utility companies all need to deploy effective systems suitable for servicing such a demand. Accordingly, network management and operations have become crucial to the competitiveness of communication companies, utilities, banks and other companies operating Wide Area Networks (WANs) of computer devices and/or other network types and devices, including SONET, Wireline, Mobile, Internet Protocol (IP) devices, etc. For instance, many companies currently use customized “legacy” network management systems (NMSs) and operations support systems (OSSs). Various implementations of NMSs/OSSs are available in the prior art for managing networks and network elements.
Thus, management systems (“MSs,” which encompass both NMSs and OSSs) have been implemented in the prior art for managing communication networks and network elements. Given that it is often desirable to manage various network elements (e.g., various types of devices, including without limitation routers, switches, computer equipment, etc.), various types of management systems have been developed for managing such elements.
One area of management involves fault management. Fault alarm incidents (or messages) are routinely generated for the various components of a network to allow the service provider (or system administrator) to monitor the operational state of the network. Fault management systems generally receive and process these alarm incidents in accordance with fault management objectives as defined by the service provider.
Traditionally, configuring the management system to implement a desired management behavior, such as a desired alert generation, required development of software code that is executable to perform the desired management behavior. Such software code may, for example, be written in a programming language, such as C, C++, Pascal, BASIC, or other programming language known in the art. Because the customer generally does not have access to the source code of the management system, the customer may be required to develop independent code that is capable of interacting with the management system to implement the desired management behavior, or (more typically) request that the provider of the management system develop such code that implements the desired management behavior into the management system.
More recently, management systems have been developed that give a customer a limited ability to configure management behavior thereon. More specifically, management systems have been developed that include an interface program with which a customer may interact to configure, at least to a limited extent, the management behavior of the management system. For example, an interface program may be included that enables a user to input rules that are to govern the behavior of the management system. Such rules may, for example, be written by the user in the form of relatively simple “IF THEN” statements. The rules may be input by the user to govern such management behavior as alert generation, correlation, suppression, thresholding, and logging, as examples. Once the rules have been developed by the user, the MS may then execute such rules to manage the network elements in the desired manner. For instance, events detected for various network elements may be correlated in some manner (as may be specified by a user-defined rule) to enable the MS to perform a desired behavior (or task) upon detecting the specified correlation of events. Also, alarms relating to certain events may be suppressed (as defined by a user-defined rule) as such events may be residual events resulting from another event that has already been reported by the MS to the system administrator.
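As an illustration only, a user-defined “IF THEN” rule of the kind described above might be modeled as a condition paired with an action. The following Python sketch is a minimal example under stated assumptions, not the rule format of any particular MS; the names `Rule`, `apply_rules`, and the event fields `type` and `element` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical event representation: a flat mapping of field names to values.
Event = Dict[str, str]


@dataclass
class Rule:
    """A user-defined rule: IF condition(event) THEN action(event)."""
    name: str
    condition: Callable[[Event], bool]  # the IF part
    action: Callable[[Event], str]      # the THEN part


def apply_rules(rules: List[Rule], event: Event) -> List[str]:
    """Return the action result of every rule whose condition matches."""
    return [rule.action(event) for rule in rules if rule.condition(event)]


# Two example rules: one generates an alert, one logs every event.
rules = [
    Rule(
        "link-down-alert",
        lambda e: e["type"] == "link_down",
        lambda e: f"ALERT: link down on {e['element']}",
    ),
    Rule(
        "log-everything",
        lambda e: True,
        lambda e: f"LOG: {e['type']} from {e['element']}",
    ),
]
```

Under these assumptions, passing a `link_down` event to `apply_rules` would trigger both rules, while any other event type would trigger only the logging rule.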
A threshold number may be specified for certain events (within a user-defined rule) to avoid generating alerts for events that are not actually indicative of a problem. For instance, a process that is supposed to be running within the network may be polled periodically by the MS to ensure that it is operational and responsive. Upon initially being polled, the process may be too busy to immediately respond to the poll. Accordingly, the non-responsiveness of the process may not be indicative of a situation for which an alert should be generated, but instead may only be the result of the process being busy with other tasks at the time it was polled. Thus, for example, a threshold may be defined to specify that an alert is to be generated only if the process fails to respond to three consecutive polls in order to avoid unnecessary generation of alerts.
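The three-consecutive-poll threshold described above can be sketched as a simple counter that resets whenever the process responds. This Python fragment is illustrative only; the class name `PollThreshold` and its method `record` are hypothetical, and the choice to signal the alert once (on the poll that reaches the limit) is an assumption rather than a behavior stated for any existing MS.

```python
from dataclasses import dataclass


@dataclass
class PollThreshold:
    """Signal an alert only after `limit` consecutive unanswered polls."""
    limit: int = 3   # three consecutive misses, as in the example above
    misses: int = 0  # current run of consecutive unanswered polls

    def record(self, responded: bool) -> bool:
        """Record one poll result; return True when an alert should be generated."""
        if responded:
            # Any response breaks the run of misses.
            self.misses = 0
            return False
        self.misses += 1
        # Assumption: alert exactly once, on the poll that reaches the limit.
        return self.misses == self.limit
```

With this sketch, two missed polls followed by a response generate no alert, because the response resets the counter before the limit is reached.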
As another example, a user-defined rule may specify that an alert is to be generated having a non-critical severity when a first set of conditions is encountered, and such rule may further specify that the alert is to have its severity escalated to indicate critical severity upon a second set of conditions being encountered. For instance, a 75% CPU utilization rate on a particular network element may, according to a user-defined rule, generate an alert of relatively minor severity, but upon the network element's CPU utilization rate increasing to 95% or greater, the rule may specify that the alert is to be escalated to critical severity.
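The escalation example above may be expressed, purely for illustration, as a function that maps a CPU utilization reading to a severity. The function name `cpu_alert_severity` and the severity labels are hypothetical, and treating the 75% figure as an "at or above" boundary is an assumption.

```python
from typing import Optional

# Thresholds taken from the example above; the inclusive lower bound
# at 75% is an assumption.
MINOR_THRESHOLD = 75.0
CRITICAL_THRESHOLD = 95.0


def cpu_alert_severity(cpu_percent: float) -> Optional[str]:
    """Return the alert severity for a CPU utilization reading, or None for no alert."""
    if cpu_percent >= CRITICAL_THRESHOLD:
        return "critical"  # escalated severity at 95% or greater
    if cpu_percent >= MINOR_THRESHOLD:
        return "minor"     # relatively minor severity from 75%
    return None            # below both thresholds: no alert
```

The critical threshold is checked first so that a reading satisfying both conditions is reported at the escalated severity.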
As yet another example of management behavior that may be defined by a rule, event logging may be performed. That is, events detected by the MS for network elements may be logged to a file (e.g., to a database or other data structure for storing data). Those of ordinary skill in the art will recognize other management tasks in addition to the exemplary tasks described briefly above that may be defined in rules implemented on the MS to control the management of network elements by the MS. That is, user-defined rules may be implemented to configure the management behavior of the MS in various ways.
In existing MSs, all management processing is generally performed at a central processing system that executes all management tasks. In the event the central processing system is required to process a large number of management tasks, this central processing system may experience strain in terms of communications throughput, memory and processing performance. For example, a number of network elements, all transmitting messages to the central processing system, are capable of easily overloading the resources available to the central processing system. Without the ability to process management tasks in a timely manner, far more serious network problems may occur. Likewise, such a situation impacts the user's ability to communicate with the system, eventually causing undue frustration.
In particular MS implementations, the upgrade and replacement of memory and central processing unit components is employed to alleviate such performance issues. Furthermore, upgrades to communications systems are also utilized in order to improve communication system performance. These proposed solutions are both costly and inflexible. For example, replacement of computing peripherals forces the disposal of a currently operational component for a newer, more expensive component that is capable of providing the memory or processing power needed by the central processing system in order to function at the capacity desired. Additionally, most central processing systems possess limitations on the amount of memory and processing power they are capable of supporting, thus eventually preventing future upgrades. These limitations may arise due to usage of all memory sockets on the central processing system, insufficient system bus speeds and the fact that current state-of-the-art processing power may be inadequate. Accordingly, a desire exists for a system and method that improve performance of an MS. More specifically, a desire exists for a system and method that alleviate some of the processing strain on the central processing system in performing management tasks for managing elements of a communication network.