The present invention relates generally to the field of network fault management, and particularly relates to a method and a system of SNMP (Simple Network Management Protocol) based management of active alarms in a network environment.
Conventional network space includes a layered architecture of a network transport fabric comprising Network Elements (NE) for end-to-end transport of payload data across the network, and a network management layer for controlling operation of the NEs and for providing network administrative services.
A typical network management model includes: management stations, management information bases (MIB), management agents and a management protocol.
Simple Network Management Protocol (SNMP) is a common method by which network management applications can query a management agent using a supported MIB. SNMP supports the exchange of network information between hosts, typically including one or more centralized network management consoles that manage large numbers of network elements in real time. SNMP operates over UDP (User Datagram Protocol) at the Open Systems Interconnection (OSI) application layer.
Although SNMP was originally designed as the network management protocol of the TCP/IP (Transmission Control Protocol/Internet Protocol) stack, it can now manage virtually any network type and has been extended to include devices that do not use TCP/IP. SNMP is widely deployed in TCP/IP networks, but its transport independence means it is not limited to TCP/IP. In particular, SNMP has been implemented over Ethernet and OSI transports.
A management information base (MIB) is a database of configuration, status and statistical information that is stored on a network agent for access by a Network Management Station (NMS) and/or an Element Management System (EMS). A MIB consists of a repository of characteristics and parameters managed in a network element (or managed resource) such as a NIC, hub, switch, or router. Each managed resource knows how to respond to standard queries issued by network management protocols. Within the Internet MIB employed for SNMP based management, ASN.1 (Abstract Syntax Notation One) is used to describe network management variables. These variables, which include such information as error counts or on/off status of a device, are assigned a place on a tree data structure.
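By way of illustration only, the placement of management variables on a tree may be sketched as follows. The nested-dictionary representation and the function names insert and lookup are assumptions made for this sketch and do not appear in the description above; the two object identifiers shown are the standard ones for ifOperStatus and ifInErrors.

```python
def insert(tree, oid, name):
    """Place a named variable at the tree position given by a dotted OID."""
    node = tree
    for arc in oid.split("."):
        # Each arc of the OID selects (or creates) one child of the current node.
        node = node.setdefault(int(arc), {})
    node["name"] = name


def lookup(tree, oid):
    """Return the variable name stored at the OID, or None if nothing is there."""
    node = tree
    for arc in oid.split("."):
        node = node.get(int(arc))
        if node is None:
            return None
    return node.get("name")


# Illustrative use: an on/off status variable and an error counter.
mib = {}
insert(mib, "1.3.6.1.2.1.2.2.1.8", "ifOperStatus")
insert(mib, "1.3.6.1.2.1.2.2.1.14", "ifInErrors")
status_name = lookup(mib, "1.3.6.1.2.1.2.2.1.8")
```

A real agent would typically generate such a tree from compiled MIB modules rather than build it by hand.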
When a distributed management system (e.g. EMS, NMS, etc.) first learns about an SNMP-managed resource (e.g. NE), it has no way to determine what alarms (i.e. traps, abnormal conditions, interesting conditions relating to the NEs) are currently active in the system. Also, if the management system loses communication with the NE or EMS, it cannot tell if any alarms were sent out during this communications blackout. In order to provide reliable fault management, it is necessary to determine the current status of a managed resource when first encountered or after loss of communication with a managed resource.
An alarm is a kind of object that represents an abnormal condition or a condition of interest of a managed resource. An alarm is active as long as the corresponding abnormal or interesting condition remains.
Solutions have been proposed that involve the development of active alarm tables that are specific to a particular set of notifications. Such a system is described in a co-pending U.S. patent application Ser. No. 09/444,344 filed on Nov. 19, 1999 titled Carrier-Grade SNMP Interface for Fault Monitoring assigned to the same assignee as the present application. The prior art solutions do not support existing standard and proprietary notifications and would require an NE to redefine its internal notification list to obtain active alarm functionality.
There is need for a solution where active alarm tables can be maintained that can support any alarm/trap from a plurality of managed resources (e.g. NE, EMS, etc.) regardless of native format. Further, alarms should be capable of being removed from the active alarm table when a clear alarm notification is generated by the NE or after a prescribed time-out period.
The present invention provides a table associated with a managed resource (e.g. NE, EMS, NMS, etc.), which can be maintained in a respective information store (e.g. MIB), for maintaining a list of active alarms of the managed resource in a generic format dictated by the managed resource. In particular, the table associated with an NE maintains a list of alarm notification information for its own NE; the table associated with a management system (e.g. EMS, NMS) maintains a list of alarm notification information for a group of NEs within its domain.
The present invention is also directed to a method and apparatus for maintaining a list of active managed resource alarms within a network. The list is preferably maintained in an active alarm table associated with each managed resource (for example in an NE-MIB) and with at least one management component (for example an EMS or NMS). The alarms are removed from the tables either after a clear notification is received from the managed resource or after a prescribed age-out or time-out period. The active alarm table supports alarms in the generic or native format of the NE. A "generic" or "native" format is a format that the managed resources are currently using for their standard and proprietary alarms.
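A minimal sketch of such a table, given for illustration only, is set out below. The class name ActiveAlarmTable, its method names, the keying of rows by resource and alarm identifier, and the default time-out value are all assumptions of this sketch rather than features recited above. The notification payload is stored unmodified, which is one way the native format could be preserved.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ActiveAlarmTable:
    """Hypothetical in-memory table keyed by (resource, alarm id)."""
    age_out_seconds: float = 3600.0  # assumed time-out period
    _rows: Dict[Tuple[str, str], Tuple[float, dict]] = field(default_factory=dict)

    def raise_alarm(self, resource, alarm_id, payload, now):
        # Keep the payload as received, i.e. in the resource's native format.
        self._rows[(resource, alarm_id)] = (now, dict(payload))

    def clear_alarm(self, resource, alarm_id):
        # A clear notification removes the matching row, if one exists.
        self._rows.pop((resource, alarm_id), None)

    def age_out(self, now):
        # Remove rows that have reached the prescribed time-out period.
        expired = [key for key, (raised_at, _) in self._rows.items()
                   if now - raised_at >= self.age_out_seconds]
        for key in expired:
            del self._rows[key]
        return expired

    def active(self):
        return sorted(self._rows)


# Illustrative use with made-up alarm names and timestamps in seconds.
table = ActiveAlarmTable(age_out_seconds=60.0)
table.raise_alarm("NE-1", "linkDown", {"ifIndex": 3}, now=0.0)
table.raise_alarm("NE-1", "fanFailure", {"slot": 2}, now=50.0)
table.clear_alarm("NE-1", "linkDown")   # removed by a clear notification
table.age_out(now=110.0)                # fanFailure removed by age-out
```

Passing the current time as an argument, rather than reading a clock inside the table, is a choice made here only to keep the sketch deterministic.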
In accordance with one aspect of the present invention there is provided an active alarm table associated with a management information base of a managed resource having a set of defined alarms in an SNMP (Simple Network Management Protocol) based network. The active alarm table includes a list of alarm notification information in the native format of the managed resource. The managed resource advises the management information base of the existence, occurrence and removal of an alarm on the managed resource such that the list of alarm notification information for the managed resource provides a listing of all active alarms for the managed resource.
In accordance with another aspect of the present invention there is provided a management information base associated with a management system in an SNMP (Simple Network Management Protocol) based network having a plurality of managed resources, each of which includes a set of alarms in a native format. The management information base includes an active alarm table for maintaining alarm notification information in the native format of the plurality of managed resources. Each one of the managed resources advises the management system of the existence, occurrence and removal of an alarm on a respective managed resource such that a list of active alarms for the managed resources in the network is provided in the active alarm table of the management information base.
In accordance with one aspect of the present invention there is provided a method of SNMP (Simple Network Management Protocol) based fault management in a network having a plurality of managed resources monitored by a management system. Each one of the managed resources has a set of defined alarms and a first active alarm table. The method includes the following steps: maintaining the first active alarm table of each of the managed resources in response to an occurrence and removal of an alarm from the set of defined alarms; and advising the management system of the occurrence and removal of the alarm.
In accordance with another aspect of the present invention there is provided a method of SNMP (Simple Network Management Protocol) based fault management in a network having a plurality of managed resources monitored by a management system. Each one of the managed resources includes a set of defined alarms and a first active alarm table. The method includes the following steps: updating the first active alarm table of a target managed resource with alarm notification information in response to an occurrence of an alarm from the set of defined alarms in the target managed resource, the target managed resource being one of the plurality of managed resources; advising the management system of the alarm with the alarm notification information; removing the alarm notification information from the first active alarm table of the target managed resource in response to a return to normal notification received from the target managed resource; and advising the management system of the return to normal notification.
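The four steps of the foregoing method may be sketched, purely as an illustration, as follows. The class ManagedResource, its method names, the message fields, and the use of a callback to stand in for the SNMP notification sent to the management system are assumptions of this sketch.

```python
class ManagedResource:
    """Hypothetical target managed resource holding the first active alarm table."""

    def __init__(self, name, advise):
        self.name = name
        self.active_alarms = {}   # first active alarm table: alarm id -> info
        self._advise = advise     # stand-in for sending a notification

    def on_alarm(self, alarm_id, info):
        # Step 1: update the first active alarm table.
        self.active_alarms[alarm_id] = info
        # Step 2: advise the management system of the alarm.
        self._advise({"source": self.name, "alarm": alarm_id,
                      "state": "raised", "info": info})

    def on_return_to_normal(self, alarm_id):
        # Step 3: remove the alarm notification information from the table.
        info = self.active_alarms.pop(alarm_id, None)
        if info is not None:
            # Step 4: advise the management system of the return to normal.
            self._advise({"source": self.name, "alarm": alarm_id,
                          "state": "cleared", "info": info})


# Illustrative use: a list collects what the management system would receive.
received = []
ne = ManagedResource("NE-1", received.append)
ne.on_alarm("linkDown", {"ifIndex": 3})
ne.on_return_to_normal("linkDown")
```

In this sketch a return to normal for an alarm that is not in the table is ignored; the description above does not specify that case.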
In accordance with another aspect of the present invention there is provided a system for enabling SNMP (Simple Network Management Protocol) based fault management in a network having a plurality of managed resources each having a set of defined alarms and a first active alarm table. The managed resources are monitored by a management system that includes a second active alarm table. The system includes the following components: (a) a resource manager for maintaining the first active alarm table of each of the managed resources and for advising the management system in response to an occurrence and removal of an alarm from the set of defined alarms; and (b) a system manager for maintaining the second active alarm table in response to advisement of the occurrence and removal of the alarm from the resource manager.
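One possible arrangement of the two components is sketched below for illustration. The class names ResourceManager and SystemManager, the method names, and the direct method call used in place of an SNMP notification are assumptions of this sketch.

```python
class SystemManager:
    """Hypothetical component (b): maintains the second active alarm table."""

    def __init__(self):
        self.second_table = {}  # (resource, alarm id) -> info

    def advise(self, resource, alarm_id, info, active):
        key = (resource, alarm_id)
        if active:
            self.second_table[key] = info
        else:
            self.second_table.pop(key, None)


class ResourceManager:
    """Hypothetical component (a): maintains the first active alarm tables."""

    def __init__(self, system_manager):
        self.first_tables = {}  # resource -> {alarm id -> info}
        self._system = system_manager

    def alarm_occurred(self, resource, alarm_id, info):
        self.first_tables.setdefault(resource, {})[alarm_id] = info
        self._system.advise(resource, alarm_id, info, active=True)

    def alarm_removed(self, resource, alarm_id):
        self.first_tables.get(resource, {}).pop(alarm_id, None)
        self._system.advise(resource, alarm_id, None, active=False)


# Illustrative use with two made-up network elements.
system = SystemManager()
resources = ResourceManager(system)
resources.alarm_occurred("NE-1", "linkDown", {"ifIndex": 3})
resources.alarm_occurred("NE-2", "fanFailure", {"slot": 2})
resources.alarm_removed("NE-1", "linkDown")
```

After these calls both the first table of NE-2 and the second table hold only the fanFailure alarm, so the two levels stay consistent.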
An exemplary aspect of the present invention provides for consumption of the active alarm tables. For example, when an EMS discovers or regains connectivity to an NE, the EMS will poll the active alarm table of the respective NE to update its own active alarm table. It may further update other management components in the network (such as an NMS).
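The resynchronization described in this example may be sketched as follows, for illustration only. The function name resync_after_reconnect and the dictionary representation of both tables are assumptions of this sketch; in a deployed system the polled rows would be retrieved from the NE over SNMP rather than passed in directly.

```python
def resync_after_reconnect(ems_table, ne_name, polled_rows):
    """Replace the EMS rows for one NE with the rows polled from that NE.

    ems_table:   (NE name, alarm id) -> info, the EMS's own active alarm table
    polled_rows: alarm id -> info, as read from the NE's active alarm table
    """
    # Discard what the EMS believed before the loss of communication,
    # since alarms may have been raised or cleared in the meantime.
    stale = [key for key in ems_table if key[0] == ne_name]
    for key in stale:
        del ems_table[key]
    # Adopt the NE's current view as the EMS's view of that NE.
    for alarm_id, info in polled_rows.items():
        ems_table[(ne_name, alarm_id)] = info
    return ems_table


# Illustrative use: linkDown cleared and highTemp raised during the blackout.
ems = {
    ("NE-1", "linkDown"): {"ifIndex": 3},
    ("NE-2", "fanFailure"): {"slot": 2},
}
resync_after_reconnect(ems, "NE-1", {"highTemp": {"celsius": 85}})
```

Rows belonging to other NEs are left untouched, so a resynchronization with one NE does not disturb the alarms recorded for the rest of the domain.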
Other aspects and features of the present invention will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures.