A communications network often includes different elements, such as servers, routers, switches and various other elements applicable to both data and voice networks, which perform various functions in the network. In addition, elements not directly related to the communication function of the network may also be connected and available to perform a variety of tasks. These elements often have different management and control interfaces and use different protocols to communicate.
Network management involves the managing and monitoring of network elements. The network elements are managed by a system referred to as a network management system (NMS). The NMS interacts with an agent module running on each element in order to manage it, through a defined set of interfaces, protocols and operations. An example of such an interface is the Simple Network Management Protocol (SNMP). SNMP enables the retrieval of various parameters and attributes stored in the elements. These parameters and attributes vary over time depending on the operating environment, and such variables are referred to as "managed objects". As defined by SNMP, a collection of such managed objects is referred to as a Management Information Base (MIB). The following request-response operations on managed objects are supported as part of the SNMP framework:
GET
GET BULK
GET NEXT
SET
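To make the request-response operations concrete, the following is a minimal Python sketch of a toy in-memory MIB, assuming OIDs are modeled as tuples of integers so that Python's tuple comparison reproduces SNMP's lexicographic OID ordering. The `ToyMib` class and its method names are hypothetical illustrations, not the API of any SNMP library, and GET BULK is simplified to omit the non-repeaters parameter. The sample OIDs are the standard MIB-II system-group objects; their values are made up.

```python
from bisect import bisect_right

# An OID is modeled as a tuple of integers; tuples compare element by element,
# which matches the lexicographic ordering SNMP uses for OIDs.
Oid = tuple[int, ...]


class ToyMib:
    """Hypothetical in-memory MIB illustrating GET, GET NEXT, GET BULK and SET.
    It is not an SNMP agent and implements no wire protocol."""

    def __init__(self, objects: dict[Oid, object]) -> None:
        self._values = dict(objects)
        self._order = sorted(self._values)  # OIDs in lexicographic order

    def get(self, oid: Oid):
        # GET: return the value of exactly the requested managed object.
        return self._values.get(oid)  # None stands in for noSuchObject

    def get_next(self, oid: Oid):
        # GET NEXT: return the first object whose OID is strictly greater than
        # the requested one; repeating this "walks" the MIB.
        i = bisect_right(self._order, oid)
        if i == len(self._order):
            return None  # stands in for endOfMibView
        nxt = self._order[i]
        return nxt, self._values[nxt]

    def get_bulk(self, oid: Oid, max_repetitions: int):
        # GET BULK (simplified): up to max_repetitions successive GET NEXT
        # steps answered in a single request.
        results = []
        for _ in range(max_repetitions):
            item = self.get_next(oid)
            if item is None:
                break
            results.append(item)
            oid = item[0]
        return results

    def set(self, oid: Oid, value) -> None:
        # SET: overwrite the value of an existing managed object, e.g. to
        # apply configuration requested by the manager.
        if oid not in self._values:
            raise KeyError(f"no such object: {oid}")
        self._values[oid] = value


# Illustrative data: MIB-II system group objects with invented values.
SYS_DESCR = (1, 3, 6, 1, 2, 1, 1, 1, 0)
SYS_UPTIME = (1, 3, 6, 1, 2, 1, 1, 3, 0)
SYS_NAME = (1, 3, 6, 1, 2, 1, 1, 5, 0)
mib = ToyMib({SYS_DESCR: "edge-router", SYS_UPTIME: 123456, SYS_NAME: "rtr-01"})
```

For example, `mib.get_next((1, 3, 6, 1, 2, 1, 1))` returns the `sysDescr.0` entry, because a prefix OID sorts before every OID that extends it; this ordering is what lets GET NEXT and GET BULK traverse a subtree without the manager knowing its exact contents in advance.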
The SNMP framework also supports a notification mechanism through the following operations:
TRAP
INFORM
The GET operations enable fetching the value of a managed object. The TRAP and INFORM protocol data units (PDUs) support the notification mechanism. A TRAP is an asynchronous notification that is not acknowledged, whereas an INFORM, introduced as part of SNMPv2, is a notification that the receiver acknowledges. The SET operation sets the value of a managed object and is typically used to apply configuration or to execute a command issued by the SNMP manager.
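The difference between the two notification types can be illustrated with a minimal Python sketch, assuming a hypothetical `Transport` callable that returns `True` when a PDU reaches the receiver (and, for an INFORM, when the acknowledgement makes it back). The function names, the retry count, and the lossy transport are illustrative assumptions, not any SNMP library's API; real implementations also apply a timeout before each retransmission, which is omitted here.

```python
from typing import Callable

# Hypothetical transport: True means the PDU was delivered and, for an
# INFORM, that the receiver's acknowledgement reached the sender.
Transport = Callable[[dict], bool]


def send_trap(transport: Transport, payload: dict) -> int:
    """TRAP: fire-and-forget. The sender transmits once and has no way of
    learning whether the notification arrived. Returns transmissions made."""
    transport({"pdu": "TRAP", **payload})  # result ignored: no acknowledgement
    return 1


def send_inform(transport: Transport, payload: dict, retries: int = 3) -> tuple[bool, int]:
    """INFORM: the sender expects an acknowledgement and retransmits until one
    arrives or the retry budget is used up. Returns (acknowledged, transmissions)."""
    for attempt in range(1, retries + 2):  # initial send plus `retries` resends
        if transport({"pdu": "INFORM", **payload}):
            return True, attempt
    return False, retries + 1


def lossy(drop_first: int) -> Transport:
    """Hypothetical transport that loses the first `drop_first` PDUs sent."""
    sent: list[dict] = []

    def transport(pdu: dict) -> bool:
        sent.append(pdu)
        return len(sent) > drop_first

    return transport
```

With `lossy(2)`, an INFORM is acknowledged on its third transmission, while a TRAP over the same channel is simply lost and the sender never finds out. This acknowledgement is what makes INFORM the more suitable primitive when the manager must know that a notification was received.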
With the proliferation of network devices, the number of devices to be managed is expected to grow tremendously. A business service operating in such a network environment is typically realized through a set of functions orchestrated across various systems and platforms in the network. Network management has typically focused on monitoring elements, while recovery actions for business services in the event of problems or faults have been handled manually: an administrator logs in to multiple systems and performs the recovery action sequence by hand. Since the functioning of business services is the most important aspect for a provider offering the service, an automatic, programmatic approach to the recovery of business services is preferable to the common practice of manual recovery.
Traditionally, SNMP has been used largely for network monitoring: GET operations are typically used to retrieve data, and TRAPs are used for asynchronous notifications.
The SET operation has typically been used to perform configuration changes and to set the value of a managed object. Actions resulting from TRAPs were usually performed outside the SNMP-based elements or, in some cases, by defining an OID (Object Identifier) as part of the MIB definition. Although this approach is established in practice, it poses a fundamental challenge when recovery actions must be taken for business services.
In this regard, it may be noted that prior art solutions based on SNMP do not inherently support a managed transaction across the multiple elements involved in performing recovery actions.
As various types of network elements are brought under the NMS, the set of recovery actions required for restoring a business service or recovering from a fault spans multiple platforms, systems and devices. Essentially, restoration involves performing multiple sets of recovery actions within and across multiple network devices. Moreover, these actions may produce intermediate responses from the devices, so it is important that the management system have a mechanism to change the course of the action sequence dynamically and programmatically. Accordingly, there exists a need for a system and method capable of instructing recovery actions and correlating the outcomes of the recovery actions performed by the various agent modules running on the respective platforms, systems and devices.