Due to the increasing complexity of modern information technology (IT) networks (computer networks and telecommunication networks), integrated network management systems (also called “management platforms”) have become an important tool for the operation of IT networks. For example, Hewlett-Packard offers such a management platform under the name “hp OpenView” (see, for example, hp OpenView Quick Reference Guide, February 2003), and the structure of a typical management platform is shown, for example, in H.-G. Hegering et al.: Integrated Management of Networked Systems, 1998, p. 313. Conventionally, one task of “network management” has been the monitoring of the status, performance or availability of network elements and the provision of monitoring results, for example, to a network operator. Typically, network monitoring includes an alarm functionality which alerts the operator when an incident occurs which requires attention, for example, a fault, an impending fault, or a decline in the performance or availability of a network element. A network element's status message on which such an alarm is based may either be received from a managed network element in response to (e.g. periodic) requests from a management server, or it may be sent asynchronously by the managed network element to the management server. Typically (but not necessarily), a network management system not only provides a flow of information from managed network elements to the management server (and, e.g., to the network operator), but also enables managed network elements to be manipulated, e.g. by the network operator or, automatically, by a network management process. “Manipulating” may, for example, comprise configuring network elements or changing their existing configuration, starting and stopping processes in network elements, allocating memory to them, etc.
Known management architectures are, for example, the OSI management architecture (which is mainly used in the Telecommunications Management Network) and the Internet management architecture which uses SNMP (Simple Network Management Protocol) as a communication model (see, for example, Hegering, pp. 121-196).
In simple network management systems, only hardware devices (such as network interconnect devices and end devices) are managed. However, what an operator of a network or a client of a network provider is typically most interested in is not the functioning of the individual hardware devices and network connections, but rather the functioning of systems (i.e. combinations of several hardware components and an operating system used to utilize the hardware), applications and services provided by the network and its devices. Therefore, in more elaborate management systems, such systems, applications and services are also managed (wherein “managed” often merely means “monitored”). An application is typically an assembly of sub-applications on which it depends, which, in turn, depend on hardware resources (mostly on a multiplicity of hardware resources) and network connections. A service is a customer-based or user-based functionality which, for example, forms a part of a business process. It is typically composed of several application, operating-system, hardware and communication components and normally has an interface to the service receiver. Hence, a service depends on service sub-functionalities, which, in turn, depend on hardware and communication resources (mostly on a multiplicity of such resources). Usually, an application (such as an enterprise resource planning (ERP) application, an e-mail server application, a web server application, etc.) may be considered a “service”, but the term “service” may also refer to higher-level objects which may, for example, be business-related (such as objects representing the different business fields of an enterprise). All these objects (hardware devices, systems, sub-applications/sub-services, applications/services) may be managed objects of a network management system; they are all denoted as “service elements” hereinafter.
Examples of network management systems in which service elements are managed (monitored) are described in US 2002/0138638 A1 and US 2002/0143788 A1.
Typically, there are dependencies between service elements: a fault of a hardware device will affect a sub-application (or a sub-service) which relies on this hardware device. In turn, a fault of a sub-application (or sub-service) will affect an application (or a service) relying on it. Such dependencies can be thought of as (virtual) links between service elements. Typically, these links are directed links since, for example, a hard-disk failure will affect a database application relying on this hard disk, but a failure of the (superordinate) database service will generally not affect the functionality of the (subordinate) hard disk. Similarly, the failure of a database management system (DBMS) (which may be considered a “sub-application” here) will affect an ERP application (which may be considered an “application”) relying on the DBMS, but the (subordinate) DBMS will generally not be affected by a failure of the (superordinate) ERP application. The service elements are thus in relationships which may be modeled by an element graph having links (or edges) between service elements. The nodes of the graph represent the service elements. The links represent the dependencies, and the direction of a link may represent the direction of dependency. Thereby, higher-level and lower-level service elements are defined (wherein, of course, there may be more than two levels, as indicated by the example given above). In simple cases, such a graph will be tree-like (also called a “hierarchy”), with the root of the tree at, or above, the highest-level service element or elements. An example of a tree-like service-element graph is shown in FIG. 4 of US 2002/0138638 A1. Generally, however, the element graph may have a non-tree-like structure (it may, for example, be a “lattice”) if more than one higher-level service depends on a lower-level service, i.e. its underlying undirected graph may contain cycles. An example of such a non-tree-like service element graph is shown in FIG. 5 of US 2002/0138638 A1.
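The element graph described above can be sketched as a small data structure; the class and attribute names below, and the representation of directed links as lists of dependents, are illustrative assumptions rather than part of the cited systems:

```python
from dataclasses import dataclass, field

@dataclass
class ServiceElement:
    """A node in the service-element graph (hardware device, system,
    sub-application/sub-service, or application/service)."""
    name: str
    # Directed links: higher-level elements that depend on this element.
    dependents: list["ServiceElement"] = field(default_factory=list)

    def add_dependent(self, higher: "ServiceElement") -> None:
        self.dependents.append(higher)

# The example from the text: an ERP application depends on a DBMS,
# which in turn depends on a hard disk.
disk = ServiceElement("hard disk")
dbms = ServiceElement("DBMS")               # a sub-application
erp = ServiceElement("ERP application")
disk.add_dependent(dbms)
dbms.add_dependent(erp)

# A non-tree ("lattice") structure arises as soon as more than one
# higher-level element depends on the same lower-level element:
mail = ServiceElement("e-mail service")
disk.add_dependent(mail)   # the disk now has two dependents
```

Because the links are directed, the graph can be traversed from a lower-level element to all higher-level elements relying on it; when several higher-level elements share a lower-level element (as with the disk above), the structure is no longer tree-like.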
In network management systems, there are generally two different kinds of status (or monitoring) messages of monitored objects: (i) asynchronous messages, which are typically sent to a management server by management agents assigned to the monitored objects, such as an SNMP trap in Internet management (see Hegering, pp. 171-179) or a CMIP notification in OSI management (see Hegering, pp. 134-139); such asynchronous messages are sent by the management agents without a preceding request, e.g. triggered by an event detected by the agent in the monitored object; (ii) synchronous status messages, i.e. responses returned by management agents in response to status requests, e.g. from a management server. Again, the message may be an SNMP response (issued in response to an SNMP request) in Internet management or a CMIP response (issued in response to a CMIP GET operation) in OSI management. Both asynchronous and synchronous messages are generically denoted as “status messages” hereinafter.
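The distinction between the two message kinds can be illustrated with a minimal model; this is not a real SNMP or CMIP implementation, and all names and fields are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StatusMessage:
    """Generic status message. Both kinds carry the state of a monitored
    object; they differ only in whether they answer a preceding request."""
    source: str                        # identifier of the monitored object/agent
    status: str                        # e.g. "normal", "critical"
    request_id: Optional[int] = None   # set only for synchronous responses

    @property
    def is_asynchronous(self) -> bool:
        # Asynchronous messages (e.g. SNMP traps, CMIP notifications) are
        # sent without a preceding request, so no request id is present.
        return self.request_id is None

# Analogous to an SNMP trap, sent without request:
trap = StatusMessage(source="router-1", status="critical")
# Analogous to an SNMP response to a management server's request no. 42:
response = StatusMessage(source="router-1", status="normal", request_id=42)
```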
In contrast to most of today's network interconnect devices, many of the applications available on the market are not (or, at least, not completely) instrumented for application management; in particular, they are often not enabled to provide information about their status to a management system via an agent, as interconnect devices can. Furthermore, certain high-level services may be virtual or logical elements which are not made up of a single application or resource, but of a (virtual) assembly of several applications and/or resources. For example, such a high-level service element might be a “credit department” of a bank, which relies on several applications. Typically, such virtual high-level services are not objects instrumented for management, either; in particular, they will not be able to send status messages to a management system. Therefore, the current status of such uninstrumented applications or high-level services is often not directly determined, but is indirectly inferred from the current status of the lower-level elements on which it depends. For example, a failure of a network interconnect device may have an impact on applications and services on a server, for example a DNS server which can no longer be reached due to the failure, which, in turn, may influence the status of higher-level applications and services relying on it. In order to perform such IT-infrastructure management based on indirect inference, a representation of the service-element graph is established in the IT management system, and the (virtual) status of at least some of the service elements is affected by status messages received from other service elements on which they depend.
In an IT-infrastructure-management system with such a virtual service-element graph, as described in US 2002/0138638 A1, status messages from monitored objects are directly mapped to the service elements representing the respective monitored objects. Virtual service elements may depend on the status of the respective monitored object and be affected by a change in its status. As a consequence of the receipt of such a status-change-indicating message at the service element representing the monitored object, the status of the service element may be changed, and the status change (not the status message itself) may be propagated upward to also influence the status of the higher-level service elements. For example, if a certain virtual service element depends on the availability of a certain hardware resource, directing a status message indicating a failure of this resource to the representation of this resource in the service-element graph and propagating the status change upward may cause the status of the superordinate service element to change from a “normal” state to a “critical” state which indicates that the virtual service element is not (fully) available.
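A minimal sketch of such upward status propagation follows; the propagation rule used here (an element becomes “critical” as soon as an element it depends on becomes critical) is one simple illustrative possibility, not a rule prescribed by the cited application:

```python
class ServiceElement:
    def __init__(self, name: str):
        self.name = name
        self.status = "normal"
        self.dependents = []   # higher-level elements relying on this one

    def depends_on(self, lower: "ServiceElement") -> None:
        lower.dependents.append(self)

    def set_status(self, new_status: str) -> None:
        """Apply a status change and propagate it upward. Note that it is
        the status change, not the original status message, that travels
        through the graph."""
        if new_status == self.status:
            return                      # no change, nothing to propagate
        self.status = new_status
        if new_status == "critical":
            # Example rule: a critical element makes its dependents critical.
            for higher in self.dependents:
                higher.set_status("critical")

disk = ServiceElement("hard disk")
db = ServiceElement("database application")
db.depends_on(disk)

# A status message indicating a disk failure is mapped to the disk's
# representation; the resulting status change propagates upward.
disk.set_status("critical")
```

Since the element graph has no directed cycles, the recursion terminates; the early return on an unchanged status additionally prevents repeated propagation.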
In the prior art, two ways of mapping status messages to service elements are known: (i) a status message is mapped to a service element at the lowest hierarchical level, typically to the representation of the monitored object from which the status message originates. The status message may influence the status of the lowest-level service element to which it is mapped. This status change (not the status message itself) is then propagated upward in the service-element graph according to predefined rules. An example of such upward status propagation is illustrated at the left-hand side of FIG. 6 of US 2002/0138638 A1; (ii) if a status message does not specifically refer to a monitored device, but to a higher-level service element, it may be directed immediately to the higher-level service element and may directly influence its status. This may happen if an application is “instrumented” for management. An example of this is illustrated at the right-hand side of FIG. 6 of US 2002/0138638 A1. In both cases, the service elements of the service-element graph are identifiable by a unique service-element identifier, and each status message carries with it a service-element identifier referencing the service element for which the message is destined.
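In both variants, directing a message to its service element reduces to a lookup by the unique service-element identifier; the registry, the identifiers and the message format below are illustrative assumptions, not those of the cited application:

```python
# Registry of service elements, keyed by their unique service-element identifier.
registry: dict = {}

class ServiceElement:
    def __init__(self, element_id: str, level: int):
        self.element_id = element_id
        self.level = level          # e.g. 0 = monitored device, higher = service
        self.status = "normal"
        registry[element_id] = self

def dispatch(message: dict) -> ServiceElement:
    """Direct a status message to the service element it references.
    Variant (i): the identifier names a lowest-level element (the monitored
    object); variant (ii): an instrumented application addresses a
    higher-level element directly. The dispatch logic is the same."""
    element = registry[message["element_id"]]
    element.status = message["status"]
    return element

ServiceElement("disk-01", level=0)
ServiceElement("erp-app", level=2)

# (i) message from a monitored device, mapped to the lowest-level element:
dispatch({"element_id": "disk-01", "status": "critical"})
# (ii) message from an instrumented application, mapped directly:
dispatch({"element_id": "erp-app", "status": "warning"})
```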