According to the principles of a network management system, a typical management structure of a communications system, for example, of a (mobile) telecommunications system, comprises several hierarchical levels for the management of the communications system. A hierarchical level of a management network means that every level in the management network of the communications system has a certain management and/or communications system related functionality specific to this level, and that, depending on its hierarchical position in the network, it performs a certain management function. Each of these hierarchical levels, except for the top level and the first-line level, has a double management function: a manager function and an agent function. Each hierarchical level, except for the first-line level, has a manager function with regard to the underlying level, and every hierarchical level, except for the top level, has an agent function with regard to the level above. Thus, management of a communications system features a hierarchical structure clearly defining the functions at every hierarchical level of this communications system or of the management network of the communications system respectively.
Each level comprises corresponding entities or elements of a physical and/or abstract nature. Thus, an entity of a hierarchical level can be software and/or hardware (a device) in a communications system. In the following, such entities or elements will be referred to as “network entities”. Depending on whether their level of the management network performs the functionality of a manager, an agent, or both, these network entities are managers, agents, or both. In the following, the terms manager and agent, or management level and agent level, respectively, will be used according to the functionality of the corresponding hierarchical level and, thus, according to the functionality of the corresponding network entity of this hierarchical level. For this reason, if a level represents both the management and the agent level, a network entity comprised in this hierarchical level will be a manager or an agent depending on the function to be performed at a given time by this network entity.
Network management as such refers to the Operation, Administration, and Maintenance (OAM) of communications systems or networks like telecommunications networks at the top level. Network management is the execution of a variety of functions required for controlling, planning, allocating, deploying, coordinating, and/or monitoring the resources of a network, including performing functions such as initial network planning, frequency allocation, predetermined traffic routing to support load balancing, cryptographic key distribution authorization, configuration management, fault management, security management, performance management, bandwidth management, and/or accounting management. Further, in such a management system hardware and/or software are provided that support OAM functionality and provide these functions, for example, to network users and/or administrators. Thus, OAM includes facilities for operating, managing and maintaining networks.
Managers in a communications system are configured to start operations for the operation, administration and maintenance of the communications network, comprising configuration, fault and/or performance management (CM, FM, and/or PM) of the communications system, for example, as mentioned above. This is done by sending requests, which are performed by the agents, in particular by the agents assigned to the corresponding managers. The managers then receive corresponding feedback, called responses, from the agents.
Network entities implementing the functionality of an agent in the communications network recognize events relevant for the operation, administration and maintenance of the communications network (e.g. alarms), generate corresponding notifications, and transmit these notifications, usually as event reports, to the managers, in particular, to the managers the network entities are assigned to. In this way, network management is performed in a conventional management network of a communications system.
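The request/response and event-report exchange described above can be sketched as follows. This is only an illustrative model: the class names `Manager` and `Agent`, the method names, and the sample operation and alarm strings are hypothetical and are not taken from any real management protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Manager:
    """Hypothetical manager: sends requests and collects event reports."""
    name: str
    received_reports: list = field(default_factory=list)

    def send_request(self, agent: "Agent", operation: str) -> str:
        # The request is performed by the agent, which returns a response.
        return agent.handle_request(operation)

    def on_event_report(self, agent_name: str, event: str) -> None:
        # Event reports arrive unsolicited from the assigned agents.
        self.received_reports.append((agent_name, event))


@dataclass
class Agent:
    """Hypothetical agent: performs requests and reports relevant events."""
    name: str
    manager: Manager  # the manager this agent is assigned to

    def handle_request(self, operation: str) -> str:
        return f"{self.name}: {operation} done"

    def detect_event(self, event: str) -> None:
        # Generate a notification and transmit it as an event report.
        self.manager.on_event_report(self.name, event)


ems = Manager("EMS-1")
ne = Agent("NE-1", ems)
response = ems.send_request(ne, "get-config")   # manager-initiated
ne.detect_event("link-down alarm")              # agent-initiated
```

The sketch keeps the two directions separate on purpose: requests originate at the manager, while notifications originate at the agent.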
In the following, the above described data (e.g. alarms, notifications) caused by or related to fault and/or performance management related events will in general be referred to as fault and/or performance management related data. An event can be seen as something that mostly refers to changes in the communications system which provoke a pre-defined reaction or response from a network entity. Further, an event can be the cause of one or more subsequent events in the communications system.
The provision of OAM functionality like CM, FM and/or PM, for example, is assured by communication between the hierarchical levels of the management network of the communications system, wherein the network entities of an upper level manage the network entities of the underlying level to ensure a correct performance of the OAM functionality and the managed network entities act depending on the management of the upper management level. Further, in the management network of the communications system a strict assignment exists between managers and agents. A manager has a certain set of agents it has to manage. Agents, in turn, are assigned to one manager. Thus, the performance and safeguarding of the OAM functionality is done in a strict hierarchical way between the levels of the management network of the communications system.
CM serves the purpose of making the whole networked and distributed system available, while FM and PM keep the system operational or restore an operational state. The most important CM tasks are taking inventory of, or checking and noting, the configurations and/or distribution of (hardware and/or software) entities, elements, and/or components of a communications system, and appropriate management to ascertain the changes applied to the distribution of these entities, elements, and/or components and, where appropriate, to implement a corresponding reconfiguration. Additionally, CM is also responsible for the installation of documentation and directory services.
FM comprises functions for detecting, isolating, and correcting malfunctions in a (tele-) communications system. FM and its functions compensate for environmental changes, and include maintaining and examining error logs, accepting and acting on error detection notifications, tracing and identifying faults, carrying out sequences of diagnostics tests, correcting faults, reporting error conditions, and localizing and tracing faults by examining and manipulating database information. Thus, when a fault or another FM related event occurs, i.e. any event causing the initiation or implementation of at least one FM related function, a network component will often send a notification to the network operator using a protocol such as SNMP. An alarm is a persistent indication of a fault that clears only when the triggering condition has been resolved.
PM, in turn, records the system load and displays performance bottlenecks and has a direct influence on network deployment, network extensions and error management. Parameters such as the response time, round trip time, and delay time are important for PM, as are the theoretical performance limits and network load. These parameters are influenced by a number of transmission characteristics such as flow control, access method, attenuation or packet loss rates. PM allows operators to monitor network load and detect performance trends for future network planning. Thus, when a performance bottleneck or another PM related event occurs in the communications system, at least one PM related function is then performed.
The communication between the hierarchical levels of a management network of a communications system and thus between the managers and the agents is usually facilitated by management interfaces, called OAM interfaces. These interfaces can be implemented, for example, by using protocols or technologies like Simple Network Management Protocol (SNMP), Transaction Language 1 (TL1), Extensible Markup Language (XML), or Common Object Request Broker Architecture (CORBA).
An example of a conventional management network of a communications system like a (mobile) telecommunications system is shown in FIG. 1, where three hierarchical levels 150, 151, 152 of such a management network of a telecommunications system are presented.
In the following, with reference to FIG. 1, FM and/or PM, being important and typical OAM functions, will be regarded in more detail.
As already outlined above, the FM and/or PM is performed by providing FM and/or PM related data from the lower levels to the upper levels, where FM and/or PM relevant or related decisions are made, and results of these decisions are then transmitted from the upper levels back to the lower levels.
At the first line level 152, the management network of a telecommunications system consists of network elements (NEs) 121, 122, 123, and 124. In the following, this hierarchical level 152 will be referred to as the “NE level”. A network element (NE) 121, 122, 123, 124 is a kind of telecommunications (hardware) equipment or element that is addressable and manageable. NE can also be seen as a combination of hardware and software or a network entity comprising software that primarily performs telecommunications service functions or predefined and a priori agreed upon functions and, thus, provides support or services to users, for example. NEs 121, 122, 123, 124 are interconnected and managed through at least one Element Manager System (EMS) 111, 112 comprised in the upper management level 151, which will be referred to as the “EMS level” in the following. The NE level 152 performs the agent functionality, and the EMS level 151, in turn, performs a manager functionality with regard to the NE level 152 and an agent functionality with regard to the upper level 150 in the hierarchy of the management network.
An EMS 111, 112 is a manager of one or more NEs 121, 122, 123, 124 of a specific type and allows all the features of each NE 121, 122, 123, 124 to be managed individually. Each of the NEs 121, 122, 123, 124 is connected to one responsible and managing EMS 111, 112 via appropriate links. The communication and, thus, the exchange of OAM related data like fault and/or performance management related data or configuration management related data between the NE level 152 and the EMS level 151 and thus between the NEs 121, 122, 123, 124 and the EMS 111, 112 is ensured by special management interfaces 141, 142, 143, 144, like EMS/NE Operation and Maintenance (OAM) interfaces, implemented on the links between the NE and EMS level 152, 151. Such connections or interfaces between the EMS and NEs are also called “southbound” connections or interfaces.
EMS 111, 112, in turn, are managed by an Operations Support System (OSS) 100 of the top level 150, in the following referred to as the “OSS level”. The OSS 100 monitors the underlying management layers 151, 152 and predominantly looks at functional and non-functional requirements of the communications system and of the underlying layers 151, 152. The OSS level 150 performs only a manager function with regard to the underlying EMS level 151, wherein, when considering these two levels, the EMS level performs an agent function. The communication and, thus, the exchange of OAM related data like fault and/or performance management related data or configuration management related data between the OSS level 150 and the EMS level 151 or the OSS 100 and the EMS 111, 112 respectively is enabled by links between the two levels, wherein management interfaces 131, 132, like EMS/OAM interfaces, are implemented on these links for this purpose. The connections or interfaces between the OSS level 150 and EMS level 151 are also known as “northbound” connections or interfaces.
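The role assignment across the three levels of FIG. 1 follows directly from the rule stated earlier: every level except the lowest acts as a manager, and every level except the top acts as an agent. A minimal sketch of that rule, using the reference numerals of FIG. 1 as labels (the names `LEVELS`, `roles`, and `role_table` are hypothetical), might look like this:

```python
# Levels ordered from top to bottom, labeled with FIG. 1 reference numerals.
LEVELS = ["OSS level 150", "EMS level 151", "NE level 152"]


def roles(level_index: int, levels: list) -> set:
    """Return the management roles of the level at the given position."""
    result = set()
    if level_index < len(levels) - 1:
        result.add("manager")  # manages the underlying level
    if level_index > 0:
        result.add("agent")    # is managed by the level above
    return result


role_table = {name: roles(i, LEVELS) for i, name in enumerate(LEVELS)}

for name, level_roles in role_table.items():
    print(name, sorted(level_roles))
```

Only the EMS level 151 ends up with both roles, which matches its position between the northbound and southbound interfaces.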
The NE level 152, or the NEs 121, 122, 123, 124 comprised therein, and the OSS level 150, or the OSS 100 comprised therein, continuously monitor the system performance of a live network. When problems occur, countermeasures have to be taken in order to maintain the quality of service (QoS) at acceptable levels. In conventional systems, this process involves transferring data across numerous (vertical) interfaces between hierarchical systems. Thus, if a fault and/or performance management related event like a fault occurs in an NE, fault and/or performance management related data like an alarm is sent northbound to the EMS. Due to the interdependencies in the call processing area, a single (primary) fault and/or performance management related event like a (primary) fault in an NE can result in multiple (secondary) fault and/or performance management related events like (secondary) faults in other NEs. In such a situation, all NEs impacted by a fault send alarm information northbound.
FIGS. 2 and 3 illustrate such a situation by way of example. In FIG. 2, a fault and/or performance management related event like a fault occurs first at NE 222. This fault and/or performance management related event or fault will be called the primary fault and/or performance management related event or primary fault. Due to relations from NE 222 to NEs 223 and 224 (shown as dashed lines), secondary fault and/or performance management related events, here faults, also occur at NEs 223 and 224. In FIG. 2 (and also in the following figures), fault and/or performance management related events like faults are visualized by bolts. Each of the NEs 222, 223, 224 sends the corresponding fault and/or performance management related data, here alarms, independently of the others to the managing EMS 211. As received, these fault and/or performance management related data like alarms are uncorrelated and do not provide information about the root cause. In the case of FIG. 2, all alarmed NEs 222, 223, 224 are managed by the same EMS 211. Thus, it is possible to analyse and correlate the fault and/or performance management related data, here alarm information, at the EMS level (in order to extract e.g. the root cause). Otherwise, this is only possible at the next level, as shown in FIG. 3.
In FIG. 3, the initial situation is similar to the situation of FIG. 2. In FIG. 3, a primary fault and/or performance management related event like a fault occurs first at NE 322. Due to relations from NE 322 to NEs 323 and 324 (shown as dashed lines), secondary fault and/or performance management related events, here faults, also occur at NEs 323 and 324. Each of the NEs 322, 323, 324 sends the corresponding fault and/or performance management related data, here alarms, independently of the others to its managing EMS 311 or 312. However, here the alarmed NEs 322, 323, 324 are managed by different EMS. For this reason, the fault and/or performance management related data, here alarm information, is provided by the corresponding EMS 311 and 312 to the next managing level, i.e. to the OSS 300, for analysis and/or correlation purposes.
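The difference between the two figures can be reduced to a simple check: correlation is possible at the EMS level only if all alarmed NEs share one managing EMS. The sketch below illustrates this. The function name `correlation_level` is hypothetical, and the particular split of NEs 322, 323, 324 between EMS 311 and 312 is an assumption made for illustration, since the text states only that they are managed by different EMS.

```python
def correlation_level(alarmed_nes, managing_ems):
    """Return the lowest level at which all alarms can be seen together.

    alarmed_nes:  iterable of NE identifiers that raised alarms
    managing_ems: dict mapping each NE identifier to its managing EMS
    """
    ems_involved = {managing_ems[ne] for ne in alarmed_nes}
    # A single EMS sees every alarm; otherwise only the OSS does.
    return "EMS level" if len(ems_involved) == 1 else "OSS level"


# FIG. 2: all alarmed NEs are managed by EMS 211.
fig2_assignment = {222: 211, 223: 211, 224: 211}

# FIG. 3: NEs are managed by different EMS (this exact split is assumed).
fig3_assignment = {322: 311, 323: 312, 324: 312}

fig2_level = correlation_level([222, 223, 224], fig2_assignment)
fig3_level = correlation_level([322, 323, 324], fig3_assignment)
print(fig2_level, fig3_level)
```

Note that the check only determines where the alarms converge; whether the correlation actually succeeds still depends on the information the alarms carry.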
Regardless of whether the situation of FIG. 2 or the situation of FIG. 3 applies, this data mining on fault and/or performance management related data (e.g. alarms) is a difficult task requiring special applications. To be successful, the corresponding fault and/or performance management related data (alarms) has to provide sufficient information. If this is not the case (and this actually happens quite often), the fault and/or performance management related data (alarms) cannot be correlated. This would mean, for example, that the root cause analysis fails.
The above described situations and, thus, the conventional FM and/or PM have the disadvantage that all fault and/or performance management related data has to be transferred to a higher level (e.g. EMS and/or OSS). This requires bandwidth and processing power. Further, only raw unprocessed information is sent by the agents of an agent level like NEs or EMS to the next managing level like the EMS level or OSS level. The next managing level has to extract relationships between the received fault and/or performance management related data like alarms, root causes, etc.
Furthermore, if the fault and/or performance management related data do not provide sufficient information on the reported fault and/or performance management related event, it is not possible to find the root cause. Additionally, conventional fault and/or performance management is slow and does not allow a fast reaction or response to network problems, with the known implications (reduced QoS, decreased customer satisfaction, etc.).
Because of the huge amounts of data to be transferred between the different management levels, conventional fault and/or performance management is typically not automated. Thus, the operator has to analyze the fault and/or performance management related data manually and also take corrective actions manually. This task may of course be assisted by some applications, but these applications then have to be provided by the operator or a systems integrator. These parties typically do not have the in-depth knowledge of the different agents like NEs required to correlate the fault and/or performance management related data. For this reason, the process is often error-prone because some correlation is lost when data is passed upwards from the lower agent levels to the upper managing levels (e.g. from NE level to EMS level and/or from EMS level to OSS level).