In telecommunications systems such as cellular systems based on TDMA, CDMA or GSM, or 2.5G networks based on GPRS, competition pushes service providers to deliver ever-improving service quality. As many different telecommunications services emerge, particularly new wireless services, the service assurance problem becomes increasingly challenging. In a current Network Operation Center (NOC), it is not uncommon to receive hundreds or even thousands of alerts, warnings and alarms in various forms. The NOC personnel responsible for troubleshooting and problem resolution are usually highly trained technicians specializing in specific technology areas. Traditionally, the NOC group is separate from the information technology (IT) organization that manages applications and internal IP networks. Problems occurring in one domain are normally handled without considering their impact on other domains. In particular, there is no methodology or procedure in place for prioritizing QoS problems or analyzing their root causes.
Current service management consists of isolated network management systems and an information technology (IT) based management environment. Network management tasks consist of collecting large amounts of performance data, generating weekly or monthly reports, and logging large numbers of events and alarms. The data are mostly generated by a number of disjoint Element Management Systems (EMSs) or, in some cases, by individual Network Elements (NEs). In the service and application areas, traditional IT management platforms such as OpenView from Hewlett-Packard, Unicenter from Computer Associates or Tivoli from IBM are popular for monitoring and logging server and LAN-related alarms and events. There is, however, no correlation between these IT-based management platforms and the other EMSs. Within each isolated domain (application, core, access), service management is performed by the personnel responsible for that particular domain. Different domains are normally handled by different organizations, which operate independently with little interaction with one another. As a result, there is no integrated, correlated view of service quality, and efforts toward service assurance and long-term planning are inconsistent.
The increasing dependence on wireless technology, whether 2G, 2.5G or 3G cellular technology or wireless LAN (WLAN) technology such as 802.11 (WiFi) based systems, adds further complexity to service issues. Bottom-up service assurance systems focus on collecting data from various network elements or subsystems, but not on whether the services customers want are actually being provided to their satisfaction.
The overall goal of impact analysis is to quantify service quality degradation with respect to certain predefined service level criteria. The result of such impact analysis can then be used to prioritize service and network alarms, service QoS alerts, network performance threshold-crossing alerts and other performance-impacting events for trouble ticket generation. The results may also be used to prioritize network and service resource expansion, or to adjust service level agreements for marketing and contractual purposes.
As wireless services proliferate and each has an ever shorter life cycle, it is becoming increasingly difficult to train NOC operators with the right skills to handle the many types of service-related QoS problems. To assist NOC personnel in prioritizing QoS alarms, it is desirable to have tools that collect and extract relevant information about the alerts and prioritize them according to their impact on customers, on quality of service, and on other criteria such as marketing and planning.
Each component of a service has a set of Key Performance Indicators (KPIs) associated with it. If a service model has 40 components, each with 30 KPIs, a single service has a total of 1,200 KPIs. With 20 services active at once, we are potentially dealing with 24,000 KPIs. If just 1% of these KPIs cross their thresholds at a given time, they generate 240 QoS alerts at once. Besides the volume of KPIs and alerts, it is also difficult to write algorithms specific to each individual KPI. Therefore, the impact analysis algorithm has to deal with scalability and complexity at the same time.
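The arithmetic behind these volume figures can be checked with a few lines of Python. The function name `kpi_volume` and its parameters are illustrative only; the alert rate is expressed as an integer percentage to avoid floating-point rounding.

```python
def kpi_volume(components, kpis_per_component, services, alert_percent):
    """Return (KPIs per service, total KPIs, simultaneous alerts).

    Hypothetical helper illustrating the example figures in the text.
    """
    per_service = components * kpis_per_component   # KPIs in one service model
    total = per_service * services                  # KPIs across all active services
    alerts = total * alert_percent // 100           # KPIs currently crossing threshold
    return per_service, total, alerts


# 40 components x 30 KPIs, 20 active services, 1% crossing threshold
print(kpi_volume(40, 30, 20, 1))  # (1200, 24000, 240)
```

Because the totals grow multiplicatively with components, KPIs and services, even a small alert percentage produces an alert stream too large to triage by hand.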
Further, it would be desirable to have a method and system that permit systematic prioritization of QoS alarms with respect to a quantitative impact index.
Additionally, it would be desirable to have a system and method that uses a dependency model of a service to prioritize and analyze alert impact.
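To make the idea of a dependency model and a quantitative impact index more concrete, here is a minimal, hypothetical sketch in Python. It is not the method described in this document; it assumes that each service lists the components it depends on with a weight in [0, 1], that each alert carries a normalized severity in [0, 1], and that the subscriber count of a service is a usable proxy for customer impact. All names (`Service`, `Alert`, `impact_index`, `prioritize`) and all example values are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Service:
    name: str
    subscribers: int  # assumed proxy for customer impact
    # Hypothetical dependency model: component name -> weight in [0, 1]
    dependencies: Dict[str, float] = field(default_factory=dict)


@dataclass
class Alert:
    alert_id: str
    component: str   # component whose KPI crossed a threshold
    severity: float  # assumed normalized severity in [0, 1]


def impact_index(alert: Alert, services: List[Service]) -> float:
    """Sum severity * dependency weight * subscribers over every affected service."""
    return sum(
        alert.severity * svc.dependencies[alert.component] * svc.subscribers
        for svc in services
        if alert.component in svc.dependencies
    )


def prioritize(alerts: List[Alert], services: List[Service]) -> List[Alert]:
    """Order alerts from highest to lowest impact index."""
    return sorted(alerts, key=lambda a: impact_index(a, services), reverse=True)


services = [
    Service("voice", 10000, {"msc": 1.0, "bts-12": 0.3}),
    Service("mms", 2000, {"mmsc": 1.0, "ggsn": 0.8, "bts-12": 0.3}),
]
alerts = [
    Alert("A1", "bts-12", 0.5),  # shared radio component, moderate severity
    Alert("A2", "mmsc", 0.6),    # affects only the smaller MMS service
    Alert("A3", "msc", 0.2),     # low severity, but on the busy voice service
]
print([a.alert_id for a in prioritize(alerts, services)])  # ['A3', 'A1', 'A2']
```

The point of the sketch is that ranking depends on the service model rather than on per-KPI rules: a low-severity alert on a component that a heavily used service depends on can outrank a more severe alert on a component used by fewer customers, and the same scoring function applies to every KPI alert regardless of its type.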
It would also be desirable to have a method and system that can provide impact analysis for a large-scale network without suffering from scalability issues.
Finally, it would be desirable to have a method and system capable of assisting the network operator in a root cause analysis of the service impacting alerts identified by the alert prioritization and service impact analysis system.