The invention relates to a system, device and method for automatic anomaly detection, in particular to the automatic detection of quality indicators updated in real time.
Some networks may provide a network infrastructure that operators use to offer services to subscribers. Because the network infrastructure is very complex and may be affected by the environment, problems may arise which decrease the quality of service experienced by the subscribers. If such problems are detected and solved quickly and efficiently, the quality of service may be kept at a very high level.
The expectations of customers regarding access to services over the Internet are becoming more demanding, and response times for access to critical data are getting more important. As a result, efficient real-time support over networks will be critical for the continued growth of the Internet and intranets. This support for real-time services requires “Quality of Service” management procedures in mobile networks so that the scarce spectrum can be used as efficiently as possible.
The significant growth of networks, including an increased number of different elements, requires sophisticated methods and tools that enable centralized network and service monitoring in large networks so as to provide effective network operation.
Mechanisms for detecting abnormal situations belong to one of two major categories, namely rule-based detection mechanisms and anomaly detection mechanisms (sometimes also called novelty detection mechanisms). Rule-based detection mechanisms attempt to recognize certain behavior patterns which are known to be improper, such as the exceeding of given thresholds. Thus, rule-based detection mechanisms have two severe limitations: they can only detect problems which have occurred before, and which have been explicitly taught to the detection system or programmed into it. Anomaly detection systems (ADS), as used in this application, reverse the detection problem: they are taught what normal behavior is, and anything deviating significantly (by a predetermined margin) from the norm is considered anomalous. ADS mechanisms are capable of detecting potentially problematic situations without explicit training on such situations. An example of an ADS is disclosed in the article: Hoglund, Albert: An Anomaly Detection System for Computer Networks, Master of Science thesis, Helsinki University of Technology 1997. Thus an ADS is defined as a mechanism which is trained with the normal behavior of the target system. Accordingly, an ADS flags every significant deviation from normal as a potential anomaly. In contrast, a rule-based detection system is trained with known modes of abnormal behavior, and it can only detect the problems that have been taught to it.
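The contrast between the two categories can be illustrated with a minimal sketch. It is not taken from the cited thesis: the class name BaselineDetector, the sample values, the threshold of 90 and the margin of three standard deviations are all assumptions chosen for illustration, and a mean and standard deviation stand in for whatever model of normal behavior a real ADS would learn.

```python
from statistics import mean, stdev


def rule_based_alarm(value, upper_threshold):
    """Rule-based check: recognizes only the pattern it was programmed for."""
    return value > upper_threshold


class BaselineDetector:
    """Toy anomaly detector (hypothetical): learns normal behavior from
    samples and flags anything deviating by more than a fixed margin."""

    def __init__(self, margin=3.0):
        self.margin = margin  # assumed margin, in standard deviations
        self.mu = None
        self.sigma = None

    def fit(self, normal_values):
        # "Training" here is just estimating the mean and spread of normal data.
        self.mu = mean(normal_values)
        self.sigma = stdev(normal_values)
        return self

    def is_anomalous(self, value):
        return abs(value - self.mu) > self.margin * self.sigma


# Hypothetical load measurements observed during normal operation.
normal_load = [48, 52, 50, 49, 51, 53, 47, 50]
detector = BaselineDetector(margin=3.0).fit(normal_load)

# A sudden drop in load was never written down as a rule, so the
# threshold check stays silent while the learned baseline flags it.
rule_flags_drop = rule_based_alarm(5, upper_threshold=90)
ads_flags_drop = detector.is_anomalous(5)
```

In this sketch the rule-based check misses the drop because no rule was ever defined for it, whereas the baseline detector flags it without having seen such a situation before.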
Generally it is difficult to have alarms indicating quality of service problems. It is also very challenging to define proper thresholds which generate appropriate numbers of alarms. If the alarm thresholds are too high, there are no notifications about problems. If the alarm thresholds are too low, there are too many alarms to be handled efficiently. If the alarm thresholds are updated manually, the updating is very cumbersome and must be performed whenever the network conditions change. Further, alarm thresholds are normally different in different parts of the network which leads to additional problems.
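The threshold trade-off described above can be made concrete with a small sketch. The measurement series and the three candidate thresholds are hypothetical values chosen only to show the effect.

```python
def count_alarms(samples, threshold):
    """Count how many samples would raise an alarm at the given threshold."""
    return sum(1 for s in samples if s > threshold)


# Hypothetical dropped-call percentages, one per measurement period.
# Two periods (6.5 and 7.2) represent genuine problems.
dropped_call_pct = [1.0, 1.2, 0.9, 1.1, 6.5, 1.3, 0.8, 7.2, 1.0, 1.4]

# Too low floods the operator, too high hides the real problems.
alarms_by_threshold = {
    t: count_alarms(dropped_call_pct, t) for t in (0.5, 3.0, 10.0)
}
```

With these assumed values, the lowest threshold raises an alarm in every period, the highest raises none, and only the middle one isolates the two problem periods. A threshold that works for this series would need to be found again for every other part of the network.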
Usually the operators are not able to freely define the Key Performance Indicators (KPIs) which are monitored. The KPIs are defined by the network manufacturer, and the operator can only select whether or not to use a KPI. In systems which monitor predefined KPIs of a network element, the operator may be able to define alarm thresholds for the KPIs manually. In such cases, it is only possible to monitor the most important issues on a general level. Furthermore, adjusting the alarm thresholds is very difficult.
With an ever-increasing alarm flow it is vital that the network operator has means to cut down the number of less important alarms and warnings. In this way the operating personnel can concentrate on service-critical alarms that need to be dealt with immediately.
When simply relying on counting the number of error-indicating events, and issuing an alarm when the number of events exceeds some user-determined value, there may be some situations where this solution does not function properly. For example, off the coast of Helsinki there are some islands with a single base station on them. Boats with several hundred passengers pass the islands every now and then, and naturally the base station on those islands may be very highly loaded by the mobile subscribers on the boat. When dropped calls are counted for the purpose of raising alarms, such alarms will be false, because the calls are dropped due to a natural phenomenon, i.e. the passing ship moving out of the coverage area of the mobile network, and not due to any network malfunction. However, at some other base station a similar course of events might indicate a severe network problem.
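The island example can be sketched as follows. The station names, the dropped-call histories, the fixed limit of 20 and the three-standard-deviation margin are all hypothetical. The per-station comparison is one possible way to take local context into account and is not presented as the method of any cited document.

```python
from statistics import mean, stdev


def fixed_count_alarm(dropped_calls, limit):
    """Alarm when the event count exceeds one user-determined value."""
    return dropped_calls > limit


def station_alarm(dropped_calls, history, margin=3.0):
    """Alarm only when the count is unusual for this particular station."""
    mu = mean(history)
    sigma = stdev(history)
    return dropped_calls > mu + margin * sigma


# Hypothetical dropped-call counts per hour. The island station regularly
# sees bursts when a passenger boat leaves its coverage area.
history = {
    "island_station": [5, 40, 8, 45, 6, 42, 7, 38],
    "city_station": [5, 6, 4, 5, 7, 5, 6, 4],
}

observed = 40  # the same count is observed at both stations

fixed = {name: fixed_count_alarm(observed, limit=20) for name in history}
contextual = {name: station_alarm(observed, h) for name, h in history.items()}
```

Under these assumptions the fixed limit raises an alarm at both stations, while the per-station check raises one only at the city station, where 40 dropped calls lies far outside what has been seen before.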
The article: Hoglund, Albert: An Anomaly Detection System for Computer Networks, Master of Science thesis, Helsinki University of Technology 1997, discloses an ADS for a Unix-based computer system. The disclosure contents of this article are in toto incorporated herein by reference. The disclosed system consists of a data-gathering component, a user-behavior visualization component, an automatic anomaly detection component and a user interface. The system reduces the amount of data necessary for anomaly detection by selecting a set of features which characterize user behavior in the system. The automatic anomaly detection component approximates users' daily profiles with self-organizing maps (SOM), originally created by Teuvo Kohonen. A crucial parameter of a SOM is a Best Matching Unit (BMU) distance. The BMUs of the SOMs are used to detect deviations from the daily profiles. A measure of such deviations is expressed as an anomaly P-value. According to the article, the ADS has been tested and found capable of detecting a wide range of anomalous behavior.
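A BMU distance and an anomaly P-value derived from it might be computed along the following lines. This is a sketch under stated assumptions and not the procedure of the cited thesis. The codebook is taken as already trained (SOM training is omitted), the P-value is assumed to be the fraction of training samples whose BMU distance is at least as large as that of the observation, and all function names and data values are hypothetical.

```python
import math


def bmu_distance(x, codebook):
    """Euclidean distance from x to its closest codebook vector (the BMU)."""
    return min(math.dist(x, unit) for unit in codebook)


def anomaly_p_value(x, codebook, training_data):
    """Empirical P-value (assumed definition): the share of training samples
    lying at least as far from their BMU as x lies from its own BMU."""
    d = bmu_distance(x, codebook)
    train_d = [bmu_distance(t, codebook) for t in training_data]
    return sum(1 for td in train_d if td >= d) / len(train_d)


# Hypothetical two-unit codebook standing in for a trained SOM, and the
# feature vectors (e.g. two daily usage features) it was trained on.
codebook = [(0.0, 0.0), (1.0, 1.0)]
training_data = [(0.1, 0.0), (0.0, 0.2), (0.9, 1.0), (1.0, 1.3), (0.5, 0.5)]

p_typical = anomaly_p_value((0.05, 0.0), codebook, training_data)
p_borderline = anomaly_p_value((0.0, 0.25), codebook, training_data)
p_far = anomaly_p_value((3.0, 3.0), codebook, training_data)
```

Under this assumed definition, a small P-value means that few or no training samples were as far from the map as the observation, so the observation would be treated as a deviation from the learned profile.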
U.S. Pat. No. 5,365,514 discloses an event-driven interface for a system for monitoring and controlling a data communications network. The device listens to the serial data flow in a LAN (Local Area Network) and provides a control vector. The device is not structured to receive and analyze packets of a packet flow.