Determining the cause, or root cause, of an event such as a dropped call, a failed connection setup, or a similar quality problem in a communication network system is important for resolving the underlying problem and avoiding such incidents in the future.
Performance management of, for instance, voice services, including call quality monitoring, root cause detection, and fault localization, has been developed for Circuit Switched (CS) domain based networks such as 2G and 3G networks. In next-generation networks such as LTE, however, voice services are provided by a packet-switched (PS) domain and via an IP Multimedia Subsystem (IMS). The performance management developed for CS domain based networks is not applicable to PS domain and IMS based networks.
In LTE, for instance, voice services can be provided only via the PS domain and the IMS, whereas in 2G and 3G networks, for instance, they are carried via the CS domain. Because the CS and PS domain architectures, and the way voice is carried over them, differ fundamentally, performance management of voice services in the PS domain, including call quality monitoring, root cause detection, and fault localization, has to be done in a fundamentally different way from that developed for the CS domain.
In the PS domain, the control plane functionality for voice services, i.e., the signaling to set up and tear down a call, etc., is handled via the IMS using the Session Initiation Protocol (SIP), while the voice communication is carried in the user plane as packet switched traffic via the Real-Time Transport Protocol (RTP) and the RTP Control Protocol (RTCP).
As an example, a simplified view of a typical prior art system architecture for Voice over LTE (VoLTE) is illustrated in FIG. 5. As shown in FIG. 5, the signaling plane nodes include a Proxy Call Session Control Function (P-CSCF) and a Serving Call Session Control Function (S-CSCF). In a generic setup, there can be a P-CSCF and an S-CSCF on both ends of a call, in home and visiting networks. In specific cases, however, the two sides, and even the two nodes, may be one single node. In the user plane, the traffic goes through Border Gateway (BGW) nodes, also called media gateway nodes, which terminate the RTP/RTCP communication toward the terminal on one end and reopen the RTP/RTCP communication toward the terminal on the other end. FIG. 5 shows a setup in which different BGW nodes serve the Mobile Originating (MO) and Mobile Terminating (MT) sides of the call.
In order to monitor call quality and different kinds of call success Key Performance Indicators (KPIs), data sources from both the signaling plane and the user plane need to be available. This data can be collected by applying packet probes to both signaling and voice packets and deriving the KPIs from the information extracted from the probed packets.
For example, to determine call setup and tear-down success ratios, the signaling plane messages can be analyzed by inspecting message result codes. To determine speech quality KPIs, for example Mean Opinion Score (MOS) metrics, the user plane RTP/RTCP packets need to be analyzed, for instance to check whether any speech packet loss, packet reordering, or the like caused speech quality impairments.
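As an illustration of the two kinds of KPI input described above, the following is a minimal sketch in Python, not the method of any particular monitoring product. It assumes that a probe has already extracted the final SIP response code of each call setup attempt and the RTP sequence numbers of one media stream in arrival order. The function names are hypothetical, every non-2xx final response is treated as a failed setup (a simplification), and the 16-bit wrap-around of RTP sequence numbers is ignored.

```python
from typing import Iterable, Sequence, Tuple


def setup_success_ratio(final_codes: Iterable[int]) -> float:
    """Share of call setup attempts whose final SIP response was a 2xx.

    Assumption: each element is the final response code of one attempt,
    and anything outside 200..299 counts as a failed setup.
    """
    codes = list(final_codes)
    if not codes:
        raise ValueError("no call setup attempts observed")
    successful = sum(1 for code in codes if 200 <= code < 300)
    return successful / len(codes)


def rtp_loss_and_reordering(seqs: Sequence[int]) -> Tuple[int, int]:
    """Count lost and out-of-order packets from RTP sequence numbers.

    `seqs` lists the sequence numbers in arrival order.
    Simplification: sequence number wrap-around is not handled.
    """
    if not seqs:
        return 0, 0
    # Packets expected between the lowest and highest number seen.
    expected = max(seqs) - min(seqs) + 1
    lost = expected - len(set(seqs))
    # A packet is out of order if it arrives after a higher number.
    reordered = 0
    highest = seqs[0]
    for seq in seqs[1:]:
        if seq < highest:
            reordered += 1
        else:
            highest = seq
    return lost, reordered


if __name__ == "__main__":
    print(setup_success_ratio([200, 486, 200, 503]))
    print(rtp_loss_and_reordering([1, 2, 4, 3, 7]))
```

Counts of this kind are only raw inputs. Deriving a MOS estimate from them requires a speech quality model, which is outside the scope of this sketch.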
Although the above prior art method can detect the occurrence of, for instance, signaling or speech quality related problems, it cannot provide insight into the cause of the problem, which is needed to resolve the problem and avoid it in the future. A typical approach currently used to identify a cause, i.e. root cause, of a quality problem is often based on analyzing certain call related events in isolation, for instance with manual or elementary statistical methods. For example, the root cause of call setup problems is often analyzed purely based on cause codes reported in the respective call setup signaling message. Although the analysis of single call flow messages is necessary, it is often not sufficient to uncover the real causes behind a first event, i.e. an incident. To discover such causes, the incident, e.g. a negative first event such as a call setup failure, has to be analyzed in relation to second events, i.e. other events or occurrences, for instance including events occurring in other parts of the network communication system. For example, an abnormal call termination may be the consequence of an unsuccessful handover in the network communication system, i.e. in the radio network, which may have been attempted long before the abnormal call termination event, i.e. several seconds before, while the cause code of the termination typically gives no hint of any such relation to other events, e.g. events happening in another domain of the network.
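The idea of relating an incident to earlier events in other domains can be sketched as a simple time-window lookup. The following Python fragment is an illustrative assumption only, not a method taken from the prior art discussed here: the `Event` record, its field names, the subscriber identifier used as correlation key, and the 10 second window are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds on a common clock (assumed synchronized)
    subscriber: str   # hypothetical key used to correlate across domains
    domain: str       # e.g. "ims" or "radio"
    name: str         # e.g. "abnormal_termination", "handover_failure"


def preceding_events(incident: Event, events: List[Event],
                     window_s: float = 10.0) -> List[Event]:
    """Return events of the same subscriber that occurred at most
    `window_s` seconds before the incident, oldest first."""
    candidates = [
        e for e in events
        if e != incident
        and e.subscriber == incident.subscriber
        and 0 <= incident.timestamp - e.timestamp <= window_s
    ]
    return sorted(candidates, key=lambda e: e.timestamp)


if __name__ == "__main__":
    incident = Event(106.5, "A", "ims", "abnormal_termination")
    log = [
        Event(50.0, "A", "radio", "handover_success"),
        Event(100.0, "A", "radio", "handover_failure"),
        Event(104.0, "B", "radio", "handover_failure"),
        incident,
    ]
    for event in preceding_events(incident, log):
        print(event.domain, event.name, event.timestamp)
```

In this made-up log, the only candidate returned for the abnormal termination is the handover failure of the same subscriber 6.5 seconds earlier, which the termination cause code alone would not reveal. Temporal proximity only yields candidates; it does not by itself establish causation.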
Thus, there is a need to analyze events occurring in combination. However, this requires a high level of domain and expert knowledge. Therefore, these tasks are currently typically done manually.
According to the prior art, for instance, root cause symptoms are often identified by making a large number of active test calls with special terminals capable of recording and reporting all events in detail, and then manually analyzing the generated logs one by one. This approach, however, is not scalable, is very sensitive to the specific expert's knowledge, and is prone to human errors and limitations.