Root Cause Analysis (RCA) is a technique used in many different fields, such as medicine and engineering, for identifying the origin of a problem. Generally, RCA can be broken down into a series of steps to find the primary cause of the problem, so that you can determine what happened, determine why it happened, and figure out what to do to reduce the likelihood that it will happen again. RCA assumes that systems and events are interrelated. Thus, an action in one area triggers an action in another, and another, and so on. By tracing back these actions, you can discover where the problem started and how it grew into the symptom you're now facing.
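The tracing-back idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that each observed event has at most one known trigger; the event names and the `trace_to_root` helper are hypothetical and are not taken from any of the solutions discussed here.

```python
# Hypothetical causal links: each event maps to the event that triggered it.
# None marks an event with no known upstream trigger (treated as the root cause).
triggered_by = {
    "dropped call": "handover failure",
    "handover failure": "missing neighbor relation",
    "missing neighbor relation": None,
}


def trace_to_root(symptom, links):
    """Walk the trigger links backwards from a symptom.

    Returns the chain of events, symptom first and root cause last.
    """
    chain = [symptom]
    seen = {symptom}
    while links.get(chain[-1]) is not None:
        cause = links[chain[-1]]
        if cause in seen:  # guard against circular links
            raise ValueError(f"circular causal link at {cause!r}")
        chain.append(cause)
        seen.add(cause)
    return chain


chain = trace_to_root("dropped call", triggered_by)
root_cause = chain[-1]
```

In practice an event may have several contributing triggers, in which case the links would form a graph rather than a single chain; the single-trigger assumption is made here only to keep the sketch short.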
Radio Access Network (RAN) optimization activities may be of particular concern in certain types of mobile communications networks, such as, for example, Long Term Evolution (LTE) and/or Voice over LTE (VoLTE) networks. For example, according to an existing solution, once a certain underperforming area has been identified, RAN network traces that store the signaling message interactions between a wireless device (also referred to as a UE) and an antenna may be recorded. Once the RAN network traces have finished recording, the files generated may be processed and converted to one or more formats that may be more suitable for analysis. The converted RAN network traces may then be analyzed in order to determine what happened, to determine why it happened, and to figure out what to do to fix it. This process may be very time consuming.
Another existing solution for addressing problematic calls in a mobile communications system is found in Application WO2005/032186, which describes a method for performance management in a cellular mobile packet data network that captures raw traffic traces, builds a traffic and session database, defines a set of appropriate key performance indicators, and calculates those indicators. Still another approach, found in the BUSS CEA product, purports to create so-called Extended Session Records, where cell-related information (RSCP, RSRP, ..., CellID, ...) may be correlated with end-user application performance indicators on a session level. These products may use CTR, EBM, GPEH, or other related sources. According to another existing solution, software provides an internal message indicating the main reason why a call connection was interrupted.
These existing solutions may have certain deficiencies. For example, the reasons indicated for why a call connection was interrupted may be very generic, and may not provide enough information to make decisions and carry out optimization activities. Thus, further study, including recording network traces and corresponding manual analysis, is required. The message information, in combination with information extracted from processed network traces, may be used to create a basic logic to provide a diagnosis for the problematic call. Despite combining the two sources of information, the output of such an approach may still be invalid. The output may be invalid for a variety of reasons, including inaccurate diagnosis and/or the fact that it fails to consider VoLTE-specific issues. As another example, existing solutions may not support the LTE/VoLTE standard. Still other existing solutions may be based on Configuration Management (CM) and Performance Management (PM) Key Performance Indicators (KPIs), which may be inefficient.
Other approaches attempt to address problematic calls in a mobile communications system using UE logs. ASCOM (TEMS investigation) is one example that provides a service/product that allows for setup of specific equipment for testing, collecting measurements, and post-processing of UE logs. The ASCOM solution offers a graphical interface for analysis of the collected information and for troubleshooting the issues detected. Other vendors, such as MobileCem (FalconLive solution) and Hasati, have developed drive test solutions similar to the ASCOM (TEMS) solution that also stream and transfer the collected data during the process. Ultimately, however, these solutions merely process the UE logs once received and provide a basic translation of the information into human-understandable logs for engineers to analyze.
These solutions, however, may also have certain deficiencies. For example, existing solutions based on UE logs are still widely used to provide services where there is no commercial traffic, so the use of test UEs is needed. There are also services that require UE logs to perform in-depth field analysis, such as First Office Applications (FOA) of new radio network system features. Delivering these services may be complex because of the logistics involved in physically moving engineers to the field, setting up the equipment, collecting the raw measurement/UE log data, uploading that raw data to remote servers, processing the collected UE logs, analyzing the processed data, and, if needed, performing troubleshooting activities. This complexity translates to a high cost for service providers, delays in network evolution, and consequently reduced margins.
The deficiencies of the various existing solutions are exacerbated at large scales, e.g., when the underperforming communication system to be optimized has hundreds (or even thousands) of wireless devices. The problem of scalability may make root cause determinations of problematic calls unmanageable. Thus, there is a need for an improved method of determining the root cause of problematic calls in a mobile communications system.