Big data techniques are presently driving a revolution in data processing. Where prior data storage technologies, such as relational databases, were not sufficiently performant for large amounts of data, the advent of alternative, cloud-based data stores, along with parallel processing algorithms such as MapReduce, has made big data practical, performant, and cost effective. Furthermore, recent advances in the processing of large amounts of data allow for real-time (or near-real-time) analysis. One example is Spark, which provides such processing on Hadoop and leverages in-memory computation.
Big data and machine learning techniques may be applied to a wide array of domains. One example is wireline communication networks, which have experienced constant and significant transformation, guided by the continuous development of new network technologies and services for customer-provided equipment (CPE), such as modems, telephones, routers, switches, residential gateways, home networking adapters, and internet access gateways. Such CPEs typically support telephone, internet, television (TV), and/or other popular consumer data services.
Wireline service providers have continuously evolved to cope with increasing traffic demands, as well as the performance requirements imposed by these wireline applications. In one aspect, the CPEs and/or their supporting network concentration nodes are frequently upgraded in hardware and/or software to accommodate the increasing demand for data. In some instances, CPEs are introduced in new areas. Installation of these CPEs and their supporting concentration nodes, collectively referred to herein as “nodes,” creates a formidable operational challenge for wireline network operators that need to grow their systems in a sustainable way.
Beyond sustainable growth, network operators must also maintain the healthy operation of every node. In existing systems, when a customer experiences a malfunction with their CPE, the customer must first call and explain the symptoms to a customer care representative, who classifies the matter based on predetermined criteria. An internal record may be generated to be analyzed by an appropriately skilled engineer. The engineer may create a Call Reliability Report (CRR) that includes the traffic volume, the number of dropped calls, the reasons for the dropped calls, and an overall calculated drop call rate for the node(s) of concern. In some cases, an intermediate engineering triaging group may address some of the complaints, while in other cases, the matter is sent to a field engineer to investigate the root of the malfunction. Due to the complexity of present wireline systems, the field engineer may need to investigate multiple systems to determine the cause of the malfunction, relying on the limited analysis that can be performed from local (i.e., node-level) knowledge of the network.
The challenges facing the engineer are further exacerbated by the fact that, in today's wireline networks, there is an increasing number of data sources, with a substantial amount of performance data collected from each node and often aggregated in time intervals such as minutes, hours, or days. Together, the data provides key performance indicators (KPIs), which are reviewed by engineers to better understand the overall health of the wireline network, detect problematic situations, and decide when it is time to upgrade part of the network. Furthermore, in existing systems it is difficult to aggregate data from disparate nodes, particularly if they are of a different nature. For example, a report from a CPE (e.g., in the form of a set-top box) indicating frequent resets at a home location is not easy to combine with the CPU temperature of a concentration node in its critical path. Such a specialized report requires significant processing and therefore may not be immediately available to the engineer. Once the data is made available, the engineer must still manually analyze the large volume of data, which is not only inefficient but may also yield an incomplete analysis, because an engineer may not be able to effectively discern trends across multiple data sources. Further, no distinction is made between an existing node and a new node to assure a robust installation. The result is typically suboptimal wireline network performance and a poor customer experience. It is with respect to these considerations and others that the present disclosure has been written.
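By way of illustration only, the kind of specialized report mentioned above, one that combines CPE reset reports with the CPU temperature of an upstream concentration node, may be sketched as follows. The sketch is not the method of the present disclosure. All record layouts, field names (e.g., upstream_node, cpu_temp_c), identifiers, and sample values are hypothetical and chosen solely for the example, and the hourly interval is one assumed choice among the time intervals noted above.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample records; real data sources would differ in format.
cpe_resets = [
    {"cpe_id": "stb-17", "upstream_node": "agg-3", "time": "2024-01-01T10:05:00"},
    {"cpe_id": "stb-17", "upstream_node": "agg-3", "time": "2024-01-01T10:40:00"},
    {"cpe_id": "stb-17", "upstream_node": "agg-3", "time": "2024-01-01T11:15:00"},
]
node_temps = [
    {"node_id": "agg-3", "time": "2024-01-01T10:10:00", "cpu_temp_c": 70.0},
    {"node_id": "agg-3", "time": "2024-01-01T10:50:00", "cpu_temp_c": 74.0},
    {"node_id": "agg-3", "time": "2024-01-01T11:20:00", "cpu_temp_c": 65.0},
]


def hour_bucket(timestamp):
    """Truncate an ISO 8601 timestamp to the start of its hour."""
    parsed = datetime.fromisoformat(timestamp)
    return parsed.replace(minute=0, second=0, microsecond=0)


def resets_per_hour(resets):
    """Count CPE resets per (upstream node, hour)."""
    counts = defaultdict(int)
    for record in resets:
        counts[(record["upstream_node"], hour_bucket(record["time"]))] += 1
    return dict(counts)


def mean_temp_per_hour(samples):
    """Average the CPU temperature per (node, hour)."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, sample count]
    for sample in samples:
        key = (sample["node_id"], hour_bucket(sample["time"]))
        totals[key][0] += sample["cpu_temp_c"]
        totals[key][1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


def combine(resets, samples):
    """Join both hourly aggregates on (node, hour) into one report."""
    reset_counts = resets_per_hour(resets)
    temps = mean_temp_per_hour(samples)
    rows = []
    for key in sorted(set(reset_counts) | set(temps)):
        node, hour = key
        rows.append({
            "node_id": node,
            "hour": hour.isoformat(),
            "cpe_resets": reset_counts.get(key, 0),
            "mean_cpu_temp_c": temps.get(key),  # None if no sample that hour
        })
    return rows


rows = combine(cpe_resets, node_temps)
for row in rows:
    print(row)
```

In this sketch, both data sources are first reduced to a common key of node and hour so that records of a different nature can be placed side by side. Performing the same reduction by hand across many nodes and data sources is the manual effort described above.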