Communication networks have migrated from using specialized networking equipment executing on dedicated hardware, like routers, firewalls, and gateways, to software defined networks (SDNs) executing as virtualized network functions (VNFs) in a cloud infrastructure. To provide a service, a set of VNFs may be instantiated on general-purpose hardware. Each VNF may require one or more virtual machines (VMs) to be instantiated. In turn, VMs may require various resources, such as memory, virtual central processing units (vCPUs), and network interfaces or network interface cards (NICs).
The trend towards large-scale commercial deployment of complex virtualized services and cloud-based D2 services (NFV, VNF, SDN), together with underlying software infrastructures such as AT&T's ECOMP platform, AIC, and OpenStack, has reduced barriers to creating complex systems at scale. Indeed, complex virtual systems can be created in a largely software-driven, automated way. The use of virtualization, the scale, and the complex interactions among components in these systems (for example, VMs can be spun up dynamically, VM-host associations are not static, and virtualized services are rapidly deployed and terminated) introduce challenges in operational maintenance, troubleshooting, and root cause analysis. Some of the challenges include: arbitrarily complex and dynamic interactions and relationships between the virtual and physical entities; an increased likelihood, as scale grows, of errors and faults being introduced during instantiation; hidden dependencies between virtual and physical elements; nuanced characteristics of virtualized services; increased runtime complexities; and failure of documentation to keep up with the rapid changes in the system. Overall, the dynamic and scalable nature of virtualized services makes manual troubleshooting extremely challenging, resource intensive, and potentially inaccurate. Moreover, the quantity of data from diverse sources (some of them enabled by cloud and SDN platforms), including key performance indicators (KPIs), measurements, topologies, inventories, and logs, makes it challenging to determine which information is relevant to troubleshooting a specific problem.
This disclosure is directed to solving one or more of the problems in the existing technology.