Network outages cost Service Providers money in several ways. The most obvious is the direct loss of revenue when customers cannot access the network during an outage. Outages also carry a personal cost for end-users who cannot establish a connection in an emergency. In addition, with today's trend of offering Service Level Agreements (SLAs) to customers, Service Providers incur significant additional penalties, in the form of free service or punitive damages, should their networks become unavailable. Regulators in many countries (e.g., the United States) currently require a detailed report when voice networks experience prolonged outages. A similar requirement may be imposed on data networks, which is a significant concern given the historically low reliability of data networks compared to voice networks. It is therefore incumbent upon Service Providers to proactively monitor their networks and address potential outages before they happen.
Unfortunately, with today's technology, proactive network monitoring is very labor intensive and can never be 100% effective in preventing network outages. For example, a series of seemingly unrelated, minor events that occur over an extended period of time, or in seemingly uncorrelated locations in the network, can escalate to a catastrophic network failure and dynamically change the network's security posture. These interactions are often too subtle, and unfold over too long a period, for people to recognize the correlation and the impending failure. Moreover, both planned and unplanned network events (e.g., network maintenance activities and network alarms, respectively) can cause major outages, yet they are often documented on separate systems, further exacerbating the problem.
Additionally, network reliability information and network security information are currently reported on separate systems despite the strong correlation between the two. For instance, a cyber-attack on network elements has a direct impact on the network's availability. Likewise, a reduction in the network's reliability can create new security vulnerabilities by introducing unanticipated traffic patterns into the network. For example, a failed load balancer with security features could leave the server farm located behind it wide open to attack.