Managed networks exist everywhere; examples include telecommunications networks, utility pipelines, and television and radio broadcast networks, to name just a few. Such networks are typically monitored to detect component failures and deviations from optimum performance caused by changing operating conditions. As a result, most conventional networks include self-monitoring systems that vary widely in their level of sophistication depending on the type and complexity of the network involved.
Irrespective of the level of complexity of the network, a managed network can conventionally include centralised or regionalised management capabilities to allow network operators to monitor the status and performance of network components and re-configure the network in response to changing operational needs or failures.
This arrangement is acceptable if the managed network can be monitored and maintained by a reasonable number of experienced network operations personnel. Under these circumstances, human operators are responsible for correlating the streams of state change events and performance measurements received from network components and, based on their experience of operating that network under a range of operational and fault conditions, adjusting the operational parameters to provide the required level of service to their customers.
However, where a network becomes more complex with a large number of interconnected network components, it becomes more difficult for network operators to diagnose problems encountered with the network because of the sheer volume of information that is presented to them.
In addition, the continuous, reliable operation of the network may be critical, for example where the network relates to essential telecommunications, nuclear plant control, or other infrastructure such as power supply. In these cases it is important to have an accurate awareness of any vulnerability within the network, appropriate back-ups designed into the network, and the ability to quickly locate the source of an observed network failure.
In this situation, network operators often turn to automated correlation systems in an attempt to offload some of the analysis workload and speed up fault resolution times. Conventionally, such systems analyse state change events and/or performance measurements gathered from the live network and, depending on their level of sophistication, attempt to identify network problems, determine their impact on delivered services, and perform root cause analysis. In general, however, these types of system are driven by state change events and performance measurement changes caused by actual network problems. Accordingly, they typically provide no more than a localised ‘what-if’ predictive capability for evaluating the vulnerability of an existing or planned network to component problems or changing operating conditions.