Modern communication systems involve a delicate interplay of network components that support voice and data services. These systems are vital to business operations, such that downtime imposes a significant cost on the business. Ensuring that networks perform to their architected availability and mitigating the risk of downtime are key drivers for information managers. Whether the infrastructure is supporting e-commerce, regulatory compliance reports, supply chain management, or even internal electronic mail, loss of connectivity has a severe impact. For example, as applications such as complex ordering, billing, and communication systems have been added to Internet Protocol (IP) networks, ensuring that those networks remain connected and available has become a key concern. The impact of network failures (even very minor ones lasting only minutes) can be measured in thousands or even millions of dollars. The ability to quickly identify faults and restore network connectivity is critical to helping companies meet and exceed their business objectives. Consequently, network monitoring systems are needed to detect network anomalies stemming from network component failures, cable cuts, and similar events.
Network monitoring involves receiving and interpreting a multitude of alarms that are assigned to various network components. These alarms are triggered when anomalies are detected in their respective components. Monitoring systems provide these alarms in the form of reports so that network analysts (or network monitors) can analyze the cause of the network anomaly and manually initiate action to resolve it. Such resolution can also entail manually interfacing with multiple disparate systems.
Given the size of modern networks, the number of alarms can be unmanageable. That is, the network monitors may be inundated periodically with alarm reports stemming from a major network problem, or even from a series of small anomalies arising within a short time span. These events can trigger a tremendous volume of alarm reports, which can overwhelm the network surveillance engineers and hamper the process of restoring the network. Reducing the restoration time per network event can translate into significant savings for the customer.
In conventional network monitoring environments, network surveillance engineers receive alarm reports from the telecommunications network and then manually process these alarm reports. Processing an alarm report involves an orderly procedure for resolving the anomaly that generated the alarm. The processing of alarm reports to resolve network anomalies can require retrieving network parameter information, such as equipment operating characteristics, from paper manuals; consulting telephone directories to find telephone numbers for remote network sites; collecting configuration information from the network equipment associated with the trouble; and completing electronic telecommunications trouble forms, referred to as trouble tickets or service reports. A network surveillance engineer prepares a trouble ticket (or service report) when action by a field engineer appears necessary. Field engineers are typically telecommunications personnel who service the telecommunications network (e.g., replacing a faulty component at a specific location).
Traditionally, organizations and businesses have resorted to addressing the daunting, costly task of network monitoring and maintenance on their own. These “Do-It-Yourself (DIY)” organizations assume the heavy financial costs associated with the hardware, software, and human capital of network management systems. Moreover, these customer organizations are generally ill equipped to fully diagnose problems caused or contributed to by third parties (e.g., Local Exchange Carriers (LECs)); that is, they lack end-to-end visibility.
Based on the foregoing, there is a need for integrating and automating the processes and systems to provide fault detection and recovery of communications networks. There is also a need for an approach to provide rapid fault isolation and resolution.