Service provider networks consist of geographically distributed segments and of functions that are best viewed technically as a series of layers, typically based on the seven-layer Open Systems Interconnection (OSI) model. When service provider networks provided access to web and e-mail programs, troubleshooting was limited and fairly simple, as there was little or no interaction between the programs driving the user's experience on the internet and the signals being sent between computers to make that experience possible. With the advent of technology that allows video, voice and other time-sensitive and frequency-sensitive traffic to be sent over service provider networks, a stable network with few or no dropped packets has become critical.
The advent of broadband has made the end user's experience of an application dependent on the network in a large number of ways. This makes troubleshooting more complex because there are interdependencies between different operating groups within a service provider network.
One approach to troubleshooting subscriber issues in such service provider networks is to leverage the knowledge of a technician within each layer. Each technician troubleshoots a layer independently, using that layer's Element Management Server (EMS) or Command Line Interface (CLI). The information is then manually correlated through communication among the technicians, who are each busy troubleshooting their own network elements.
This approach does not work well because distinguishing between problem-specific and subscriber-specific issues is difficult. First, if there is a problem with any layer of the service provider network, then there is no use troubleshooting any layer above it or any subscriber-based issues, yet it would take multiple technicians to determine that there is a problem in a given layer. A problem in a lower layer propagates to the higher layers, so helping an individual subscriber would be fruitless. For instance, a subscriber connecting to an Internet Protocol (IP) network might get an authentication failure or a connection timeout. This failure could be due to a transport network failure, a failure of the Authentication, Authorization and Accounting (AAA) network element, the AAA server being down, and so on. Meanwhile, all elements in layers below IP, such as the Radio Frequency (RF) layer, would show no failures.
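The bottom-up reasoning in the first point can be illustrated with a minimal Python sketch. The layer names, their ordering, and the helper functions below are assumptions made for illustration only; the text does not prescribe any particular data model.

```python
# Hypothetical layer ordering, lowest layer first (illustrative names only).
LAYERS = ["RF", "Transport", "IP", "Application"]


def lowest_failed_layer(health):
    """Return the lowest layer reported unhealthy, or None if none is.

    `health` maps a layer name to True (healthy) or False (failed).
    Layers missing from the mapping are assumed healthy in this sketch.
    """
    for layer in LAYERS:
        if not health.get(layer, True):
            return layer
    return None


def layers_to_skip(health):
    """Return the layers above the lowest failure.

    A fault in a lower layer propagates upward, so troubleshooting
    these layers (or individual subscribers) is not yet useful.
    """
    failed = lowest_failed_layer(health)
    if failed is None:
        return []
    return LAYERS[LAYERS.index(failed) + 1:]


if __name__ == "__main__":
    # RF shows no failures, but the transport network is down.
    health = {"RF": True, "Transport": False, "IP": True}
    print(lowest_failed_layer(health))
    print(layers_to_skip(health))
```

In this sketch a single pass over the layers replaces the round of per-layer checks that would otherwise require one technician per layer.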
Second, even if it is determined that the service provider network is not working properly, the failure affects a given subscriber only if that subscriber is being routed through one of the failed network elements. Determining the path a particular subscriber is using requires manual correlation of messages through various nodes.
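As a rough illustration of the second point, the following Python sketch checks which subscribers are routed through a failed element. It assumes the per-subscriber path is already known, which is exactly the information that is hard to obtain manually; the element names and the `affected_subscribers` helper are hypothetical.

```python
def affected_subscribers(paths, failed_elements):
    """Map each affected subscriber to the failed elements on its path.

    `paths` maps a subscriber ID to the list of network elements the
    subscriber's traffic traverses. Subscribers whose path avoids every
    failed element are omitted from the result.
    """
    failed = set(failed_elements)
    result = {}
    for subscriber, path in paths.items():
        hits = sorted(failed.intersection(path))
        if hits:
            result[subscriber] = hits
    return result


if __name__ == "__main__":
    # Hypothetical paths; real paths would come from correlated messages.
    paths = {
        "subscriber-1": ["bts-1", "router-a", "aaa-1"],
        "subscriber-2": ["bts-2", "router-b", "aaa-1"],
    }
    print(affected_subscribers(paths, ["router-a"]))
```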
Third, subscriber-specific issues require correlation of call flow or control messages from different network elements across the service provider network. One way to accomplish this correlation would be to have multiple technicians, each monitoring the layer associated with their expertise (RF, IP, Applications, etc.), troubleshoot the subscriber's issues through manual communication. This is not an efficient use of resources, as each technician has to perform some level of diagnostics before communicating with the others. Isolating a problem in a network around a specific subscriber then becomes both a time-consuming and a resource-consuming process.
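The kind of correlation described in the third point might look like the following Python sketch, which groups control messages from several network elements by subscriber and orders them in time. The message fields (`subscriber`, `timestamp`, `element`, `event`) are assumptions for illustration; real network elements report in vendor-specific formats.

```python
from collections import defaultdict


def correlate_by_subscriber(messages):
    """Group messages by subscriber and sort each group by timestamp.

    Each message is a dict with hypothetical keys 'subscriber',
    'timestamp', 'element', and 'event'. The result maps a subscriber
    ID to a time-ordered list of that subscriber's messages.
    """
    timeline = defaultdict(list)
    for message in messages:
        timeline[message["subscriber"]].append(message)
    for events in timeline.values():
        events.sort(key=lambda m: m["timestamp"])
    return dict(timeline)


if __name__ == "__main__":
    messages = [
        {"subscriber": "s1", "timestamp": 3, "element": "aaa-1", "event": "auth-timeout"},
        {"subscriber": "s2", "timestamp": 2, "element": "bts-2", "event": "attach"},
        {"subscriber": "s1", "timestamp": 1, "element": "bts-1", "event": "attach"},
    ]
    for subscriber, events in correlate_by_subscriber(messages).items():
        print(subscriber, [(e["element"], e["event"]) for e in events])
```

Keying on the subscriber yields one call-flow view across all layers, the step that otherwise depends on technicians exchanging findings by hand.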
Fourth, as problems are found, vendors who have little time or motivation to write detailed explanations and troubleshooting guides instead return cryptic diagnostic codes to the operator via the EMS or the CLI. In the best case, the technician must look the code up in a book to obtain a human-readable explanation; in the worst case, the technician must contact the vendor, who may then need to find an engineer to look at the source code to determine what the diagnostic code means and possibly recommend a course of action.
Another issue is the scalability of manual troubleshooting techniques. As the complexity of the network has grown, more and more people have needed to become involved in troubleshooting each subscriber's issues. As the number of subscribers increases, the use of more and more technicians to troubleshoot network problems will eventually exhaust the resources available for any given problem. Further, a given network may have different implementations of a given element from different vendors, making troubleshooting even more complex.