Large-scale network operations often experience technical malfunctions that degrade system performance. For large networks, this degradation can be difficult to isolate because the problem can be located on remote devices or because the problem manifests itself not as a complete failure, but merely as poor performance. Often, isolating a poorly performing component is substantially more difficult than isolating one that has completely malfunctioned. To solve network operation problems, network operators use fault management tools that explore and monitor key aspects of a network.
In traditional fault management, the mean time to repair (MTTR) a problem is typically a couple of hours. Given the difficulty of identifying both whether an application is degrading and what the source of the degradation is, the MTTR that is associated with application management can be quite lengthy. In many cases, the MTTR associated with first identifying that an application performance problem exists, and then identifying the source of that problem, is measured in days or weeks.
The problems encountered range in scope and complexity depending on their source. Some examples of network operations problems include sluggish mission-critical applications, the misuse of peer-to-peer applications, an underutilized load balance link, or lethargic intranet performance, all of which have an adverse effect on network operations and eventually on an organization's productivity. Consequently, the scope and complexity of monitoring networks with a wide variety of applications, processes, and distribution points is growing, and manufacturers of tools for maintaining network operations struggle to stay up-to-date.
One known problem is that, when monitoring network traffic for a relatively large network, the amount of information relating to that network traffic can also be relatively large. The sheer volume of nodes and traffic in the network makes it more difficult for a network monitoring device to keep up with that relatively large amount of information. As such, what is needed are advanced systems and methods to identify symptoms and problems affecting the communication network, and to locate devices that may be the source of those problems.