The present invention relates generally to monitoring of networks, and more particularly, to a method and system for identifying congestion and anomalies in networks.
With the rapid emergence and growth of networks, monitoring network performance has become an essential aspect of network management. For example, as an increasing number of businesses and consumers rely on the Internet for accessing information and conducting commerce, there is a greater need for assuring the quality of services provided by network service providers, such as Internet Service Providers (ISPs).
To assure the quality of services provided by the ISPs, a number of performance monitoring tools have emerged. These tools measure performance metrics, such as latency, throughput, bandwidth, network availability, etc. from the perspective of one or more monitoring nodes located at various points in the network. Specifically, the tools measure "monitor-to-destination" performance of a network by sending special messages, such as Simple Network Management Protocol (SNMP) messages, from the monitoring nodes to various destination nodes in the network.
One example of a monitor-to-destination performance tool is the Internet Traffic Report (ITR) tool. ITR is provided by Andover News Network and is available at "http://www.internettrafficreport.com". ITR monitors the flow of information on the Internet and generates indexes that measure the reliability and performance of major paths in the Internet. For example, ITR performs "ping" tests to measure round-trip delay along major paths on the Internet. ITR compares the response associated with each ping test with past responses from the same test, and based on the response, determines a performance score on a scale of zero to 100. ITR then averages the scores into an index that represents the amount of traffic on the Internet. The index, however, does not identify congestion and anomalies at individual nodes, links, or small segments in the Internet.
One alternative to ITR is the Matrix Information and Directory Services (MIDS) tool, which is provided by Matrix Internet Quality and is available at "http://www.miq.net". MIDS measures performance metrics associated with destination nodes in the Internet, such as Domain Name System (DNS) servers, Top Level Domain (TLD) servers, routers, etc. MIDS compares the metrics associated with various ISP networks and the Internet as a whole to evaluate the performance and reliability of services provided by the ISPs. Although Matrix Internet Quality claims on its web site that MIDS is capable of isolating network performance problems to specific servers, routers, links, or interconnection points in the Internet, it is not known whether MIDS can reliably identify congestion and anomalies without generating a large number of false alarms.
To overcome the above and other disadvantages of the prior art, it is desired to provide methods and systems for identifying congestion and anomalies at individual nodes, links, and segments in networks.
In accordance with an embodiment of the invention, a monitoring station determines for each link in a network a statistical model that is based on a harmonic analysis. Based on the model, the monitoring station estimates one or more metrics associated with each of the links. Using the estimated metrics, the monitoring station then determines whether congestion or anomalies exist at each link.
In determining a model for each link in the network, the monitoring station collects from each link metric data, such as time delay, packet loss, throughput, or any other data associated with a link. The monitoring station then selects for each link a model that includes one or more harmonic components, such as a sum of sinusoidal components. By estimating parameters of the harmonic components, the monitoring station then captures the periodicity of the collected metric data in terms of the harmonic components of the model. The monitoring station uses the model to estimate the normal state of the network from which estimated metric data are determined.
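The parameter estimation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the metric is sampled at uniform intervals, that the fundamental period is known in advance (for example, 24 hourly samples per day), and that the data span a whole number of periods; under those assumptions the least-squares amplitudes of a sum of sinusoids reduce to simple averages. The function names fit_harmonic_model and estimate_metric are hypothetical and do not appear in the specification.

```python
import math


def fit_harmonic_model(samples, period, num_harmonics):
    """Estimate the mean and the (cosine, sine) amplitudes of each harmonic.

    Assumes uniformly spaced samples covering a whole number of periods,
    so the least-squares fit reduces to discrete Fourier coefficients.
    """
    n = len(samples)
    if n == 0 or n % period != 0:
        raise ValueError("need a whole number of periods of data")
    if 2 * num_harmonics >= period:
        raise ValueError("too many harmonics for this period")
    mean = sum(samples) / n
    amplitudes = []
    for k in range(1, num_harmonics + 1):
        omega = 2.0 * math.pi * k / period  # angular frequency of harmonic k
        a = 2.0 / n * sum(y * math.cos(omega * t) for t, y in enumerate(samples))
        b = 2.0 / n * sum(y * math.sin(omega * t) for t, y in enumerate(samples))
        amplitudes.append((a, b))
    return mean, amplitudes


def estimate_metric(model, period, t):
    """Evaluate the fitted model at sample index t (the estimated metric)."""
    mean, amplitudes = model
    value = mean
    for k, (a, b) in enumerate(amplitudes, start=1):
        omega = 2.0 * math.pi * k / period
        value += a * math.cos(omega * t) + b * math.sin(omega * t)
    return value
```

When the period is unknown or the samples are irregular, a general least-squares or spectral estimation routine would be needed in place of these closed-form averages.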
Based on the estimated metric data for each link, the monitoring station then determines whether congestion exists at each individual link. For example, the monitoring station determines an average value and a variance for the estimated metric data associated with a link. Based on the average value, the variance, and a predetermined threshold, the monitoring station determines a z-value and a corresponding p-value. If the p-value is smaller than a predetermined statistical significance, the monitoring station determines that congestion exists in the link. Thus, by comparing the p-value of the estimated metric data with the predetermined statistical significance, the monitoring station identifies in the collected metric data signals that are indicative of congestion in the link, instead of identifying peaks and bursts that are mostly natural random variations in the collected metric data and are not indicative of congestion.
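The congestion test above can be sketched as follows. The specification does not give the exact formula for the z-value here, so this sketch assumes one plausible form: the distance of the average from the threshold, measured in standard errors of the average, with a one-sided p-value taken from the standard normal distribution. The function name congestion_test and the default significance of 0.05 are illustrative assumptions.

```python
import math
import statistics


def congestion_test(estimates, threshold, significance=0.05):
    """Return (z, p, congested) for one link's estimated metric data.

    Assumed form: z = (mean - threshold) / sqrt(variance / n), with a
    one-sided upper-tail p-value from the standard normal distribution.
    """
    n = len(estimates)
    if n < 2:
        raise ValueError("need at least two estimates to compute a variance")
    mean = statistics.mean(estimates)
    variance = statistics.variance(estimates)  # sample variance
    if variance == 0:
        raise ValueError("variance is zero; z-value is undefined")
    z = (mean - threshold) / math.sqrt(variance / n)
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z) for standard normal Z
    return z, p, p < significance
```

With this form, an isolated burst raises the variance along with the average, so it moves the z-value far less than a sustained shift of the same size does.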
Using a statistical model that is based on a harmonic analysis, the monitoring station may also determine whether one or more anomalies exist at each individual link in the network. For example, the monitoring station collects new metric data associated with a link, and using the model, estimates metric data associated with that link. The monitoring station then determines a residual between the newly collected metric data and the estimated metric data. The monitoring station searches for anomalous patterns in the residual using, for example, a statistical control method to determine whether an anomaly exists in the link. Thus, by statistically modeling metrics associated with each link under normal network conditions and monitoring the residual associated with each link, the monitoring station determines whether an anomaly exists in a link.
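As a rough illustration of the residual monitoring described above, the sketch below computes the residual and applies a simple control-chart rule. The specification names only "a statistical control method"; the three-standard-deviation limit used here is a common choice substituted as an assumption, and the names compute_residuals, out_of_control_points, and baseline_sigma are hypothetical.

```python
def compute_residuals(observed, estimated):
    """Residual = newly collected metric minus the model's estimate."""
    if len(observed) != len(estimated):
        raise ValueError("observed and estimated data must have equal length")
    return [o - e for o, e in zip(observed, estimated)]


def out_of_control_points(residuals, baseline_sigma, k=3.0):
    """Return indices where the residual leaves the +/- k*sigma control band.

    baseline_sigma is assumed to be the residual standard deviation
    measured under normal network conditions.
    """
    limit = k * baseline_sigma
    return [i for i, r in enumerate(residuals) if abs(r) > limit]
```

For example, with observed delays [10, 11, 9, 10, 25], a constant estimate of 10, and a baseline standard deviation of 1, only the last sample (index 4) falls outside the band.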
For example, based on the residuals, the monitoring station determines a first diagnostic for detecting a persistent step change in the new metric data associated with the link and determines a second diagnostic for detecting a persistent slope change in the new metric data. The monitoring station then determines a first p-value associated with the first diagnostic and a second p-value associated with the second diagnostic. If the first p-value or the second p-value is smaller than a predetermined threshold, the monitoring station determines that an anomaly exists in the link.
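The two diagnostics can be sketched under stated assumptions. The specification does not define them in this passage, so the sketch assumes that residuals under normal conditions are independent and normally distributed with zero mean and a known standard deviation sigma. The step diagnostic is then the standardized mean of the residuals, and the slope diagnostic is the standardized least-squares slope of the residuals against the sample index. All function names and the default threshold of 0.01 are illustrative.

```python
import math


def two_sided_p(z):
    """Two-sided p-value of z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def step_diagnostic(residuals, sigma):
    """Standardized mean of the residuals; large |z| suggests a step change."""
    n = len(residuals)
    z = (sum(residuals) / n) / (sigma / math.sqrt(n))
    return z, two_sided_p(z)


def slope_diagnostic(residuals, sigma):
    """Standardized least-squares slope; large |z| suggests a slope change."""
    n = len(residuals)
    t_bar = (n - 1) / 2.0
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    slope = sum((t - t_bar) * r for t, r in enumerate(residuals)) / sxx
    z = slope / (sigma / math.sqrt(sxx))
    return z, two_sided_p(z)


def anomaly_detected(residuals, sigma, threshold=0.01):
    """Flag an anomaly if either diagnostic's p-value is below the threshold."""
    _, p_step = step_diagnostic(residuals, sigma)
    _, p_slope = slope_diagnostic(residuals, sigma)
    return p_step < threshold or p_slope < threshold
```

Because each diagnostic pools evidence over the whole window, a persistent shift or drift yields a small p-value while isolated spikes largely average out.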
In another embodiment of the invention, a monitoring station determines for each node in a network a statistical model that is based on a harmonic analysis. A node may include, for example, a router, switch, bridge, host, connection point, or any other processing device. Based on the model, the monitoring station estimates one or more metrics associated with each of the nodes. Using the estimated metrics, the monitoring station then determines whether congestion or anomalies exist at each node.
In yet another embodiment of the invention, a monitoring station determines for each segment in a network a statistical model that is based on a harmonic analysis. A segment may include, for example, one or more nodes and links in the network. Based on the model, the monitoring station estimates one or more metrics associated with each of the segments. Using the estimated metrics, the monitoring station then determines whether congestion or anomalies exist in each segment.
Accordingly, methods and systems consistent with the present invention have several advantages over the prior art. For example, such methods and systems statistically model metrics collected from individual links, nodes, and segments to estimate the behavior of a network. By capturing the periodicity of the collected metrics in terms of harmonic components of the model, such methods and systems identify, in real time, signals or patterns in the distribution of the collected metrics that are indicative of congestion or anomalies, instead of identifying peaks and bursts that are mostly natural random variations in the metric data, thus reducing the number of false alarms that may be generated.