The present invention relates to performance of information networks. In particular, the present invention relates to statistical measurements of performance characteristics of an information network.
Internet web sites continue to become more sophisticated and offer a wider variety of media for a user to access. With this trend, users have become more demanding of quick, high-quality Internet experiences. As such, to be able to keep up with users' demands, it has become increasingly important for the providers of Internet content to be able to monitor and troubleshoot Internet performance issues to both avoid degraded performance and provide improved performance.
Given this, systems have been developed for measuring relevant network parameters to evaluate network performance and help troubleshoot network issues which might degrade network performance. Generally, such systems utilize computer servers deployed on a network of interest to measure network performance parameters. Such computer servers are generally referred to as data collection agents (DCAs). A DCA generally connects to a device in the network about which a measurement is desired and takes one or more measurements of one or more predetermined metrics. The DCA then typically stores the results of the measurement either locally or in a remote database. The stored measurements can then be called up and reviewed by a user who accesses the agent.
Such systems can typically measure metrics related to either Uniform Resource Locator (URL) objects (such as a web page located on a server on the network) or streaming media objects. URL objects and streaming media objects are collectively referred to herein as network services. With respect to URL objects, such metrics can include, but are not limited to:
End-to-End Time (Seconds): The time taken from the moment a user clicks on a link to the instant the page is fully downloaded and displayed. It encompasses the collection of all objects making up a page including, but not limited to, third party content on off-site servers, graphics, frames, and redirections.
DNS (Domain Name System) Lookup (Seconds): The time it takes for the browser to turn the text-based hostname into an IP address.
Connect Time (Seconds): The time it takes to set up a network connection from the end-user's browser to a web site. A web page is transferred over this connection, and many such connections are set up for each page.
Request Time (Seconds): The time it takes to send a request from a user's browser to a server. This is a relevant amount of time when submitting a large form (e.g. a message on an email service) or uploading a file (e.g. an attachment to a message on a discussion board). It reflects the ability of a server to accept data.
Response Time (Seconds): The time it takes for a server to respond with content to the browser. Preferably, this measurement is taken by waiting until the first byte of content is returned to the browser.
Teardown Time (Seconds): The time it takes for the browser and server to disconnect from each other.
Download Time (Seconds): The time for the page download from the start of the first object to the end of the last object.
The unit in parentheses following the name of the metric is the unit in which the measurement is generally taken and recorded.
With respect to streaming media objects, such metrics include, but are not limited to:
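As an illustration only, the URL object metrics listed above can be derived from a series of event timestamps recorded while fetching a single object. The sketch below is a minimal one under stated assumptions: the `PageTimestamps` record, its field names, and the `url_metrics` function are hypothetical and do not come from any particular DCA implementation, and download time is simplified to a single object (first byte to last byte) rather than a full page of objects.

```python
from dataclasses import dataclass


@dataclass
class PageTimestamps:
    """Hypothetical event timestamps, in seconds, for fetching one object."""
    start: float         # user initiates the request
    dns_done: float      # hostname resolved to an IP address
    connected: float     # network connection established
    request_sent: float  # last byte of the request sent to the server
    first_byte: float    # first byte of content returned to the browser
    last_byte: float     # last byte of content received
    closed: float        # browser and server disconnected


def url_metrics(t: PageTimestamps) -> dict:
    """Derive the per-phase durations (seconds) from the timestamps."""
    return {
        "dns_lookup": t.dns_done - t.start,
        "connect": t.connected - t.dns_done,
        "request": t.request_sent - t.connected,
        "response": t.first_byte - t.request_sent,
        # Assumption: one object only, so download spans first to last byte.
        "download": t.last_byte - t.first_byte,
        "teardown": t.closed - t.last_byte,
        # Assumption: display time is ignored; end-to-end stops at last byte.
        "end_to_end": t.last_byte - t.start,
    }


if __name__ == "__main__":
    sample = PageTimestamps(0.0, 0.05, 0.12, 0.13, 0.40, 1.20, 1.25)
    for name, seconds in url_metrics(sample).items():
        print(f"{name}: {seconds:.2f} s")
```

Recording raw timestamps and deriving durations afterward keeps the phases mutually consistent, since each phase ends exactly where the next begins.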
DNS Lookup Time (seconds): This metric is generally the same as the DNS lookup time for URL-type objects.
Quantity of Data Received (bytes or bits): The absolute amount of data gathered by the DCA if a stream had been rendered.
Packet Loss (number): The number of packets that are not received by the media monitor.
Percent Packet Loss (number): The percentage of total packets that are not received by the media monitor.
Packets Received (number): The total number of packets received by the media monitor.
Packets Late (number): The number of packets received too late to functionally render.
Packets Resend Requested (number): The number of packets that have been requested to be resent. This metric preferably applies to RealMedia streams.
Packets Recovered (number): The number of packets for which some type of corrective action is taken. "Corrective action" typically means requesting that the missing or broken packets be resent. This metric preferably applies to RealMedia streams.
Packets Resent (number): (Also known as packets resend received.) The number of packets that were asked for again (the packets resend requested metric) and were then received. This metric preferably applies to RealMedia streams.
Packets Received Normally (number): The number of packets received by the media monitor from the streaming media server without incident.
Current Bandwidth (bytes/second): The rate at which data is received measured over a relatively small time frame.
Clip Bandwidth (bytes/second): The rate at which data is received measured over the length of the entire stream or over a relatively long predetermined timeframe.
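Several of the streaming media metrics above are simple functions of raw counters. The following sketch shows one way they might be computed. The function names and the sample format are hypothetical, and it assumes that the total packet count equals packets received plus packets lost, which the metric definitions above do not state explicitly.

```python
def percent_packet_loss(received: int, lost: int) -> float:
    """Percentage of total packets that were not received.

    Assumption: total packets = received + lost.
    """
    total = received + lost
    if total == 0:
        return 0.0
    return 100.0 * lost / total


def clip_bandwidth(bytes_received: int, duration_seconds: float) -> float:
    """Average rate in bytes/second over the whole clip."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return bytes_received / duration_seconds


def current_bandwidth(samples, window: float) -> float:
    """Rate in bytes/second over a short trailing window.

    samples: list of (time_seconds, cumulative_bytes), sorted by time.
    """
    end_time, end_bytes = samples[-1]
    # Earliest sample that still falls inside the trailing window.
    start_time, start_bytes = next(
        (t, b) for t, b in samples if t >= end_time - window
    )
    if end_time == start_time:
        raise ValueError("window contains fewer than two samples")
    return (end_bytes - start_bytes) / (end_time - start_time)


if __name__ == "__main__":
    samples = [(0, 0), (1, 1000), (2, 3000), (3, 6000)]
    print(percent_packet_loss(received=95, lost=5))
    print(clip_bandwidth(6000, 3))
    print(current_bandwidth(samples, window=1))
```

The only difference between the two bandwidth functions is the time span over which the rate is averaged, mirroring the distinction drawn between Current Bandwidth and Clip Bandwidth.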
Results of the above measurements can be used to help determine whether network services are operating up to standard. In the context of the Internet, results of the above URL object measurements can, for instance, indicate whether a web page is downloading consistently, at a high enough speed, or completely. The results of measurements of the above streaming media parameters can help determine the same information with respect to a streaming media object.
However, while important diagnostic information can be collected about the current status of a particular web page or streaming media service by making individual or random measurements of one or more of the above noted network performance metrics, it can be difficult to use this testing method to fully diagnose performance. For example, using such techniques it can be difficult to determine the performance of a network over time or during certain times of the day, days of the week, or parts of the year. Thus, it can be difficult to detect, and predict, cycles in network operation, such as whether a network operates more or less rapidly on a periodic basis. Such information could be useful in determining how other network parameters, such as network traffic load, which likely varies over a day, week or year, affect performance of network services.
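To make the idea of a daily cycle concrete, the sketch below groups timestamped measurements by hour of day and averages each group. It is an illustration only, not part of any existing measurement system: the function name `mean_by_hour` and the `(datetime, value)` input format are assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_by_hour(measurements):
    """Average measured values per hour of day.

    measurements: iterable of (datetime, value) pairs, possibly
    spanning many days. Returns {hour: mean value}, ordered by hour.
    """
    buckets = defaultdict(list)
    for timestamp, value in measurements:
        buckets[timestamp.hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


if __name__ == "__main__":
    data = [
        (datetime(2024, 1, 1, 9, 0), 1.0),   # day 1, morning
        (datetime(2024, 1, 2, 9, 0), 3.0),   # day 2, morning
        (datetime(2024, 1, 1, 14, 0), 2.0),  # day 1, afternoon
    ]
    print(mean_by_hour(data))
```

A single isolated measurement cannot show such a pattern. It only becomes visible once many measurements taken at different times are aggregated by period.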
Without such information, individual measurements may be misleading. For example, unsatisfactory results of such measurements may be caused by high or low network traffic load, rather than a specific problem with a network device. Also, using the above described standard techniques, it can be difficult to provide any type of predictive event correlation. For example, what, if any, is the effect of degradation of DNS lookup time on overall network service performance during specific time periods? Such predictive information can help providers of network services to set appropriate expectations of network performance for customers of such providers. Additionally, such predictive information can facilitate troubleshooting of root causes relating to network, application and third party content (e.g. banner ads on a web site) issues.
Further, in order to determine whether a particular network service is operating appropriately using the above described methods, a user must initiate measurement of one or more network performance metrics, retrieve and then analyze the result. That is, there is no way for a system that does no more than take measurements of network performance metrics to notify a user if a network is not operating correctly because there is no baseline or other reference available to the system to make such a determination.
What is needed is a system for measuring network performance metrics which allows a user to take into account network conditions, such as traffic load, when analyzing the measurement. Also, the system should allow a user to be able to make predictions about network performance at a given time. Additionally, such a system should be automated and should be able to analyze and present measurement results in a manner which is meaningful and straightforward to interpret.
A system and method in accordance with the present invention collects measurements of network performance metrics and automatically calculates and provides composite variance analysis of such metrics. The system and method can then use history of performance data statistics to alert a user about performance of network services that are outside acceptable tolerance or control limits. The technique exposes subtle deviation from accepted measurement tolerance that can, in turn, be categorized in relation to control limits based on defined standard deviation thresholds.
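One possible reading of categorizing a deviation against control limits based on standard deviation thresholds is sketched below. It is not presented as the claimed method. The function name `categorize`, the default thresholds of 1, 2 and 3 standard deviations, and the use of the sample standard deviation are all assumptions made for illustration.

```python
from statistics import mean, stdev


def categorize(history, value, thresholds=(1.0, 2.0, 3.0)):
    """Return how many standard-deviation thresholds `value` exceeds.

    history: past measurements of one metric (at least two values).
    0 means within the tightest limit; len(thresholds) means the
    value lies outside every limit.
    """
    center = mean(history)
    sigma = stdev(history)  # sample standard deviation (assumption)
    if sigma == 0:
        # No spread in the history: any difference is outside all limits.
        return 0 if value == center else len(thresholds)
    distance = abs(value - center) / sigma
    return sum(1 for limit in thresholds if distance > limit)


if __name__ == "__main__":
    history = [10, 12, 14, 16, 18]
    for value in (14, 18, 21, 30):
        print(value, categorize(history, value))
```

A caller could map the returned level to actions of increasing severity, for example ignoring level 0 and alerting a user at the highest level.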
A system in accordance with the present invention includes at least one DCA located on a network, a processing module interconnected with the DCA, and, preferably, a comparison module interconnected with the processing module. The DCA collects at least a first plurality of measurements of a single network parameter and at least a first set of measurements including at least a single measurement of the single network parameter. Each of the first plurality of measurements is taken at a different time. The processing module calculates at least a first variance statistic, such as an average value, and a second variance statistic. The first variance statistic relates to the first plurality of measurements and the second variance statistic relates to the first set of measurements. The comparison module compares the first variance statistic with at least the second variance statistic to determine if a predetermined relationship exists between the first variance statistic and the second variance statistic. For example, the variance statistics could be averages of the group and first set of measurements. The comparison module could determine if the average of the first set of measurements is within a predetermined multiple of standard deviations from the average of the group of measurements. Preferably, the system also includes a screen display for displaying at least the first and second variance statistics and the results of the comparison thereof.
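The example comparison described above, checking whether the average of the first set lies within a predetermined multiple of standard deviations of the average of the first plurality, can be sketched as follows. This is a minimal illustration under stated assumptions: the names `Comparison` and `compare`, the default multiple `k=2.0`, and the use of the sample standard deviation are not taken from the description.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Comparison:
    """Values a display might show alongside the comparison result."""
    baseline_average: float  # first variance statistic
    recent_average: float    # second variance statistic
    allowed_deviation: float
    within_limits: bool


def compare(baseline, recent, k: float = 2.0) -> Comparison:
    """Compare the average of `recent` with the average of `baseline`.

    baseline: measurements of one parameter taken at different times
              (at least two values, so a standard deviation exists).
    recent:   one or more newer measurements of the same parameter.
    k:        predetermined multiple of standard deviations (assumption).
    """
    baseline_average = mean(baseline)
    recent_average = mean(recent)
    allowed = k * stdev(baseline)
    within = abs(recent_average - baseline_average) <= allowed
    return Comparison(baseline_average, recent_average, allowed, within)


if __name__ == "__main__":
    baseline = [1.0, 1.2, 0.8, 1.1, 0.9]  # e.g. response times in seconds
    print(compare(baseline, [1.05]))
    print(compare(baseline, [2.0, 2.2]))
```

Returning both statistics together with the result, rather than a bare boolean, matches the preference stated above for displaying the statistics and the outcome of their comparison.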
A method in accordance with the present invention includes collecting at least a first plurality of measurements of a single network parameter, each measurement taken at a different time. Also, at least a first set of measurements is collected including at least a single measurement of the single network parameter. Then a first variance statistic associated with the first plurality of measurements and at least a second variance statistic associated with the first set of measurements are calculated. The first variance statistic is then compared with at least the second variance statistic to determine if a predetermined relationship exists between the two variance statistics.