Many communication networks, such as the Internet, rely on packet switching technologies (e.g., X.25, frame relay and asynchronous transfer mode) to transport variable-size or fixed-size blocks of data (usually termed packets or cells, respectively) between nodes. The term packet will be used herein to refer collectively to any such block of information. In essence, a packet switched network is a network of queues communicatively coupled together by communication links (which may be made up of various physical media). At each network node (e.g., a switch or router), there exist one or more queues of packets for each outgoing link. If the rate at which packets arrive and queue up persistently exceeds the rate at which packets are transmitted, queue size grows without bound and the delay experienced by a packet tends towards infinity.
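The unbounded queue growth described above can be illustrated with a minimal discrete-time sketch. It assumes a deliberately simplified, deterministic model (a constant number of arrivals and a constant link capacity per time slot) rather than the random arrivals of a real network, and the function name `queue_backlog` is hypothetical.

```python
from typing import List


def queue_backlog(arrivals_per_slot: int, departures_per_slot: int, slots: int) -> List[int]:
    """Backlog of one output-link queue at the end of each time slot.

    Deterministic model (an illustrative assumption): a fixed number of
    packets arrive each slot and the link transmits at most a fixed number.
    """
    backlog = 0
    history = []
    for _ in range(slots):
        backlog += arrivals_per_slot                  # packets join the queue
        backlog -= min(backlog, departures_per_slot)  # link drains what it can
        history.append(backlog)
    return history


if __name__ == "__main__":
    overloaded = queue_backlog(5, 4, 6)
    print(overloaded)  # [1, 2, 3, 4, 5, 6]
    # Queueing delay seen by a new arrival is roughly backlog / link rate (in slots).
    print([b / 4 for b in overloaded])
```

With arrivals at or below the link rate (for example `queue_backlog(3, 4, 6)`), the backlog stays at zero; with arrivals above it, the backlog, and hence the delay of each newly queued packet, grows by a fixed amount every slot without limit.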
In an ideal case, network throughput, and hence network use, should increase with offered load up to the physical capacity of the network and remain at capacity if the load is further increased. This ideal case, however, requires that all nodes somehow know the timing and rate of packets that will be presented to the network, with no overload and no delay in acquiring this information, a situation that is not possible. If no control is exercised, use increases for a while as the load increases. Then, as the queue lengths at various nodes begin to grow, throughput actually drops. This is due, in part, to the retransmission of dropped packets, and this condition is commonly described as “congestion”. It is clear that catastrophic network failures due to congestion should (indeed, must) be avoided, and preventing such failures is the task of congestion control processes within packet switched networks. As a starting point for such processes, however, one must be able to determine when and where congestion is occurring.
Any attempt to measure congestion (which, for purposes of this discussion, shall be regarded more formally as anomalous deviations in the end-to-end response time or duration of a connection) necessarily requires the gathering of some network performance information. This raw information may relate to a variety of network “metrics” as defined by the Internet Engineering Task Force (IETF) in a series of Requests for Comments (RFCs), as follows:
a. RFC 2330, entitled “Framework for IP Performance Metrics” (May 1998), defines a general framework for particular metrics to be developed by the IETF's IP Performance Metrics effort, begun by the Benchmarking Methodology Working Group (BMWG) of the Operational Requirements Area and being continued by the IP Performance Metrics Working Group (IPPM) of the Transport Area.
b. RFC 2678, entitled “IPPM Metrics for Measuring Connectivity” (September 1999), defines a series of metrics for connectivity between a pair of Internet hosts. It builds on notions introduced and discussed in RFC 2330, the IPPM framework document.
c. RFC 2679, entitled “A One-way Delay Metric for IPPM” (September 1999), defines a metric for one-way delay of packets across Internet paths.
d. RFC 2680, entitled “A One-way Packet Loss Metric for IPPM” (September 1999), defines a metric for one-way packet loss across Internet paths.
e. RFC 2681, entitled “A Round-trip Delay Metric for IPPM” (September 1999), defines a metric for round-trip delay of packets across Internet paths.
f. A draft RFC entitled “IP Packet Delay Variation Metric for IPPM” (April 2002) defines a metric for variation in delay of packets across Internet paths. The metric is based on the difference in the One-Way-Delay of selected packets. This difference in delay is called “IP Packet Delay Variation”.
g. A draft RFC entitled “One-Way Loss Pattern Sample Metrics” (March 2002) uses the base loss metric defined in RFC 2680 to define two derived metrics, “loss distance” and “loss period”, and the associated statistics that together capture loss patterns experienced by packet streams on the Internet. The authors postulate that the loss pattern or loss distribution is a key parameter that determines the performance observed by users of certain real-time applications, such as packet voice and video. For the same loss rate, two different loss distributions could potentially produce widely different perceptions of performance.
h. A draft RFC entitled “Network Performance Measurement with Periodic Streams” (April 2002) describes a periodic sampling method and relevant metrics for assessing the performance of IP networks.
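Several of the metrics listed above reduce to simple computations over per-packet measurements. The following sketch assumes that one-way delays and loss indicators are already available, indexed by sequence number, and computes the delay variation between consecutive packets (one possible packet-selection rule; the IPDV draft allows others), the sequence-number gap between successive losses, and the lengths of loss periods. The function names are hypothetical, and the drafts' precise definitions (for example, how the first loss in a stream is treated) may differ in detail.

```python
from typing import List, Optional, Sequence


def ipdv_consecutive(delays: Sequence[Optional[float]]) -> List[Optional[float]]:
    """One-way delay difference for each pair of consecutive packets.

    None marks a lost packet; the variation is undefined for any pair that
    includes a lost packet, so None is returned for that pair.
    """
    out: List[Optional[float]] = []
    for prev, cur in zip(delays, delays[1:]):
        out.append(None if prev is None or cur is None else cur - prev)
    return out


def loss_distances(lost: Sequence[bool]) -> List[int]:
    """Sequence-number gap between each lost packet and the previous loss."""
    seqs = [i for i, is_lost in enumerate(lost) if is_lost]
    return [b - a for a, b in zip(seqs, seqs[1:])]


def loss_period_lengths(lost: Sequence[bool]) -> List[int]:
    """Length of each run of consecutive lost packets (one entry per loss period)."""
    lengths = []
    run = 0
    for is_lost in lost:
        if is_lost:
            run += 1
        elif run:
            lengths.append(run)  # a received packet closes the current loss period
            run = 0
    if run:
        lengths.append(run)  # stream ended inside a loss period
    return lengths


if __name__ == "__main__":
    print(ipdv_consecutive([10.0, 12.0, None, 11.0, 15.0]))  # [2.0, None, None, 4.0]
    pattern = [False, True, True, False, False, True, False, True]
    print(loss_distances(pattern))       # [1, 3, 2]
    print(loss_period_lengths(pattern))  # [2, 1, 1]
```

Two streams with the same 50% loss rate, such as `[True, True, True, False, False, False]` and `[True, False, True, False, True, False]`, yield loss-period lengths of `[3]` and `[1, 1, 1]` respectively, which is the kind of distinction the loss-pattern draft argues matters for voice and video.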
Regardless of the metric of interest, the volume of information obtained generally requires that it be analyzed using statistical tools in order to arrive at conclusions about the network's performance. One problem with relying on statistical measures of network performance parameters, however, is that such data can be highly influenced by so-called outliers. Outliers are generally regarded as observations that deviate so much from other observations of the same dataset as to arouse suspicion that they were generated by a different mechanism. See, e.g., Edwin M. Knorr and Raymond T. Ng, “Algorithms for Mining Distance-Based Outliers in Large Datasets”, Proc. 24th VLDB Conf. (New York, 1998). Thus, it is often necessary to eliminate such outliers from the dataset before subjecting the remaining data to analysis.
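The cited Knorr and Ng paper formalizes a distance-based notion of an outlier: an object O is a DB(p, D)-outlier if at least a fraction p of the objects in the dataset lie at a distance greater than D from O. A naive nested-loop sketch of that definition for one-dimensional data (for example, connection durations) follows. Using absolute difference as the distance, and the name `db_outliers`, are assumptions for illustration; the paper's own index-based and cell-based algorithms are considerably more efficient than this quadratic loop.

```python
from typing import List, Sequence


def db_outliers(values: Sequence[float], p: float, d: float) -> List[int]:
    """Indices of DB(p, D)-outliers in a one-dimensional dataset.

    A point is reported when at least a fraction p of all n points
    (the point itself included in n) lie farther than d from it.
    """
    n = len(values)
    outliers = []
    for i, x in enumerate(values):
        far = sum(1 for y in values if abs(x - y) > d)  # points beyond distance d
        if far / n >= p:
            outliers.append(i)
    return outliers


if __name__ == "__main__":
    durations = [1.0, 1.2, 0.9, 1.1, 1.0, 9.5]  # hypothetical connection durations (s)
    print(db_outliers(durations, p=0.8, d=2.0))  # [5]
```

Unlike a test statistic tied to an assumed distribution, the DB(p, D) criterion depends only on the chosen distance threshold D and fraction p, both of which must be selected by the analyst.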
Most statistical tests that have been developed to identify outliers in a given variable are constrained to data for that variable alone. For example, in the case of duration outliers for Internet traffic, conventional statistical tests are performed using only a time series of those durations. This inherently limits the accuracy with which one can determine the baseline relative to which an outlier is defined. Examples of such statistical tests include Grubbs' Test, Rosner's Test and Walsh's Test, all of which are too conservative when applied to Internet traffic data. That is, these tests fail to recognize outliers that one can identify qualitatively. However, if these duration outliers could be eliminated by independent measurements, the remaining data could then be used to establish a baseline accurately. This baseline then defines the limit of what is considered a non-outlier. Indeed, this is what the present inventors have done.
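For reference, the sketch below applies the textbook two-sided Grubbs' test repeatedly to a single series of durations, which is exactly the single-variable setting described above. It assumes approximately normal data and a significance level of 0.05 by default. To stay within the standard library, the Student's t quantile needed for the critical value is computed by numerical integration and bisection rather than by a statistics package; all function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int) -> float:
    """P(T > t) for t >= 0, via Simpson's rule on the density over [0, t]."""
    n = max(2000, int(t * 200))
    n += n % 2  # Simpson's rule needs an even number of intervals
    h = t / n
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_upper_quantile(p: float, df: int) -> float:
    """t such that P(T > t) = p (0 < p < 0.5), found by bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > p:  # expand until the root is bracketed
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def grubbs_critical(n: int, alpha: float = 0.05) -> float:
    """Two-sided Grubbs critical value for a sample of size n (n >= 3)."""
    t = t_upper_quantile(alpha / (2 * n), n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))


def grubbs_outliers(data: Sequence[float], alpha: float = 0.05) -> List[float]:
    """Apply the two-sided Grubbs test repeatedly, removing one suspect per pass."""
    values = list(data)
    outliers = []
    while len(values) >= 3:
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)  # sample standard deviation
        if sd == 0:
            break
        suspect = max(values, key=lambda v: abs(v - mean))
        if abs(suspect - mean) / sd <= grubbs_critical(len(values), alpha):
            break  # most extreme value is not significant; stop
        outliers.append(suspect)
        values.remove(suspect)
    return outliers


if __name__ == "__main__":
    durations = [10, 11, 10, 12, 11, 10, 11, 50]  # hypothetical durations
    print(grubbs_outliers(durations))  # [50]
```

Because each pass sees only the duration series itself, the mean and standard deviation used as the baseline are inflated by the very values under test. Repeated single-outlier passes of this kind can also suffer from masking, which Rosner's generalized extreme Studentized deviate (ESD) procedure was designed to mitigate.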