1. Field of the Invention
This invention generally relates to systems and methods for characterizing the performance of computer networks. More specifically, this invention relates to a process that provides loss and delay analysis of internal network links using only measurements “at the edge” of the network.
2. Description of the Related Art
Network performance information can be extremely useful, but for most applications it is important that the information be localized. Knowing how portions or individual components of the network are performing is more valuable than generating a global measure of performance. In particular, the ability to identify the performance of localized portions of the network would be useful for a number of increasingly important networking tasks, including Quality of Service (QoS) verification, maintenance, and content delivery optimization.
Quality of Service Verification
It is now common for Internet service providers to offer a variety of service levels to customers. Service level agreements specify performance criteria that the network provider guarantees to satisfy. Such criteria can include the amount of bandwidth made available to the customer and bounds on the maximum delay (which is important for Internet telephony and streaming applications). However, when a customer communicates via the Internet, a significant portion of the network connection is often not under the direct control and responsibility of the service provider. If the customer experiences poor performance, it is difficult to determine whether it is due to the service provider's portion of the connection or the Internet at large. The only way to separate these effects and verify service is to assemble network performance information that is localized to the service provider's network. Unfortunately, directly measuring local network performance is very expensive for service providers. Furthermore, no existing techniques allow customers or third parties to independently collect such information. It would be desirable to provide a method for an independent party to verify the service level without the cooperation of the provider and to provide a cost-effective means by which the service provider can track the performance of its system.
Maintenance and Provisioning
Maintenance is a major portion of the effort involved in owning and operating a network. When the performance of a network is poor, it can be very difficult to isolate the cause of the problem. Sometimes a router is performing sub-optimally; on other occasions, too much network traffic is directed along one path while other paths remain idle. Furthermore, as networks grow in complexity, network owners are often unaware of all the components in their system, and consequently methods for mapping the topology of a network are critical. It would be desirable to have a way to determine the topology and connectivity of the network, to localize poor performance to individual network components, to rapidly identify faulty components, to optimize routing decisions, and perhaps even to indicate where additional network resources are required. The overall benefit is that maintenance overhead would be significantly reduced.
Content Delivery
Delivery of high-bandwidth content such as video poses a challenging resource-allocation problem. The source of the content must attempt to optimize the quality of content received by all users while minimizing the total network bandwidth that the content distribution consumes. Optimal bandwidth allocation accounts for the local loss-rates and delays in the network connecting the source and users. It would be desirable to have a system that can estimate local network performance and that can inform the content source of the loss rates and delays experienced at individual routers in the network.
Security
Detection of network intrusion or misuse is extremely challenging. Most techniques are in their infancy, but it appears clear that many intrusions can be detected by the abnormal traffic patterns they generate. For example, rapid increases in the correlation of delay behavior in local network neighborhoods can be indicative of denial-of-service attacks. By conducting on-line monitoring of delay and loss behavior, rapid determination of the source of the attack becomes a much more feasible task. A system that can localize pathological network performance to individual components or subnetworks could aid in the early warning and detection of attacks and intrusions.
A system and method that determines network topology and localized performance measurements would preferably be based on “edge” measurements. Edge measurements are measurements made at the source and receivers, i.e., at the “edge” of the communications network. Conducting direct measurements at internal network points to acquire localized information is an expensive and, in many cases, impractical task. Because internal routers operate at such high speeds and carry so much traffic, internal measurement demands special-purpose hardware devices dedicated to the collection of the traffic statistics. As the size of the analyzed network increases, the number of measurement devices grows exponentially. The installation and maintenance of these devices are extremely time-consuming and costly exercises. Moreover, organizing the transmission of the statistics that these devices record to a central processor is complicated, and the transmission of statistics consumes additional network resources.
Whilst measurement throughout a network is infeasible, measurement at the edge of the network is a much more tractable and low-cost task. There are far fewer sites at which measurement must be made, and perhaps more importantly, the measurement can often be performed in software. Techniques that rely only on edge-based measurement would allow independent performance monitoring to be performed, because measurement at the edge of the network does not require cooperation from the owner of the network.
In large-scale networks, end-systems cannot rely on the network itself to cooperate in characterizing its own behavior. This has prompted several groups to investigate methods for inferring internal network behavior based on end-to-end network measurements: the so-called network tomography problem. See R. Caceres, N. Duffield, J. Horowitz, and D. Towsley, “Multicast-based inference of network-internal loss characteristics,” IEEE Trans. Info. Theory, vol. 45, November 1999, pp. 2462-80; C. Tebaldi and M. West, “Bayesian inference on network traffic using link count data,” J. Amer. Stat. Assoc., June 1998, pp. 557-76; S. Vander Wiel, J. Cao, D. Davis, and B. Yu, “Time-varying network tomography: router link data,” in Proc. Symposium on the Interface: Computing Science and Statistics, (Schaumburg, Ill.), June 1999; Y. Vardi, “Network tomography: estimating source-destination traffic intensities from link data,” J. Amer. Stat. Assoc., 1996, pp. 365-77; “Multicast-based inference of network-internal characteristics (MINC),” gaia.cs.umass.edu/minc; S. Ratnasamy and S. McCanne, “Inference of multicast routing trees and bottleneck bandwidths using end-to-end measurements,” in Proceedings of INFOCOM '99, (New York, N.Y.), March 1999. While promising, these methods require special support from the network in the form of cooperation between hosts, internal network measurements, or multicast capability. Many networks do not currently support multicast because of its scalability limitations (routers need to maintain per-group state) and its lack of access control. Moreover, multicast-based methods may not provide an accurate characterization of the loss rates for the traffic of interest, because routers treat multicast packets differently from unicast packets.
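To make the network tomography problem concrete, the following minimal sketch illustrates the kind of inference that the multicast-based prior-art methods perform. It is an illustration only and is not the method of the present invention. It assumes a hypothetical two-receiver tree in which a source reaches receivers 1 and 2 over one shared link followed by one branch link each, that every multicast probe traverses all three links, and that packet losses on different links are independent. The function name and its parameters are assumptions introduced for this example. Under those assumptions the probability that receiver 1 gets a probe is a0*a1, that receiver 2 gets it is a0*a2, and that both get it is a0*a1*a2, where a0, a1, and a2 are the pass rates of the shared link and the two branch links, so the three internal pass rates can be solved from the three edge measurements.

```python
def estimate_link_pass_rates(n_probes, n_r1, n_r2, n_both):
    """Estimate per-link pass rates on a two-receiver tree from edge counts.

    Hypothetical helper for illustration. Assumes independent link losses
    and multicast probes that traverse the shared link exactly once.

    n_probes: number of probes sent by the source
    n_r1:     probes received by receiver 1
    n_r2:     probes received by receiver 2
    n_both:   probes received by both receivers
    Returns (shared_link, branch_1, branch_2) pass-rate estimates.
    """
    if min(n_probes, n_r1, n_r2, n_both) <= 0:
        raise ValueError("all counts must be positive for the estimate to exist")

    p1 = n_r1 / n_probes     # estimates a0 * a1
    p2 = n_r2 / n_probes     # estimates a0 * a2
    p12 = n_both / n_probes  # estimates a0 * a1 * a2

    shared_link = p1 * p2 / p12  # (a0*a1)(a0*a2) / (a0*a1*a2) = a0
    branch_1 = p12 / p2          # (a0*a1*a2) / (a0*a2) = a1
    branch_2 = p12 / p1          # (a0*a1*a2) / (a0*a1) = a2
    return shared_link, branch_1, branch_2


if __name__ == "__main__":
    # Counts chosen to match pass rates of 0.9 (shared), 0.8 and 0.7 (branches).
    rates = estimate_link_pass_rates(n_probes=1000, n_r1=720, n_r2=630, n_both=504)
    print([round(r, 3) for r in rates])
```

With the counts shown, the estimates come out at approximately 0.9, 0.8, and 0.7. Note that the calculation depends on both receivers observing the same probe on the shared link, which is exactly the multicast support that many networks lack.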
Accordingly, it would be desirable to provide a network tomography method that does not require special support from the networks and which provides a more accurate characterization of normal network behavior. Such a method would preferably be straightforward to implement and would be scalable.