By way of background, the quality of the paths taken by network traffic, such as Internet traffic, is critical for many network applications. For instance, the performance of online streaming applications depends directly on the loss rate of the underlying network paths, the web browsing experience depends on the network latency between client and server, and file download time depends on network bandwidth. Given the importance of path quality, it would thus be desirable to consider path quality when performing route selection for applications. However, due to the distributed nature of the current Internet infrastructure, obtaining path quality information is challenging.
Three metrics that apply generally to packet-based networks when assessing the quality of the path for a given connection are network latency, loss rate and bandwidth.
Latency in a packet-switched network is measured either one-way, i.e., the time from the source sending a packet to the destination receiving it, or round-trip, i.e., the one-way latency from the source to the destination plus the one-way latency from the destination back to the source. Round-trip latency is more often used because it can be measured from a single point. Note that round-trip latency excludes the amount of time that a destination system spends processing the packet. Many software platforms provide a service called ping that can be used to measure round-trip latency. Ping performs no packet processing; it merely sends a response back when it receives a packet, i.e., it performs a no-op. It is therefore a relatively accurate way of measuring latency.
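By way of non-limiting illustration, the single-point measurement of round-trip latency described above can be sketched as follows. The sketch assumes a UDP echo responder on the local loopback interface as a stand-in for ping, since actual ping uses ICMP echo packets, which typically require raw-socket privileges. The names `echo_server`, `measure_rtts` and `loopback_demo` are hypothetical and do not come from any tool discussed herein.

```python
import socket
import threading
import time


def echo_server(sock, count):
    """Send each received datagram straight back, doing no other processing."""
    for _ in range(count):
        data, addr = sock.recvfrom(2048)
        sock.sendto(data, addr)


def measure_rtts(server_addr, probes=5, timeout=1.0):
    """Return the round-trip time, in seconds, of each probe that was echoed."""
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)
        for seq in range(probes):
            payload = seq.to_bytes(4, "big")
            start = time.perf_counter()
            client.sendto(payload, server_addr)
            try:
                reply, _ = client.recvfrom(2048)
            except socket.timeout:
                continue  # probe or reply was lost; skip this sample
            if reply == payload:
                rtts.append(time.perf_counter() - start)
    return rtts


def loopback_demo(probes=5):
    """Measure RTT against an echo responder on the loopback interface."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    worker = threading.Thread(
        target=echo_server, args=(server, probes), daemon=True
    )
    worker.start()
    try:
        return measure_rtts(server.getsockname(), probes)
    finally:
        worker.join(timeout=2.0)
        server.close()
```

Both timestamps are taken at the sender, which is why no clock synchronization with the destination is needed for a round-trip measurement.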
Packet loss, in turn, occurs when heavy traffic on the network, also referred to as congestion, causes packets to be dropped. As any user of video streaming applications has witnessed, lost or dropped packets caused by network problems can result in highly noticeable performance issues or jitter. Similar problems arise with voice over internet protocol (VoIP) applications, online gaming applications, video-conferencing applications and so on. Because the lost information cannot be recovered unless the data carries inherent redundancy, packet loss affects just about all other network applications as well.
Some network transport protocols, such as transmission control protocol (TCP), by their very nature provide for reliable delivery of packets. For instance, with some reliable network protocols, in the event of packet loss, the receiver asks for retransmission, or, the sender automatically resends any segments that have not yet been acknowledged by the receiver. In certain highly reliable variants of TCP, if a transmitted packet is lost, it is re-sent along with every packet that had been sent after it. This retransmission causes the overall throughput of the connection to drop. Thus, although TCP can recover from packet loss, there is a tradeoff between retransmitting missing packets and maintaining throughput levels.
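The throughput cost of retransmission described above can be illustrated with a simplified model. The sketch below assumes a fixed send window, that each sequence number listed in `lost` is dropped exactly once, and that all other transmissions succeed. It compares a policy that resends the lost packet together with every packet sent after it against a policy that resends only the lost packet. The function names and the model itself are illustrative assumptions and are not a description of any particular TCP implementation.

```python
def go_back_n_transmissions(num_packets, window, lost):
    """Count transmissions when a loss forces a resend of all later packets.

    Simplified model: each sequence number in `lost` is dropped once.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    remaining = set(lost)
    base = 0   # oldest packet not yet delivered in order
    sent = 0   # total packets put on the wire
    while base < num_packets:
        end = min(base + window, num_packets)
        sent += end - base
        dropped = [seq for seq in range(base, end) if seq in remaining]
        if dropped:
            first = dropped[0]
            remaining.discard(first)
            base = first   # resend the lost packet and everything after it
        else:
            base = end     # whole window delivered; slide forward
    return sent


def selective_transmissions(num_packets, lost):
    """Count transmissions when only the lost packets are resent."""
    return num_packets + len({s for s in lost if 0 <= s < num_packets})


def efficiency(num_packets, transmissions):
    """Fraction of transmissions that carried new data."""
    return num_packets / transmissions
```

Under these assumptions, sending 10 packets with a window of 4 and a single loss of packet 2 takes 12 transmissions when later packets are resent as well, versus 11 when only the lost packet is resent.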
At the other end of the spectrum of network protocols are unreliable protocols, such as user datagram protocol (UDP), which are faster because they provide no recovery for lost packets. Applications that use UDP must handle, or make decisions about, the packet loss condition on their own. Thus, under some circumstances, there is a tradeoff between throughput and packet loss.
Along with loss rate and network latency, the concept of “bandwidth,” also referred to as “throughput,” is central to digital communications, and specifically to packet networks, as it relates to the amount of data that a link or network path can deliver per unit of time. In this regard, bandwidth quantifies the data rate that a network link or a network path can handle or transfer. Measurement of network bandwidth is of increasing importance for many Internet applications and protocols, especially those involving the transfer of large files and those involving the delivery of content with real-time quality of service (QoS) constraints, such as streaming media.
For many data-intensive applications, such as file transfers or multimedia streaming, the bandwidth available to the application directly affects application performance. In general, existing bandwidth estimation tools measure one or more of three related metrics: capacity, available bandwidth, and bulk transfer capacity (BTC). One conventional system estimates the capacity of each link on a network path by measuring the data transmission time on each link, which it obtains by taking the difference between the round trip times (RTTs) from a source to two adjacent routers. However, to filter out measurement noise due to factors such as queuing delay, a large number of probing packets is required in order to find the smallest RTT values for the final calculation. Consequently, such a conventional technique has been observed to have an unacceptably large probing overhead.
With respect to other attempts, one way of classifying bandwidth estimation techniques is based on whether they conduct hop-by-hop or end-to-end measurements. Hop-by-hop techniques rely on incrementally probing routers along a path and timing their Internet Control Message Protocol (ICMP) replies, whereas end-to-end techniques base their bandwidth estimation on end-host replies only. Another classification of bandwidth measurement techniques is based on whether they measure the bottleneck bandwidth or the available bandwidth of a path.
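The per-link calculation described above can be sketched as follows. The sketch makes simplifying assumptions that are not stated in the source: the smallest observed RTT to each router is taken to be free of queuing delay, and the propagation delay of the link and the time taken by the small reply packet are treated as negligible, so that the difference between the two minimum RTTs approximates the time needed to transmit the probe onto the link. The function names are hypothetical.

```python
def link_transmission_time(rtts_near, rtts_far):
    """Approximate the transmission time on the link between two routers.

    rtts_near: RTT samples (seconds) from the source to the nearer router.
    rtts_far:  RTT samples (seconds) from the source to the next router.
    The minimum of each sample set is used to filter out queuing delay.
    """
    return min(rtts_far) - min(rtts_near)


def estimate_link_capacity(packet_bits, rtts_near, rtts_far):
    """Estimate link capacity in bits per second from minimum-RTT samples."""
    dt = link_transmission_time(rtts_near, rtts_far)
    if dt <= 0:
        raise ValueError("samples too noisy: non-positive transmission time")
    return packet_bits / dt
```

For example, with 1500-byte (12,000-bit) probes and minimum RTTs of 10.0 ms and 11.2 ms to the two routers, the estimate is approximately 10 Mbit/s. Because only the minimum of many noisy samples is used, the estimate is only as good as the number of probes sent, which reflects the probing overhead noted above.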
Also, two different measures used in end-to-end network bandwidth estimation are bottleneck bandwidth, or the maximum transmission rate that could be achieved between two hosts at the endpoints of a given path in the absence of any competing traffic, and available bandwidth, the portion of the bottleneck bandwidth along a path that could be acquired by a given flow at a given instant in time. Both of these measures have independent applicability in that each captures different relevant properties of the network. For instance, bottleneck bandwidth is a static baseline measure that applies over long time-scales, up to the time-scale at which network paths change, and is independent of the particular traffic dynamics at a time instant.
Available bandwidth in turn provides a dynamic measure of the load on, or residual capacity of, a path. Additional application-specific information is then applied before making meaningful use of either measure. Given the nature of the Internet, the latter problem of determining available bandwidth in a real-time manner is a challenging one.
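The distinction between the two measures can be illustrated with a small sketch. It assumes, for illustration only, that the capacity and the current utilization of every link on the path are known, and it models available bandwidth as the smallest residual capacity along the path. The function names and the example figures are hypothetical.

```python
def bottleneck_bandwidth(capacities_bps):
    """Static measure: the smallest link capacity along the path."""
    return min(capacities_bps)


def available_bandwidth(capacities_bps, utilizations):
    """Dynamic measure: the smallest residual capacity along the path.

    utilizations[i] is the fraction (0.0 to 1.0) of link i already in use.
    """
    if len(capacities_bps) != len(utilizations):
        raise ValueError("one utilization value is needed per link")
    return min(c * (1.0 - u) for c, u in zip(capacities_bps, utilizations))
```

For a three-link path with capacities of 100 Mbit/s, 10 Mbit/s and 1 Gbit/s and utilizations of 95%, 20% and 50%, the bottleneck bandwidth is 10 Mbit/s, while the available bandwidth is about 5 Mbit/s and is set by the heavily loaded first link rather than by the lowest-capacity link. The first value changes only when the path changes, whereas the second varies with the traffic at a given instant.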
Currently available bandwidth estimation tools employ a variety of strategies to measure these metrics. A network manager with administrative access to the router or switch connected to a link of interest can measure some bandwidth metrics directly. Specifically, a network administrator can simply read information associated with the router or switch (e.g., configuration parameters, nominal bit rate of the link, average utilization, bytes or packets transmitted over some time period) using a network management protocol. However, such access is typically available only to administrators and not to end users.
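As a minimal sketch of the direct measurement available to an administrator, average utilization can be derived from two readings of a transmitted-byte counter and the nominal bit rate of the link. The sketch assumes the counter values have already been read from the device by some management protocol; no particular protocol or device interface is modeled, and the function name and the counter width parameter are assumptions made for illustration.

```python
def average_utilization(bytes_start, bytes_end, interval_s, nominal_bps,
                        counter_bits=32):
    """Average link utilization (0.0 to 1.0) between two counter readings.

    bytes_start, bytes_end: transmitted-byte counter at the start and end
        of the interval.
    interval_s: length of the interval in seconds.
    nominal_bps: nominal bit rate of the link in bits per second.
    counter_bits: assumed counter width; the modulo handles a single wrap.
    """
    if interval_s <= 0 or nominal_bps <= 0:
        raise ValueError("interval and nominal bit rate must be positive")
    delta_bytes = (bytes_end - bytes_start) % (2 ** counter_bits)
    return (delta_bytes * 8) / (interval_s * nominal_bps)
```

For example, a counter that advances by 37,500,000 bytes in 30 seconds on a 100 Mbit/s link corresponds to an average utilization of 10%.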
On the other hand, without any information from network routers, end users can only estimate the bandwidth of links or paths from end-to-end measurements. Even network administrators sometimes need to determine the bandwidth from hosts under their control to hosts outside their infrastructures, and so they rely on end-to-end measurements too. Some conventional systems have measured these bandwidth-related metrics on a network link or on an end-to-end path.
Current techniques used to detect bottleneck positions have problems such as high probing overhead and low measurement accuracy. In another conventional system, Recursive Packet Trains (RPT) are used to detect the position of network congestion. RPT combines two types of probing packets, namely measurement packets and load packets, in a single probing packet train. The idea is to let the load packets generate a packet queue on the router, and to use the measurement packets at the beginning and the end of the train to measure the packet train length. By detecting the changes in the packet train length, the congestion points of the network path are derived.
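The final step described above, deriving congestion points from changes in packet train length, might be sketched as follows. The sketch assumes the train length observed at each hop is already available as a list of durations and flags a hop when the length grows by more than a chosen fraction relative to the previous hop. The function name and the threshold are hypothetical and are not taken from the RPT system itself; how the per-hop lengths are collected is not modeled.

```python
def find_congestion_hops(train_lengths, threshold=0.1):
    """Return the hop indices at which the packet train length grew notably.

    train_lengths[i]: train length (seconds) observed at hop i.
    threshold: minimum relative growth versus the previous hop that is
        treated as a sign of queuing (illustrative value).
    """
    congested = []
    for hop in range(1, len(train_lengths)):
        previous = train_lengths[hop - 1]
        if previous <= 0:
            continue  # cannot compute relative growth from a bad sample
        growth = (train_lengths[hop] - previous) / previous
        if growth > threshold:
            congested.append(hop)
    return congested
```

With per-hop lengths of 2.0, 2.0, 2.1, 3.2, 3.2 and 4.5 ms and a 10% threshold, hops 3 and 5 are flagged, since the train stretched by roughly 52% and 41% at those hops.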
The most widely used active probing tools are ping and traceroute. Ping uses an ICMP echo packet to measure the round-trip time (RTT) to a specific destination. Traceroute sets a time to live (TTL) in the internet protocol (IP) header to trigger responses from the routers along the network path, thus collecting the hostname and RTT of the routers. However, the only performance information provided by these tools is RTT, which is not the whole story when it comes to congestion.
Bandwidth estimation techniques, specifically available bandwidth estimation algorithms, measure network throughput, which is more closely related to congestion. However, they provide no location information for the congestion point and need the cooperation of the destination. That makes them very hard to deploy.
In sum, the only viable solution for measuring Internet path quality thus far has been end-to-end probing. Although conceptually simple, end-to-end probing has a few notable disadvantages. First, conventional end-to-end probing systems such as those described above depend on router support for accuracy; without that support, their measurements are inaccurate. It would thus be desirable to free network endpoints from having to involve routers in order to obtain information about path quality.
Moreover, end-to-end probing between all pairs of hosts is not scalable. In short, to have information available about all paths in a network according to conventional end-to-end probing techniques, measuring endpoint to endpoint in a network as large as the Internet, including N endpoints, involves approximately N squared endpoint-to-endpoint path measurements.
Some conventional end-to-end probing techniques have involved an intermediate host. Comparing communications that travel directly from a source endpoint to a target endpoint with communications that reach the target endpoint indirectly through the intermediate host can, via triangulation principles, also help an endpoint measure path quality. However, adding another host only increases overhead that is already unacceptably high.
Multi-homing techniques have attempted to improve path quality by adding alternate paths to a target endpoint via an intermediate endpoint. However, these solutions depend on use of the intermediate endpoint, add to network traffic, and do not give endpoints a holistic view of the path quality to the services they use. In this regard, path selection today is limited to the capabilities of Internet Service Providers (ISPs) and their respective points of presence (PoPs), so that each endpoint is limited to the paths through its respective ISPs. In some cases, two ISPs have cooperated to allow traffic through each other's nodes under a sharing agreement, which tends to improve bandwidth overall since one set of nodes may offer better path quality than another set of nodes at a given moment. In effect, this can help balance out congestion points by offloading traffic to another ISP. However, such small-scale cooperation among ISPs, while beneficial to the Internet user community, does not help endpoints understand path quality.
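The quadratic growth noted above can be made concrete with a short sketch. It assumes every endpoint probes every other endpoint; when the two directions of a path are measured separately there are N(N-1) measurements, and when a single measurement per pair suffices there are N(N-1)/2. The function names are hypothetical.

```python
def directed_path_count(n):
    """Number of source-to-target measurements among n endpoints."""
    return n * (n - 1)


def undirected_pair_count(n):
    """Number of endpoint pairs when one measurement per pair suffices."""
    return n * (n - 1) // 2
```

For 1,000 endpoints this gives 999,000 directed measurements, or 499,500 pairs. For 1,000,000 endpoints the directed count is already 999,999,000,000, which illustrates why all-pairs probing does not scale to a network the size of the Internet.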
Accordingly, as the above survey of conventional systems illustrates, obtaining path quality information by end users is extremely challenging in the current Internet. Thus, simpler techniques for characterizing path quality between any pair of hosts on the Internet are desirable. Given the importance of path quality, it is also desirable to be able to consider path quality when performing route selection for network applications. The above-described deficiencies of path quality information gathering techniques are merely intended to provide an overview of some of the problems of conventional systems, and are not intended to be exhaustive. Other problems with the state of the art and corresponding benefits of the invention may become further apparent upon review of the following description of various non-limiting embodiments of the invention.