The “best-effort” nature of the Internet makes the QoS (Quality of Service) perceived by end users unpredictable and sometimes highly variable. Fast, accurate, and efficient tools for estimating the QoS performance of IP networks are therefore gaining importance in the networking community, because such information can be used to maintain the service levels that users and providers expect under the varying conditions inherent to packet networks, especially the Internet. Specific applications include congestion control, real-time streaming and two-way communication, QoS verification, server selection, and network administration.
QoS estimation can be broadly classified into two categories: passive monitoring and active monitoring. The passive monitoring approach has the advantage of not injecting additional probing traffic into the network. It observes the network as it is, so the measurements reflect true network behavior, which is not disturbed by traffic generated for the sake of the measurements.
Monitoring can be performed at different levels of granularity, depending on the processing, storage, and other resources available. Packet-level monitoring, for example, allows observation of packet-by-packet information such as packet delay variation, packet size distribution, and throughput between host pairs. Flow-level measurements give a coarser view at lower overhead, recording items such as the total number of bytes transferred and the start and finish times of each flow.
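The difference between the two granularities can be illustrated with a minimal sketch that collapses packet-level records into flow-level records. The record layout (timestamp, source, destination, size) and the names `FlowRecord` and `aggregate_flows` are assumptions made for illustration, and a flow is simplified here to a (source, destination) host pair.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    first_seen: float   # timestamp of the earliest packet in the flow
    last_seen: float    # timestamp of the latest packet in the flow
    packets: int = 0
    total_bytes: int = 0


def aggregate_flows(packets):
    """Collapse (timestamp, src, dst, size_bytes) records into per-flow records.

    A flow is keyed by the (src, dst) host pair, which is a simplification.
    """
    flows = {}
    for ts, src, dst, size in packets:
        key = (src, dst)
        rec = flows.get(key)
        if rec is None:
            rec = flows[key] = FlowRecord(first_seen=ts, last_seen=ts)
        rec.first_seen = min(rec.first_seen, ts)
        rec.last_seen = max(rec.last_seen, ts)
        rec.packets += 1
        rec.total_bytes += size
    return flows
```

The flow table keeps one small record per host pair regardless of how many packets were seen, which is where the saving in storage and processing comes from.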
The main advantage of passive techniques is that they place no load on the network they monitor; they therefore do not distort the network traffic and produce realistic estimates. Their drawback is that they rely on existing traffic, which is not guaranteed to have the characteristics that certain measurements require. Bottleneck bandwidth measurement techniques, for example, require a particular packet size distribution and inter-packet departure rate that existing traffic often does not provide. Traffic monitoring consists of passively observing traffic characteristics for the purpose of inferring network performance.
SNMP (Simple Network Management Protocol) and RMON (Remote Monitoring) are the most widely adopted standards for passive monitoring. They typically consist of four elements: management agents or probes installed at various network elements (hosts, routers, switches); a MIB (Management Information Base) containing the data collected by the agents; a management station or console, which collects the information from the probes; and a protocol for the exchange of information between stations and probes. MIBs comprise several groups, such as statistics collected at the physical or IP layers for packet sizes, CRC errors, and so forth. Traffic monitoring with administrative control requires the transfer of collected information from agents to consoles, and thus places a burden on the network being monitored. Sampling of the data in MIBs can sometimes be used to reduce the amount of traffic exchanged.
Active monitoring is the inference of network QoS by sending probes across the network and observing the treatment they receive in terms of delivery delay to the destination, variability in that delay, and loss. A wide variety of such tools exists to estimate performance in terms of delay, jitter, packet loss, and bandwidth. They generally rely either on the error messaging capabilities of the Internet Control Message Protocol (ICMP) or on packet dispersion techniques.
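As a rough illustration of how probe observations turn into these metrics, the sketch below summarizes one probing session. The function name and input layout are hypothetical. It assumes the sender and receiver clocks are synchronized so that one-way delays are meaningful, and it takes jitter to be the mean absolute difference between consecutive delays, which is only one of several definitions in use.

```python
from statistics import mean


def summarize_probes(sent, received):
    """Summarize one probing session.

    sent:     {sequence_number: send_time_seconds}
    received: {sequence_number: receive_time_seconds}; lost probes are absent
    """
    # One-way delay of every probe that arrived, in sending order.
    delays = [received[seq] - sent[seq] for seq in sorted(sent) if seq in received]
    loss = 1.0 - len(delays) / len(sent)
    if len(delays) > 1:
        # Assumed jitter definition: mean absolute change between consecutive delays.
        jitter = mean(abs(b - a) for a, b in zip(delays, delays[1:]))
    else:
        jitter = 0.0
    return {
        "mean_delay": mean(delays) if delays else None,
        "jitter": jitter,
        "loss": loss,
    }
```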
Link capacity estimation, in bits per second, has traditionally been achieved through packet dispersion techniques, which consist of the successive transmission of groups of two or more packets. The concept is that packets from the same group will queue one after another at the bottleneck link of the path. In the absence of significant interference from competing traffic (i.e. traffic from other sources), the dispersion (i.e. the difference in packet arrival times at the receiver) equals the time the bottleneck needs to transmit one packet, and is therefore inversely proportional to the bandwidth of the bottleneck. Examples of tools using this approach include Nettimer, Pathrate, and Packet Bunch Mode (PBM).
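The packet-pair relation described above can be sketched in a few lines: with packets of known size, each measured dispersion yields one capacity sample equal to the packet size in bits divided by the dispersion. The function name is hypothetical, and taking the median to suppress samples distorted by competing traffic is a simplification chosen for this sketch; it is not the filtering used by the tools named above.

```python
from statistics import median


def capacity_from_pairs(packet_size_bytes, dispersions_s):
    """Estimate bottleneck capacity in bits/s from packet-pair dispersions.

    dispersions_s: arrival-time gaps (seconds) between the two packets of each pair.
    """
    # Each pair gives one sample: capacity = packet size in bits / dispersion.
    samples = [packet_size_bytes * 8 / d for d in dispersions_s if d > 0]
    if not samples:
        raise ValueError("no usable dispersion measurements")
    # The median is an assumed, simple filter against cross-traffic outliers.
    return median(samples)
```

For example, 1500-byte packets arriving 1.2 ms apart correspond to a bottleneck of about 10 Mbit/s. Pairs that are stretched or compressed by competing traffic produce lower or higher samples, which is why a single pair is not reliable on its own.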
Another approach to capacity estimation builds on the ICMP Time Exceeded message. Pathchar, which pioneered this technique, performs measurements by sending packets with increasing IP Time-to-Live (TTL) values, thus forcing successive routers along the path to send back ICMP error messages and reveal themselves. Measuring the round-trip delays to successive hops on the end-to-end path thereby leads to per-hop delay estimation. Pathchar also varies the packet size for each TTL value, and infers link capacity as the inverse of the slope of the line connecting the minimum observed delays for each packet size. Because the delay to a given hop accumulates the per-byte transmission time of every preceding link, the capacity of an individual link follows from the difference between the slopes measured at its two ends. Other tools, such as Pchar and Clink, build on the same concepts as Pathchar.
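A minimal sketch of this variable-packet-size calculation follows. It is not Pathchar's implementation: the function names and input layout are hypothetical, the line is fitted with ordinary least squares, and the per-link capacity is taken from the difference between the slopes of consecutive hops, on the assumption that the fixed-size ICMP reply does not contribute to the slope.

```python
def min_delay_slope(samples):
    """Least-squares slope (seconds per byte) of minimum RTT versus packet size.

    samples: {packet_size_bytes: [rtt_seconds, ...]} for one TTL value.
    """
    # Keep only the minimum RTT per size, assumed to be free of queuing delay.
    points = [(size, min(rtts)) for size, rtts in samples.items() if rtts]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two packet sizes")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def link_capacities(per_hop_samples):
    """Per-link capacity in bits/s; per_hop_samples is ordered by increasing TTL."""
    capacities = []
    previous_slope = 0.0
    for samples in per_hop_samples:
        slope = min_delay_slope(samples)
        delta = slope - previous_slope   # seconds per byte added by this link
        # Noise can make the slope difference non-positive; report no estimate.
        capacities.append(8 / delta if delta > 0 else None)
        previous_slope = slope
    return capacities
```

Using the minimum delay per packet size matters: any sample that experienced queuing lies above the line, so only the minima trace the transmission and propagation component that the slope is meant to capture.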
Cprobe and Pipechar were the first tools proposed to estimate the available bandwidth on a path. These tools use the dispersion of long packet trains and assume that this dispersion is inversely proportional to the rate available for transmission at the bottleneck hop, i.e., the available bandwidth. Recent research, however, has shown that the inverse of such dispersions does not in fact measure available bandwidth but another parameter, referred to as the ADR (Asymptotic Dispersion Rate).
Another tool, Delphi, assumes that Internet paths can be modeled by a single queue. It therefore performs poorly when large queuing delays occur at several links on the path, or when the link with the lowest capacity and the link with the lowest available bandwidth are not the same link.
At the present time, Pathload is the only known tool capable of estimating available bandwidth. It builds on the simple principle that the end-to-end delay observed at the receiver increases when the transmission rate at the source exceeds the available bandwidth on the path. This is a realistic observation: traffic injected onto the path faster than the bottleneck can service it causes a queue to build up at that hop, which increases the queuing delay and hence the overall delay.
Pathload operates by performing several iterations, varying the transmission rate in each and observing the delay variation at the receiver. The goal is to find the maximum rate that does not cause the delay to increase.
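The iterative principle can be sketched as a binary search over the sending rate. This is an illustration of the idea only, not Pathload's algorithm: all names are hypothetical, the trend test (the fraction of consecutive delay samples that increase) is a deliberately crude stand-in for a real detector, and `simulated_stream` replaces the network with a fluid model of a single queue whose available bandwidth is an assumed 4 Mbit/s.

```python
def has_increasing_trend(delays, threshold=0.6):
    """Crude trend test: do most consecutive delay samples increase?"""
    increases = sum(1 for a, b in zip(delays, delays[1:]) if b > a)
    return increases / (len(delays) - 1) > threshold


def estimate_available_bandwidth(send_stream, low, high, resolution):
    """Narrow [low, high] (bits/s) around the largest rate with no delay increase.

    send_stream(rate) must send one probe stream at `rate` and return the
    one-way delays observed at the receiver.
    """
    while high - low > resolution:
        rate = (low + high) / 2
        if has_increasing_trend(send_stream(rate)):
            high = rate   # rate exceeds the available bandwidth
        else:
            low = rate    # rate is sustainable
    return low, high


def simulated_stream(rate, avail_bw=4e6, packet_bits=12000, n=100, base_delay=0.02):
    """Hypothetical fluid model: the queue grows only when rate > avail_bw."""
    growth = max(0.0, packet_bits / avail_bw - packet_bits / rate)
    return [base_delay + i * growth for i in range(n)]
```

Calling `estimate_available_bandwidth(simulated_stream, 1e6, 10e6, 1e5)` returns a range no wider than 100 kbit/s that brackets the simulated 4 Mbit/s. On a real path, noise in the delay samples makes the trend decision ambiguous near the available bandwidth, so a practical tool needs a more robust test and several streams per rate.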