In recent years, the Internet has emerged as the most important vehicle for the transport of information services. Most organizations connect to the Internet through Internet Service Providers (ISPs). The number of ISPs is continuously increasing, and so is the number of networks connected to an individual ISP. This growing network complexity calls for new administrative tools for network management, which in turn makes it necessary to study the characteristics of the network, to study the utilization patterns of its links, and to maintain an account of per-user utilization of network resources. This has fueled the need to compute or estimate important metrics by monitoring Internet traffic at Internet nodes. These metrics could be any of the following: goodput, throughput, link utilization, fraction of packet losses, number of retransmitted packets, duplicate packets, and round trip time (RTT).
Goodput on a link is the total unique data in bytes transmitted per unit time over the link. Goodput excludes duplicate data generated due to packet retransmissions. Packet retransmission is common in protocols such as TCP and is caused by losses in the network.
Throughput on a link is the total number of bytes transferred per unit time over the link.
For protocols such as UDP, which do not retransmit packets, throughput and goodput refer to the same quantity. For TCP traffic, however, throughput also counts retransmitted bytes, so goodput is a more accurate measure of the effective link utilization than throughput.
Duplicate bytes on a link are bytes that are transmitted more than once on that link. This happens when data is retransmitted after being lost in the network, either because of congestion or because of lossy links.
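The definitions of throughput, goodput and duplicate bytes given above can be illustrated with a minimal sketch. It assumes a single flow observed on one link, with each observed segment reduced to a hypothetical (start byte offset, payload length) pair, and it ignores sequence number wraparound. The function name link_metrics and the record layout are illustrative assumptions, not part of any existing tool.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record: (start_byte_offset, payload_length) of one observed segment.
Segment = Tuple[int, int]


def link_metrics(segments: Iterable[Segment], duration: float) -> Dict[str, float]:
    """Compute throughput, goodput and duplicate bytes for one flow on one link.

    duration is the length of the observation window in seconds.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")

    total_bytes = 0
    ranges = []
    for start, length in segments:
        total_bytes += length
        if length > 0:
            ranges.append((start, start + length))

    # Merge overlapping byte ranges so that each byte is counted only once.
    ranges.sort()
    unique_bytes = 0
    covered_to = None
    for start, end in ranges:
        if covered_to is None or start >= covered_to:
            unique_bytes += end - start
            covered_to = end
        elif end > covered_to:
            unique_bytes += end - covered_to
            covered_to = end

    return {
        "total_bytes": total_bytes,
        "unique_bytes": unique_bytes,
        "duplicate_bytes": total_bytes - unique_bytes,
        "throughput": total_bytes / duration,  # all bytes per second
        "goodput": unique_bytes / duration,    # unique bytes per second
    }
```

For a flow without retransmissions the merged ranges cover every observed byte exactly once, so the two rates coincide, which matches the UDP case described above.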
Losses refer to bytes that do not reach their destination. Losses occur inside the network because of buffer overflow, queue management, and admission control, and on links because of errors in packet transmission. A good estimate of the losses at different points in the network can help in efficient network administration.
A good estimation of the above metrics can be extremely useful in network management. These metrics are easily estimated at the end-nodes (i.e. the source node and the destination node) where all the information is available.
Existing Tools for Internet Data Analysis:
We discuss some important existing tools that monitor Internet traffic, along with their limitations.
Ping [STEVENS] is an elementary tool which sends ICMP echo packets to a host and estimates the Round Trip Times (RTTs) and the losses of these ICMP packets. Built on similar lines, Traceroute [STEVENS] sends UDP probe packets with varying values of the Time-To-Live (TTL) field to find the entire path from the source to the destination and also the RTTs corresponding to each hop on the path.
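The kind of summary that a Ping-style tool derives from its probes can be sketched as follows. The sketch assumes that send and receive times have already been recorded per probe sequence number; the function name summarize_probes and the dictionary layout are hypothetical, and no packets are actually sent.

```python
from typing import Dict, Optional


def summarize_probes(
    sent: Dict[int, float], received: Dict[int, float]
) -> Dict[str, Optional[float]]:
    """Estimate RTT statistics and the loss fraction from echo probes.

    sent maps a probe sequence number to its send time in seconds.
    received maps a sequence number to the arrival time of its reply.
    A probe with no entry in received is counted as lost.
    """
    if not sent:
        raise ValueError("no probes were sent")

    rtts = [received[seq] - t for seq, t in sent.items() if seq in received]
    lost = len(sent) - len(rtts)

    return {
        "loss_fraction": lost / len(sent),
        "min_rtt": min(rtts) if rtts else None,
        "avg_rtt": sum(rtts) / len(rtts) if rtts else None,
        "max_rtt": max(rtts) if rtts else None,
    }
```

Note that every figure here describes only the probe packets themselves, not the traffic of other users on the same path.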
Like Traceroute, Pathchar [PATHCHAR] takes advantage of the TTL field of the IP packet header to estimate the RTTs from the source to each hop on the path to the destination. Using these it then estimates the latency, bandwidth and queueing delays of each link on the path.
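One simplified way to turn per-hop RTTs into per-link estimates is sketched below. It is an illustration under stated assumptions rather than Pathchar's actual implementation: it assumes that the minimum RTT to each hop grows linearly with probe size, that the reply is small and of fixed size, and that the minimum RTTs are free of queueing delay. Under this model, the increase in slope from one hop to the next is the inverse of the link bandwidth, and half the increase in intercept is the one-way link latency. The names fit_line and estimate_links are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_links(min_rtt_by_hop: List[Dict[int, float]]) -> List[Tuple[float, float]]:
    """Return (bandwidth in bytes/s, one-way latency in s) for each link.

    min_rtt_by_hop[k] maps a probe size in bytes to the minimum RTT in
    seconds observed to hop k + 1 (assumed input, at least two sizes per hop).
    """
    results = []
    prev_slope, prev_intercept = 0.0, 0.0
    for samples in min_rtt_by_hop:
        slope, intercept = fit_line(sorted(samples.items()))
        bandwidth = 1.0 / (slope - prev_slope)         # extra time per byte on this link
        latency = (intercept - prev_intercept) / 2.0   # half of the added round-trip delay
        results.append((bandwidth, latency))
        prev_slope, prev_intercept = slope, intercept
    return results
```

With noisy measurements the slope difference can be zero or negative, so a practical tool would need many probes per size and some form of filtering; the sketch omits both, as well as the queueing delay estimate.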
Limitations of Prior Art (Active):
One major drawback of the above tools is that they are active, i.e., they inject packets into the network to measure its state, thereby altering the state of the network. The act of observing the network should not directly interfere with or add to network activity. Second, these tools estimate their metrics solely from the packets that they themselves inject into the network. These metrics may therefore not reflect the true state of the network.
We now look at some existing passive tools. One of the most widely used tools for Internet protocol monitoring is tcpdump [TCPDUMP]. Tcpdump acquires network frames from the underlying filter and can either store them in binary form or output the contents of each frame's IP header in ASCII. Tcpdump obtains a copy of each packet from Libpcap [LIBPCAP].
Libpcap is a library developed for Linux and BSD which provides easy access to captured packets. The packets are actually obtained using the Berkeley Packet Filter (BPF) [BPF], which puts the Network Interface Card (NIC) in promiscuous mode.
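As an illustration of how frames stored in binary form can later be read back, the following sketch parses the classic pcap savefile layout: a 24-byte file header followed by a 16-byte header in front of each captured frame. It handles only the microsecond-resolution variant with magic number 0xa1b2c3d4; the function name read_pcap is hypothetical, and a real analysis tool would more likely call Libpcap itself.

```python
import struct
from typing import Iterator, Tuple

PCAP_MAGIC = 0xA1B2C3D4      # classic pcap, microsecond timestamps
FILE_HEADER_LEN = 24
RECORD_HEADER_LEN = 16


def read_pcap(data: bytes) -> Iterator[Tuple[float, int, bytes]]:
    """Yield (timestamp, original_length, captured_bytes) for each frame."""
    if len(data) < FILE_HEADER_LEN:
        raise ValueError("file too short for a pcap header")

    # The magic number reveals the byte order used by the writer.
    if struct.unpack("<I", data[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", data[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic pcap savefile")

    offset = FILE_HEADER_LEN
    while offset + RECORD_HEADER_LEN <= len(data):
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + RECORD_HEADER_LEN]
        )
        offset += RECORD_HEADER_LEN
        frame = data[offset:offset + incl_len]
        if len(frame) < incl_len:
            break  # truncated final record
        offset += incl_len
        yield ts_sec + ts_usec / 1e6, orig_len, frame
```

The captured length can be smaller than the original length when the capture was taken with a snapshot length that keeps only the protocol headers.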
Paxson's TCPanaly [PAXSON] is an offline tool for analyzing TCP traces. It classifies TCP implementations based on characteristics seen in their traces. In order to classify a TCP connection, TCPanaly must make two passes over the data stream.
Coral Reef [CORAL] is distributed by CAIDA [CAIDA99] and is based on OC3MON. It captures low-level protocol headers over serial ATM network trunks for post-analysis. OC3MON is currently used for capturing IP, UDP and TCP headers at points in the vBNS network.
Windmill [WINDMILL], developed at the University of Michigan, reconstructs application-level network protocols and exposes the underlying protocols' events. The packets are first filtered by the Windmill packet filter (WPF). A set of protocol modules then extracts the various parameters.
Limitations of Prior Art (Passive):
None of the above passive tools calculates metrics such as goodput and duplicate packets. These parameters can be very useful in network management.