In recent years, the world has witnessed the proliferation of high-speed data networks and the rapid expansion of the set of protocols/services supporting these networks. The development of network monitoring and traffic measurement techniques has, so far, failed to keep pace with the operating speed and the large-scale deployment of these networks. Because of this shortfall, network operators are increasingly losing their grasp of what exactly occurs in these networks. This, in turn, has jeopardized their ability to operate and manage the networks properly and efficiently. There is an urgent need for a comprehensive, yet deployable, network monitoring and end-to-end traffic analysis infrastructure for large-scale, high-speed networks. Such an infrastructure is particularly important for connectionless data networks such as the Internet, in which routes of traffic flows can change dynamically and unpredictably in the middle of a session due to different types of expected or unexpected events. Such events include network component failures, non-deterministic load-balancing schemes (e.g., Equal-Cost Multi-Path (ECMP)), software/hardware bugs and protocol misconfigurations. At present, most network operators can only rely on rudimentary diagnostic tools such as “traceroute” to obtain woefully inadequate samplings of end-to-end routes of individual traffic flows within the network.
Recent research in traffic measurement/analysis methodologies and infrastructures has been strongly driven by the demands of a number of critical real-life applications, such as Origination-to-Destination (O-D pair) traffic matrix estimation for large-scale ISPs and the support of traceback services in IP-based networks to tackle spoofed DDoS attacks. In the traffic matrix estimation problem, as discussed in A. Medina, N. Taft, K. Salamatian, S. Bhattacharyya and C. Diot, “Traffic Matrix estimation: Existing Techniques and new directions,” in Procs. of ACM Sigcomm, August 2002 [Medi 02]; Y. Zhang, M. Roughan, N. Duffield, A. Greenberg, “Fast Accurate Computation of Large-Scale IP Traffic Matrices from Link Loads,” in Procs. of ACM Sigmetrics, June 2003 [Zhan 03a]; and Y. Zhang, M. Roughan, C. Lund and D. Donoho, “An Information-Theoretic Approach in Traffic Matrix Estimation,” in Procs. of ACM Sigcomm, August 2003 [Zhan 03b], the objective is to estimate traffic demands between O-D node-pairs in a large-scale IP network using link-load measurements only. This problem stems from the lack of support for inexpensive, scalable per-flow counters in most commercial gigabit routers on the market. For example, while the Cisco NetFlow technology (Cisco IOS NetFlow, http://www.cisco.com/warp/public/732/Tech/nmp/netflow/index.shtml) can be used to collect fine-grained per-flow traffic statistics, its formidable storage and bandwidth requirements make it unsuitable for 10 Gbps networks. To address this deficiency in the measurement infrastructure, researchers have resorted to combining link-load measurements with additional assumptions on the O-D pair traffic distribution in order to estimate the required O-D pair traffic matrix. For instance, in [Medi 02, Zhan 03a, Zhan 03b], different variants of the gravity model are adapted from the field of transportation to model the network traffic distribution between all O-D pairs; in Y. Vardi, “Network Tomography: estimating source-destination traffic intensities from link data,” Journal of the American Statistical Association, 91, pp. 365-377, 1996 [Vard 96], a Poissonian assumption is used to relate the second-order link-load statistics to the O-D pair traffic distribution. A similar Gaussian assumption is made by J. Cao, D. Davis, S. V. Wiel and B. Yu, “Time-varying network tomography,” Journal of the American Statistical Association, 95, pp. 1063-1075, 2000 [Cao 00] as well. In fact, the problem of estimating the O-D traffic matrix given only link-load measurements has led to the formation of a new field of research called “Network Tomography”. Unfortunately, most of the network tomography-based solutions proposed to date are highly sensitive to, i.e., not robust with respect to, the validity of their traffic distribution assumptions. The tomography-based approach also relies heavily on the correctness, synchronization and consistency of multiple operational databases from which measurement/configuration information has to be extracted and collated. (Such databases include forwarding tables in the routers, the router configuration files, as well as SNMP MIBs for the link-load.) The aforementioned modeling and operational assumptions also render the tomography-based traffic measurement/estimation schemes of little use for network failure detection/diagnosis, where neither the proper functioning of network elements/databases nor the normality of traffic distribution can be assumed.
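To make the role of the traffic distribution assumption concrete, the following is a minimal sketch of the simplest form of gravity model, in which the estimated demand from an origin to a destination is proportional to the total traffic entering at the origin times the total traffic leaving at the destination. It is not the specific variant used in any of the cited works; the function name gravity_estimate, its parameters and the sample volumes are hypothetical and chosen for illustration only.

```python
from typing import Dict, Hashable, Tuple


def gravity_estimate(
    ingress: Dict[Hashable, float],
    egress: Dict[Hashable, float],
) -> Dict[Tuple[Hashable, Hashable], float]:
    """Simple gravity model (illustrative assumption, not a cited algorithm).

    ingress[o] is the total traffic entering the network at node o and
    egress[d] is the total traffic leaving the network at node d.
    The estimated demand is T[o, d] = ingress[o] * egress[d] / sum(egress).
    """
    total_egress = sum(egress.values())
    if total_egress <= 0:
        raise ValueError("total egress traffic must be positive")
    return {
        (o, d): ingress[o] * egress[d] / total_egress
        for o in ingress
        for d in egress
    }


# Hypothetical per-node totals, in arbitrary traffic units.
estimate = gravity_estimate({"A": 60, "B": 40}, {"A": 30, "B": 70})
for (origin, destination), volume in sorted(estimate.items()):
    print(origin, destination, volume)
```

By construction, each origin's estimated demands sum to its measured ingress total. The split across destinations, however, is dictated entirely by the proportionality assumption, which is why such estimates degrade when real traffic departs from it.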
Recently, an alternative packet trajectory-based traffic monitoring/analysis approach has been proposed by N. G. Duffield, M. Grossglauser, “Trajectory Sampling for Direct Traffic Observation,” in Procs. of ACM Sigcomm, August 2000, pp. 271-282 [Duff 00] and A. C. Snoeren, C. Partridge, L. A. Sanchez, C. E. Jones, F. Tchakountio, S. T. Kent and W. T. Strayer, “Hash-based IP Traceback,” in Procs. of ACM Sigcomm, August 2001, pp. 3-14 [Snoe 01], in which each node (router) maintains a compressed summary, or digest, of all the packets it recently handled. In both [Duff 00] and [Snoe 01], the digest is in the form of a Bloom filter (see, for example, B. Bloom, “Space/Time trade-offs in hash coding with allowable errors,” Communications of the ACM 13, July 1970, pp. 422-426 [Bloo 70] and A. Broder, M. Mitzenmacher, “Network Applications of Bloom Filters: A Survey,” Allerton Conference, 2002, available at http://www.eecs.harvard.edu/~michaelm [Brod 02]), which is updated for every packet arriving at the node and periodically uploaded to a centralized server to support future offline traffic analysis as well as archival purposes. Armed with these very informative nodal traffic digests, the centralized server can not only construct the traffic flow pattern and per-flow/commodity measurements throughout the network, but also answer queries regarding the end-to-end path, or the so-called trajectory, of any given packet traversing the network in the (recent) past. The ability to answer a trajectory query for any given individual packet does come at a heavy cost: the Bloom filter has to be big enough to store sufficient information for every individual incoming packet. Even with the efficient memory vs. false-positive trade-off of a Bloom filter, it still requires O(N) bits of memory to capture and correctly distinguish the signatures of N different packets with high probability.
In [Snoe 01], it is estimated that such a system requires approximately 0.5% of the link capacity of the node per unit time in storage. For a 10 Gbps link, this translates to 50 Mbits of storage for every second of monitoring time. Such a heavyweight traffic digest approach not only stresses the memory storage and communication requirements of the system but also scales poorly as the link speed and/or monitoring duration increases.
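The per-packet digest idea and its storage cost can be sketched as follows. This is a toy Bloom filter under stated assumptions, not the digest construction of [Duff 00] or [Snoe 01]: the class name PacketDigest, the use of SHA-256 with an index prefix to derive bit positions, and the helper digest_bits_per_second are all hypothetical and serve only to illustrate the mechanism and the 0.5% arithmetic quoted above.

```python
import hashlib
from typing import Iterator


class PacketDigest:
    """Toy Bloom-filter digest of the packets seen at one node (illustrative)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, packet: bytes) -> Iterator[int]:
        # Assumption: derive num_hashes bit positions by hashing the packet
        # bytes with a different index prefix for each hash function.
        for i in range(self.num_hashes):
            h = hashlib.sha256(i.to_bytes(4, "big") + packet).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, packet: bytes) -> None:
        # Set every bit position selected for this packet.
        for pos in self._positions(packet):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, packet: bytes) -> bool:
        # True for every added packet; may also be True for a packet that
        # was never added (a false positive), never False for an added one.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(packet)
        )


def digest_bits_per_second(link_bps: int, permille: int = 5) -> int:
    """Digest storage per second as a fraction of link capacity.

    permille=5 corresponds to the 0.5% figure quoted from [Snoe 01].
    """
    return link_bps * permille // 1000


digest = PacketDigest(num_bits=1024, num_hashes=3)
digest.add(b"example-packet-bytes")
print(digest.might_contain(b"example-packet-bytes"))  # True
print(digest_bits_per_second(10_000_000_000))  # 50000000, i.e. 50 Mbits
```

Because every packet sets its own bits, the filter must grow in proportion to the number of packets recorded to keep the false-positive rate low, which is the O(N) memory cost noted above.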