Businesses and individuals rely upon networks (e.g., the Internet) for communications and the exchange of data. Computers coupled to these networks allow users to readily gain access to and exchange data of all types (e.g., sound, text, numerical data, video, graphics, multi-media, etc.) with other computers, databases, websites, etc. This enables users to send and receive electronic mail (e-mail) messages, browse websites, download files, participate in live discussions in chat rooms, play games in real time, watch streaming video, listen to music, shop and trade online, etc. With increased network bandwidth, video-on-demand, HDTV, IP telephony, video teleconferencing, and other types of bandwidth-intensive applications will become prevalent.
In each of these applications, however, the underlying technology is essentially the same. The data is first broken up into several smaller “packets.” The data packets are then individually routed through one or more networks via a number of interconnected network devices. The network devices, such as routers, hubs, and/or switches, direct the flow of these data packets through the network to their intended destinations. Depending on the complexity of the network, one or more dedicated network administrators use specialized network management systems to provision, troubleshoot, monitor, profile, and otherwise keep the network operating at peak efficiency.
Ideally, the exact network conditions can be evaluated by examining the packets as they are being routed through the network. These packets yield information that is essential for analyzing network performance. Unfortunately, monitoring and examining each and every packet is quite costly. Resources must be dedicated to metering, storing, transporting, and processing the data. Furthermore, some network management techniques require capturing packet headers or even parts of the attendant payload. Moreover, as network speed and bandwidth continue to increase, the amount of data being carried over the networks threatens to overwhelm even the most sophisticated network management system.
In an effort to minimize the costs and overhead associated with network management while, at the same time, preserving measurement accuracy, many network management systems have adopted sampling techniques. Rather than examine each and every packet being carried over the network, a small set of selected packets is captured, examined, and analyzed; the vast majority of packets are not examined. The rationale is that the smaller set of “sampled” packets is representative of the overall network traffic. One can deduce and extrapolate the general network conditions by evaluating the small set of sampled packets. In a way, “sampling” used in the context of network management applications is analogous to conducting “surveys” or taking “polls.”
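By way of illustration only, the extrapolation described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular network management system: the function name `estimate_total_bytes`, the synthetic packet sizes, and the choice of a simple random sample are all hypothetical and introduced purely for the example.

```python
import random

def estimate_total_bytes(packet_sizes, sample_count, rng):
    """Estimate total traffic volume from a simple random sample of packets.

    Hypothetical helper: only `sample_count` packets are examined, and their
    mean size is scaled up by the total packet count.
    """
    sampled = rng.sample(packet_sizes, sample_count)
    mean_size = sum(sampled) / sample_count
    return mean_size * len(packet_sizes)

# Synthetic traffic (an assumption for the example): 100,000 packets whose
# sizes are drawn from three arbitrary values.
rng = random.Random(0)
packet_sizes = [rng.choice([64, 576, 1500]) for _ in range(100_000)]

true_total = sum(packet_sizes)
estimated_total = estimate_total_bytes(packet_sizes, 1_000, rng)
relative_error = abs(estimated_total - true_total) / true_total
```

In this sketch only one percent of the packets are examined, yet the estimate would be expected to fall close to the true total, much as a poll of a small group approximates the opinion of a population. The accuracy depends on the sample being representative, which is the assumption the remainder of this section calls into question.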
One popular sampling technique is known as “n-out-of-N.” In an n-out-of-N packet sampling scheme, “n” elements are selected out of a parent population that consists of “N” elements. Applied to network management systems, a smaller set of “n” packets is sampled from a given, larger set of “N” packets. Unfortunately, systematically selecting “n” consecutive packets with a count period of “N” may generate biased results. Under some circumstances, systematic n-out-of-N sampling might not catch unusual network activity, especially if the anomaly is periodic or quasi-periodic and happens to fall outside the sampling window. One way to reduce the bias entails randomly selecting the position of the first packet to be sampled. Although this reduces the probability of bias, it does not completely resolve the problem. For example, a hacker may take advantage of known sampling schemes and design algorithms to exploit or otherwise circumvent the sampling being conducted. Thus, there exists a need for a sampling scheme that minimizes bias, retains accuracy, and yet is cost effective.
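The bias described above can be made concrete with a short sketch. It is illustrative only and rests on assumptions not found in the foregoing text: the helper names `systematic_indices` and `count_anomalies_seen`, the values n = 2 and N = 10, and an anomaly that recurs at a fixed position in every period of N packets are all hypothetical.

```python
import random

def systematic_indices(total, n, N, start=0):
    """Indices selected by systematic n-out-of-N sampling.

    Within every period of N packets, the n consecutive packets beginning
    at offset `start` are selected (the window wraps around the period).
    """
    return [i for i in range(total) if (i - start) % N < n]

def count_anomalies_seen(total, n, N, start, anomaly_phase):
    """Count sampled packets that coincide with a periodic anomaly.

    The anomaly is assumed to occur on every packet whose index satisfies
    i % N == anomaly_phase.
    """
    selected = systematic_indices(total, n, N, start)
    return sum(1 for i in selected if i % N == anomaly_phase)

TOTAL, n, N, ANOMALY_PHASE = 1000, 2, 10, 7

# Fixed start: the sampling window covers offsets 0 and 1 of every period,
# so an anomaly at offset 7 is never observed.
seen_fixed_start = count_anomalies_seen(TOTAL, n, N, 0, ANOMALY_PHASE)

# Every possible start position, to show which ones would catch the anomaly.
seen_by_start = [count_anomalies_seen(TOTAL, n, N, s, ANOMALY_PHASE)
                 for s in range(N)]

# Random start: one start position is drawn and then used for the whole run.
random_start = random.Random().randrange(N)
seen_random_start = seen_by_start[random_start]
```

Under these assumptions, only n of the N possible start positions place the anomaly inside the sampling window. A randomly chosen start therefore observes the anomaly with probability n/N, but any single run still sees either every occurrence or none, and a party who learns the start position and period can arrange traffic to fall outside the window.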