In recent years, advances in computer and network technology have led to the growth of networks providing fast and inexpensive access to information resources throughout the world. Networked computers are used at work, at home, and at school to share information and other content. Businesses use computer networks to deliver software, content, and services as well as to advertise and offer goods and services for sale. Consumers and businesses may select and purchase a variety of goods and services on-line. For example, books, clothing, electronics, and automobiles can be purchased on-line through vendor web sites. Likewise, financial services (such as stock trading, banking, and portfolio management), travel services, and news and information services, among others, are available on-line. Equally important, but perhaps not as evident to consumers, are a variety of on-line network services that support on-line businesses, such as transaction processors, security services, and e-mail service providers. Regardless of the type of goods and services provided, all on-line businesses must be concerned with the performance of their network operations. E-businesses unable to provide consistent, high performance are not likely to survive.
Individual businesses or entities that provide network applications have different network performance concerns. For example, a company may hire a Web service provider to provide its on-line services. The company may wish to conduct periodic audits to ensure that the Web service provider is fulfilling its obligation to provide satisfactory service. Similarly, the Web service provider may wish to validate the level of service quality it is providing and identify any problems that may exist or that may arise. Network management tools may be used to collect data from various locations and at various times. For example, the tools may collect connect times, download times for individual pages, Domain Name System (DNS) look-up times, and error messages, among other things. In fact, thorough network management tools may monitor tens or hundreds of parameters for a given website.
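By way of illustration only, the kind of data described above might be represented as one record per measurement and grouped into one time-ordered data stream per monitored parameter. The following Python sketch is a minimal example under stated assumptions: the names `Measurement` and `to_streams`, the field names, and the choice of three timing parameters are hypothetical and are not drawn from any particular network management tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Measurement:
    """One observation of a website (hypothetical record layout)."""
    site: str                    # monitored website
    location: str                # where the measurement was taken
    timestamp: float             # when it was taken, in seconds
    dns_lookup_s: float          # DNS look-up time, in seconds
    connect_s: float             # connect time, in seconds
    download_s: float            # page download time, in seconds
    error: Optional[str] = None  # error message, if any


# Timing parameters turned into streams (an assumption for this sketch).
PARAMETERS = ("dns_lookup_s", "connect_s", "download_s")

StreamKey = Tuple[str, str, str]
Stream = List[Tuple[float, float]]


def to_streams(measurements: List[Measurement]) -> Dict[StreamKey, Stream]:
    """Group measurements into one time-ordered stream per
    (site, location, parameter) combination."""
    streams: Dict[StreamKey, Stream] = {}
    for m in sorted(measurements, key=lambda m: m.timestamp):
        for name in PARAMETERS:
            key = (m.site, m.location, name)
            streams.setdefault(key, []).append((m.timestamp, getattr(m, name)))
    return streams
```

With tens or hundreds of parameters per website, and several sites and locations, the number of streams produced by such a grouping grows quickly, which is the volume problem the passage refers to.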
The data streams generated by the network management tools may be analyzed to evaluate network performance and detect and diagnose problems. However, analysis becomes difficult in practice because of the volume of information collected, variations in network usage, changes in the network, changes in equipment connected to the network (e.g., an increase in the number of servers used to meet network demand and/or installation of more efficient or faster servers), and changes in web sites, to name just a few.
In addition, Internet traffic (data transfer) is a poorly understood process. The current consensus is that the statistical nature of the data traces is fundamentally different from classical settings (e.g., Poisson-type processes), but the true nature of these processes remains elusive. The literature is flooded with contradictory empirical and theoretical studies, further contributing to the confusion. Given this environment, simply setting thresholds for tens or hundreds of parameters is arbitrary, inaccurate, and difficult in practice given the volume of information collected. For example, an arbitrarily set threshold may result in the detection of too many errors or too few errors. While the description above is primarily directed to network applications, it should be appreciated that data streams can be generated by a variety of detectors, sensors, or other sources and that analysis of those data streams may also be desired. There remains a need for a system for analyzing data streams to detect abnormalities that is accurate and efficient and that can be run automatically without significant human intervention.
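The difficulty with arbitrarily set thresholds noted above can be illustrated with a short Python sketch. The sample values and the three threshold values below are hypothetical and chosen only for illustration; the function name `count_alerts` is likewise an assumption of this sketch.

```python
from typing import List


def count_alerts(samples: List[float], threshold: float) -> int:
    """Count the samples that exceed a fixed threshold."""
    return sum(1 for s in samples if s > threshold)


# Hypothetical page download times in seconds: mostly near one second,
# with two large excursions (6.5 s and 14.0 s).
download_times = [0.8, 0.9, 1.1, 1.0, 1.2, 0.9, 6.5, 1.0, 1.3, 0.7, 1.1, 14.0]

# A threshold set too low flags ordinary variation as errors.
print(count_alerts(download_times, 1.0))   # 6

# A threshold set too high flags nothing, even the 14.0 s excursion.
print(count_alerts(download_times, 20.0))  # 0

# A threshold that happens to suit this stream flags the two excursions,
# but nothing guarantees it suits another parameter, site, or time period.
print(count_alerts(download_times, 5.0))   # 2
```

The same stream thus yields six, zero, or two detections depending solely on the threshold chosen, and a suitable value would have to be found and maintained separately for each monitored parameter.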