1. Field of the Invention
The present invention relates to traffic analysis in a network.
2. Description of the Related Art
A database is a collection of information. Relational databases are typically illustrated as one or more two-dimensional tables. Each table arranges the information in rows and columns, with each row corresponding to a record and each column corresponding to a field. In a relational database, a collection of tables can be related or joined to each other through a common field or key, which enables information in one table to be automatically cross-referenced to corresponding information in another table.
A complex search may be performed on a database with a query. A query specifies a set of criteria (e.g., the quantity of parts from a particular transaction) that identifies the information a database program is to retrieve from the database. An aggregate query is a query that requests information concerning a selected group of records. For example, in a database that stores sales transactions, an aggregate query may request the total quantity of an item in a particular transaction. Each aggregate query may include a set of criteria to select records (e.g., grouping of records by an item code field and a transaction code field), and an operation to perform on the group of selected records (e.g., summing the quantity fields). Typical operations for aggregate queries include counting, summing, averaging, and finding minimum and maximum values.
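By way of illustration only, the sales-transaction example may be sketched with Python's built-in sqlite3 module. The table name, column names, and sample values below are hypothetical and are not taken from any particular database.

```python
import sqlite3

# Hypothetical sales table; names and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (transaction_code TEXT, item_code TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("T100", "A", 2),
        ("T100", "A", 3),
        ("T100", "B", 1),
        ("T200", "A", 4),
    ],
)

# Aggregate query: group records by transaction code and item code,
# then sum the quantity field within each group.
totals = conn.execute(
    "SELECT transaction_code, item_code, SUM(quantity) "
    "FROM sales "
    "GROUP BY transaction_code, item_code "
    "ORDER BY transaction_code, item_code"
).fetchall()
conn.close()

for transaction_code, item_code, total in totals:
    print(transaction_code, item_code, total)
```

Replacing SUM with COUNT, AVG, MIN, or MAX yields the other typical aggregate operations.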
To perform an aggregate query, a conventional database program examines every record in the database to determine whether the record matches the query criteria and constructs a query table from the records that match. The program then performs the required operation over the appropriate fields from each record in the query table.
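The two-step procedure just described may be sketched as follows. This is a minimal sketch, not the implementation of any particular database program; the function name `aggregate_by_scan`, its parameters, and the record layout are assumptions made for illustration, and the operation shown is summation.

```python
from collections import defaultdict


def aggregate_by_scan(records, matches, group_key, value_field):
    """Sum value_field per group over the records that satisfy matches."""
    # Step 1: examine every record and build the query table
    # from the records that match the criteria.
    query_table = [record for record in records if matches(record)]

    # Step 2: perform the operation (here, summing) over the
    # appropriate field of each record in the query table.
    totals = defaultdict(int)
    for record in query_table:
        totals[group_key(record)] += record[value_field]
    return dict(totals)


# Hypothetical records, for illustration only.
records = [
    {"transaction": "T100", "item": "A", "quantity": 2},
    {"transaction": "T100", "item": "A", "quantity": 3},
    {"transaction": "T100", "item": "B", "quantity": 1},
    {"transaction": "T200", "item": "A", "quantity": 4},
]

result = aggregate_by_scan(
    records,
    matches=lambda r: r["transaction"] == "T100",
    group_key=lambda r: r["item"],
    value_field="quantity",
)
print(result)
```

Both the running time and the size of the intermediate query table grow with the number of records examined.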
Massive data streams are increasingly prevalent in many real-time applications, such as web applications, Internet-traffic monitoring, telecommunication-data management, financial applications, and sensor networks. Often, the data streams in these applications are distributed across many locations, and it is important to be able to answer aggregate queries that pool information from multiple locations. Given continuous data feeds to support real-time decision making in mission-critical applications, such as fraud and anomaly detection, these queries are typically evaluated continuously, in an online fashion. For example, in a high-speed network with many nodes, packet streams arrive at and depart from the nodes on a continuous basis. A quantity that is of importance for many network-management applications, such as optimization and fault management, is a traffic matrix, which is a representation of the volume of traffic (typically in packets or bytes) that flows between origin-destination (OD) node pairs in a communication network during a measurement interval. A traffic matrix varies over time, and a sudden change may indicate an underlying anomaly.
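As a simple illustration of the traffic-matrix concept, the sketch below accumulates the byte volume for each OD node pair over one measurement interval. The function name `traffic_matrix`, the tuple layout of a packet record, and the node names are hypothetical.

```python
from collections import defaultdict


def traffic_matrix(packets):
    """Return the total bytes per (origin, destination) pair.

    packets is an iterable of (origin, destination, size_in_bytes)
    tuples observed during a single measurement interval.
    """
    matrix = defaultdict(int)
    for origin, destination, size_in_bytes in packets:
        matrix[(origin, destination)] += size_in_bytes
    return dict(matrix)


# Hypothetical packet records, for illustration only.
packets = [
    ("n1", "n2", 1500),
    ("n1", "n2", 500),
    ("n2", "n3", 40),
]
matrix = traffic_matrix(packets)
print(matrix)
```

Computing one such matrix per interval and comparing successive matrices is one way to observe the changes over time mentioned above.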
In some circumstances, such as the monitoring of network traffic that includes high-speed and/or high-volume data streams, aggregate querying, as performed by conventional database programs, may be unacceptably slow. In such circumstances, exact computation of aggregate queries can also be difficult to carry out because of its large memory requirements.
The term “set expression” refers to an expression that defines a set of data elements and is made up of set identifiers (i.e., names of sets) and set operations (such as complements, unions, intersections, and differences) performed on those sets. Each data element may be, e.g., an individual byte of data or a record containing multiple bytes of data. The terms “stream expression” and “data stream,” as used herein, refer to a set expression defined over multiple streams (such as streams of data passing through different nodes of a network), where each stream is considered as a set of elements. Since, in a given stream expression, elements may appear more than once, the term “stream-expression cardinality” refers to the number of distinct elements in a stream expression.
For example, in the Venn diagram of FIG. 3, where T1 and T2 represent two different stream expressions, the cardinality of T1 is 1 (i.e., T1 contains 1 distinct element), and the cardinality of T2 is 2 (i.e., T2 contains 2 distinct elements). The cardinality of the stream-intersection set T1∩T2 is 0, since there are no elements common to both T1 and T2, and the cardinality of the stream-union set T1∪T2 is 3.
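The cardinalities in this example can be reproduced with ordinary set operations. In the sketch below, the element values and the number of times each element repeats are hypothetical; they are chosen only so that T1 has one distinct element, T2 has two, and the two streams share none.

```python
def cardinality(elements):
    """Return the number of distinct elements in an iterable."""
    return len(set(elements))


# Hypothetical stream contents; repeated elements are counted once.
t1 = ["a", "a", "a"]
t2 = ["b", "c", "b"]

card_t1 = cardinality(t1)                        # distinct elements of T1
card_t2 = cardinality(t2)                        # distinct elements of T2
card_intersection = len(set(t1) & set(t2))       # elements common to both
card_union = len(set(t1) | set(t2))              # elements in either stream

print(card_t1, card_t2, card_intersection, card_union)
```

This exact approach stores every distinct element seen, so its memory use grows with the cardinality itself.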