1. Field of the Invention
The present invention relates to traffic analysis in a network.
2. Description of the Related Art
Massive and distributed data streams are increasingly prevalent in many modern applications. In a backbone Internet-Protocol (IP) network composed of hundreds or even thousands of nodes, packets arrive at and depart from the nodes at very high speeds. In a web content-delivery system (such as Akamai) composed of many servers, user requests to access websites are distributed among the servers based on the location of the user and current server loads. Other application domains that give rise to these massive and distributed streams include financial applications and sensor networks.
Because these data streams are massive and distributed, answering queries about them poses a unique challenge. Often, exact-query answering is infeasible due to memory requirements and communications overhead. In this scenario, approximate-query answering, which can provide probabilistic guarantees, becomes the only viable option. One of the most fundamental query classes of interest is the estimation of the number of flows in such streams.
As a first example, in the context of IP-network management, network operators are highly interested in the number of distinct flows in a network that share the same characteristics. Here, a packet flow is defined as, e.g., a sequence of packets that have the same 5-tuple, i.e., the same IP addresses and ports of the two communicating peers and the same protocol. (The 5-tuple is a logical construct containing five parameters that identify a connection and allow network packets of data to be communicated between a server process and a client process in a bi-directional fashion.) The flow ID of a packet can be derived from the 5-tuple. The number of distinct flows between a node pair x and y, which is a type of traffic matrix element, can be formulated as the number of flows of
$T_x \cap T_y$, where $T_x$ and $T_y$ are the streams of packet-flow IDs seen at nodes x and y, respectively. Such traffic matrix elements can be used by network operators for network provisioning and optimization.
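As a minimal sketch of this first example, the Python snippet below derives a flow ID from a packet's 5-tuple and counts, exactly, the distinct flows seen at both of two nodes. The names `FiveTuple`, `flow_id`, and `distinct_common_flows`, the sorting of endpoints to make the ID direction-independent, and the sample packets are all illustrative assumptions and are not specified in this description. Exact set intersection is used only for clarity; it would not scale to the massive streams discussed above.

```python
from typing import Iterable, NamedTuple


class FiveTuple(NamedTuple):
    """Hypothetical packet header fields used to identify a flow."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def flow_id(pkt: FiveTuple) -> tuple:
    """Derive a direction-independent flow ID from a 5-tuple.

    Assumption: both directions of a connection belong to the same flow,
    so the two endpoints are sorted before being combined with the protocol.
    """
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    return (min(a, b), max(a, b), pkt.protocol)


def distinct_common_flows(stream_x: Iterable[FiveTuple],
                          stream_y: Iterable[FiveTuple]) -> int:
    """Exact count of distinct flow IDs seen at both node x and node y."""
    t_x = {flow_id(p) for p in stream_x}  # distinct flow IDs at node x
    t_y = {flow_id(p) for p in stream_y}  # distinct flow IDs at node y
    return len(t_x & t_y)


# Made-up sample packets observed at two nodes.
packets_at_x = [
    FiveTuple("10.0.0.1", 5000, "10.0.0.9", 80, "TCP"),
    FiveTuple("10.0.0.1", 5000, "10.0.0.9", 80, "TCP"),   # repeat packet, same flow
    FiveTuple("10.0.0.2", 5001, "10.0.0.9", 443, "TCP"),
]
packets_at_y = [
    FiveTuple("10.0.0.9", 80, "10.0.0.1", 5000, "TCP"),   # reverse direction, same flow
    FiveTuple("10.0.0.3", 5002, "10.0.0.8", 53, "UDP"),
]

common = distinct_common_flows(packets_at_x, packets_at_y)
print(common)  # prints 1: only one flow is seen at both nodes
```

Repeated packets and reverse-direction packets map to the same flow ID, so only distinct flows are counted.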
A second example is the total number of distinct flows to the same destination node i, denoted Ui and computed over the streams of packet-flow IDs destined for node i. A significant increase in Ui may indicate an underlying network anomaly, such as a Denial-of-Service (DoS) attack.
The term “set expression” refers to an expression that defines a set of data elements and is made up of set identifiers (i.e., names of sets) and set operations (such as complements, unions, intersections, and differences) performed on those sets. The term “stream expression” refers to a set expression defined over multiple streams (such as streams of data passing through different nodes of a network), where each stream is considered as a set of elements. Since, in a given stream expression, elements may appear more than once, the term “stream-expression cardinality” refers to the number of distinct elements in a stream expression. For example, in the Venn diagram of FIG. 3, where T1 and T2 represent two different stream expressions, the cardinality of T1 is 1 (i.e., T1 contains 1 distinct element), and the cardinality of T2 is 2 (i.e., T2 contains 2 distinct elements). The cardinality of the stream-intersection set T1∩T2 is 0, since there are no elements common to both T1 and T2, and the cardinality of the stream-union set T1∪T2 is 3.
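The definitions above can be illustrated with a small Python sketch that treats each stream as a set and evaluates set expressions over it. The element names and the helper `cardinality` are hypothetical. The streams are chosen only so that the counts match those quoted above for T1 and T2, and are not taken from FIG. 3 itself.

```python
def cardinality(stream_expression):
    """Number of distinct elements in a stream expression."""
    return len(set(stream_expression))


# Hypothetical streams; an element may appear more than once in a stream.
t1 = ["a", "a", "a"]            # one distinct element
t2 = ["b", "c", "b", "c", "c"]  # two distinct elements

card_t1 = cardinality(t1)
card_t2 = cardinality(t2)
card_intersection = cardinality(set(t1) & set(t2))  # T1 ∩ T2
card_union = cardinality(set(t1) | set(t2))         # T1 ∪ T2
card_difference = cardinality(set(t1) - set(t2))    # T1 − T2

print(card_t1, card_t2, card_intersection, card_union, card_difference)
# prints: 1 2 0 3 1
```

Because T1 and T2 share no elements here, the cardinality of the union equals the sum of the two individual cardinalities.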