Monitoring and classifying traffic flows can require considerable processing time and power, and both the volume and the speed of traffic data, especially Internet traffic data, are increasing rapidly. Traffic flow analysis systems therefore often encounter bottlenecks at the point where the traffic flow passes through them, because of the various kinds of heavy processing required to obtain a semantic, reliable, and useful classification of network traffic.
Classifying the traffic travelling across a data network makes it possible to decide, for each traffic flow, which behaviour to adopt as a function of its classification. That is, before a data packet can be processed adequately, the network components must classify it according to its characteristics and the information it contains. Accurate and efficient data processing therefore depends largely on reliable methods of packet classification: once a packet is classified, the network components can determine how to handle and process it properly.
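A minimal Python sketch of the flow-classification step described above, assuming (as is common, but not stated here) that a flow is identified by the 5-tuple of source and destination addresses, source and destination ports, and transport protocol. The `Packet` type, `flow_key`, and `group_into_flows` are hypothetical names introduced for illustration, and the per-flow context is reduced to simple packet and byte counters.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical packet record holding only the header fields used as the flow key.
@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # e.g. "TCP" or "UDP"
    length: int    # packet size in bytes

FlowKey = Tuple[str, str, int, int, str]

def flow_key(pkt: Packet) -> FlowKey:
    """5-tuple used here (an assumption) to assign a packet to a flow."""
    return (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.protocol)

def group_into_flows(packets: List[Packet]) -> Dict[FlowKey, Dict[str, int]]:
    """Accumulate a small per-flow context (packet and byte counts) for each flow."""
    flows: Dict[FlowKey, Dict[str, int]] = {}
    for pkt in packets:
        ctx = flows.setdefault(flow_key(pkt), {"packets": 0, "bytes": 0})
        ctx["packets"] += 1
        ctx["bytes"] += pkt.length
    return flows

pkts = [
    Packet("10.0.0.1", "10.0.0.2", 5000, 80, "TCP", 1500),
    Packet("10.0.0.1", "10.0.0.2", 5000, 80, "TCP", 500),
    Packet("10.0.0.3", "10.0.0.2", 6000, 53, "UDP", 80),
]
flows = group_into_flows(pkts)
print(len(flows))  # 2 distinct flows
```

Because this key is directional, the two directions of a single connection on the link 200 would appear as separate flows; a bidirectional analyzer might instead normalize the key, for example by ordering the two (address, port) endpoints before building it.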
For example, in a firewall, the security configuration generally relies on recognizing protocol properties in order to block certain transfers, and quality-of-service management devices assign priorities to data according to complex rules that describe various scenarios. Matching these scenarios to the data packets conveyed within connections requires techniques for classifying those connections.
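The firewall and quality-of-service behaviours can be illustrated by a first-match rule table applied to traffic that has already been classified. This is a hypothetical sketch: the `Classified` record, the rule names, the `decide` function, and the action strings (`"drop"`, `"priority:high"`) are assumptions for illustration, not part of any particular firewall or QoS product.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Classified:
    application: str  # label produced by an upstream classifier, e.g. "sip", "http"
    dst_port: int

@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Classified], bool]
    action: str       # hypothetical action label, e.g. "drop" or "priority:high"

def decide(flow: Classified, rules: List[Rule], default: str = "priority:normal") -> str:
    """Return the action of the first matching rule (first-match semantics)."""
    for rule in rules:
        if rule.matches(flow):
            return rule.action
    return default

RULES = [
    Rule("block-telnet", lambda f: f.dst_port == 23, "drop"),                    # firewall-style rule
    Rule("voip-first", lambda f: f.application in {"sip", "rtp"}, "priority:high"),  # QoS-style rule
]

print(decide(Classified("rtp", 40000), RULES))  # priority:high
print(decide(Classified("telnet", 23), RULES))  # drop
print(decide(Classified("http", 80), RULES))    # priority:normal
```

The rules can only be as good as the classification feeding them: a VoIP flow that the classifier fails to label as `"sip"` or `"rtp"` silently falls through to the default priority.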
Likewise, network control and management operations require classifying the connections between the various senders and receivers that generate digital data streams over these networks. This in turn requires powerful and reliable classification methods, and thus traffic analysis.
Furthermore, analyzing and classifying packets often involves the complex task of constructing protocol attributes, i.e., determining the ordered sequence of protocol names used in the semantic data stream and the names of the parameters carried by each protocol. Building the graph or knowledge base needed to recognize these protocols is a very heavy task, because of the growing number of new protocols used in packet communication networks, as well as the number of protocol modifications and new dependency links between protocols.
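One way to picture such a protocol graph is as a table that, for each protocol, names the header field selecting the next layer and maps that field's values to child protocols; walking the table from the outermost layer yields the ordered sequence of protocol names. The graph below is a hypothetical, much-simplified sketch: `PROTOCOL_GRAPH`, `protocol_path`, and the decoded-field layout are assumptions, and real classifiers cannot rely on port numbers alone to identify applications.

```python
from typing import Dict, List, Tuple

# Hypothetical protocol graph: protocol -> (selector field, {field value: child protocol}).
PROTOCOL_GRAPH: Dict[str, Tuple[str, Dict[int, str]]] = {
    "eth": ("ethertype", {0x0800: "ip", 0x86DD: "ipv6"}),
    "ip":  ("proto", {6: "tcp", 17: "udp"}),
    "tcp": ("dst_port", {80: "http", 5060: "sip"}),
    "udp": ("dst_port", {53: "dns", 5060: "sip"}),
}

def protocol_path(fields: Dict[str, Dict[str, int]], root: str = "eth") -> List[str]:
    """Walk the graph from root and return the ordered sequence of protocol names."""
    path = [root]
    current = root
    while current in PROTOCOL_GRAPH:
        selector, children = PROTOCOL_GRAPH[current]
        value = fields.get(current, {}).get(selector)
        child = children.get(value)
        if child is None:
            break  # next layer unknown: stop at the deepest recognised protocol
        path.append(child)
        current = child
    return path

decoded = {"eth": {"ethertype": 0x0800}, "ip": {"proto": 17}, "udp": {"dst_port": 5060}}
print(".".join(protocol_path(decoded)))  # eth.ip.udp.sip
```

Every new protocol, protocol modification, or new dependency link means editing this table, which illustrates why maintaining such a knowledge base becomes heavy as protocols multiply.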
Typically, the task of observing data packets is assigned to a network node through which the connections generating those packets pass, such as a proxy server. Existing traffic flow analyses are thus generally performed in networked computer systems such as the one illustrated in FIG. 1. A traffic analyzing system for analyzing high-speed traffic (e.g., packets or datagrams) between various computers includes a first network 100 connected to a second network 110 by a communications link 200. The link 200 is monitored by an analyzer 300, which measures and analyzes the traffic flowing in either or both directions between the first network 100 and the second network 110. The traffic between network 100 and network 110 is typically about 1 Gbps in business networks but can reach a couple dozen Gbps in the core of an operator's network.
As mentioned above, continuously analyzing all traffic accurately and precisely in heavily trafficked networks is difficult. The analysis and measurement capacity of the analyzer 300 is determined by the number of simultaneous flows N (i.e., the traffic flow size) and the throughput T of each flow (i.e., the traffic flow speed). N directly affects the amount of memory required to manage the context of the registered applications, whereas T directly affects the processing power required to perform the analysis without significant packet loss. T determines the number of packets to be processed in a given time interval and, consequently, the amount of processing that can be allocated to each packet.
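The relationship between N, T, memory, and per-packet processing can be worked through numerically. This sketch assumes, for illustration only, that all N flows have the same throughput T (so the analyzer sees an aggregate of N×T bits per second), a fixed average packet size, a single processor with a known clock rate, and a fixed context size per flow; `analysis_budget` and every parameter value are hypothetical.

```python
def analysis_budget(n_flows: int, per_flow_bps: float, avg_packet_bytes: float,
                    cpu_hz: float, bytes_per_flow: int) -> dict:
    """Estimate flow-context memory and the CPU cycles available per packet."""
    aggregate_bps = n_flows * per_flow_bps                   # total traffic on the link
    packets_per_s = aggregate_bps / (avg_packet_bytes * 8)   # bytes -> bits
    return {
        "aggregate_bps": aggregate_bps,
        "packets_per_s": packets_per_s,
        # The processing budget per packet shrinks as the packet rate grows (driven by T).
        "cycles_per_packet": cpu_hz / packets_per_s,
        # Context memory grows linearly with the number of simultaneous flows N.
        "context_bytes": n_flows * bytes_per_flow,
    }

budget = analysis_budget(n_flows=1000, per_flow_bps=1e6, avg_packet_bytes=1000,
                         cpu_hz=3e9, bytes_per_flow=256)
for name, value in budget.items():
    print(f"{name}: {value:,.0f}")
# aggregate_bps: 1,000,000,000
# packets_per_s: 125,000
# cycles_per_packet: 24,000
# context_bytes: 256,000
```

Under these assumptions, 1,000 flows at 1 Mbps each fill a 1 Gbps link and leave about 24,000 cycles per packet; if the aggregate rises to 20 Gbps, closer to an operator core, the same processor has only 1,200 cycles per packet.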
In known systems, the amount of processing increases in proportion to the number of flows N, insofar as each packet contributes to the state of a flow and thus requires a data structure whose size depends on N. A given hardware infrastructure therefore exhibits behaviour determined by its intrinsic performance and configuration: it can either increase N by decreasing T, or increase T by decreasing N. In other words, the product N×T remains nearly constant.
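Under this N×T ≈ constant model, a fixed infrastructure with capacity C can sustain at most T = C/N per flow. The short sketch below assumes C is expressed as an aggregate throughput in bits per second; `max_per_flow_throughput` and the capacity value are hypothetical.

```python
def max_per_flow_throughput(capacity_bps: float, n_flows: int) -> float:
    """Largest per-flow throughput T sustainable for N flows if N x T stays constant."""
    if n_flows <= 0:
        raise ValueError("n_flows must be positive")
    return capacity_bps / n_flows

CAPACITY_BPS = 1e9  # hypothetical capacity of one analyzer
for n in (1_000, 2_000, 4_000):
    t = max_per_flow_throughput(CAPACITY_BPS, n)
    print(f"N={n:>5}  T_max={t:>12,.0f} bit/s  N*T={n * t:,.0f}")
```

Doubling N halves the sustainable T while the product stays at the same capacity, which is exactly the trade-off that fails once N and T both grow.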
In practice, however, as existing computer networks evolve, N and T are both increasing at the same time. That is, the size and the speed of network traffic no longer vary inversely with each other; both N and T are growing. Added to this are the sheer volume and complexity of the traffic flows to be monitored and analyzed.
It would therefore be desirable to develop a new method and system for efficient, practical traffic flow analysis in computer networks, capable of evaluating high-speed, heavy traffic flows, and for improved protocol analysis of emerging technologies such as VoIP (Voice over IP) applications.