1. Field of the Invention
The present invention relates to computers and computer networks. More particularly, the invention relates to classifying network traffic in a computer network.
2. Background of the Related Art
Identifying the flows generated by different application-layer protocols is of major interest for network operators. For Internet service providers (ISPs), identifying traffic allows them to differentiate the QoS (quality of service) for different types of applications, such as voice applications and video applications. Moreover, it enables them to control high-bandwidth and non-interactive applications, such as peer-to-peer (P2P) applications. For enterprise networks, it is very important for administrators to know the activities on their network, such as the services that users are running, the applications dominating network traffic, etc. Traffic classification is also important for securing the network. In fact, even traditional protocols are often used as a means to control attacks, such as the use of IRC (Internet Relay Chat) to manage the C&C (command and control) nodes for botnets. Overall, traffic classification is the first step in building any kind of intelligence on a network.
Popular current solutions include Deep Packet Inspection (DPI), which does not scale since it requires tedious manual reverse engineering of protocols, a daunting problem given the proliferation of applications and protocols. Similarly, approaches based on statistical classification still rely heavily on the availability of a training set to extract signatures, which must be updated regularly. All these classifiers share some key limitations. First, achieving high classification accuracy requires either a cumbersome manual reverse engineering of protocols to identify the signatures for DPI, or a tedious process to generate an accurate training set for behavioral classifiers. Second, the classifiers can identify only the specific applications they have been trained for. All other traffic is either aggregated in a generic class labeled as “unknown” or mislabeled as one of the known applications. In other words, these classifiers cannot identify the introduction of a new application, or changes in the applications' protocols or the users' behavior, unless a re-training phase is triggered.
Throughout this disclosure, the term “flow” refers to a sequence of packets from a source node to a destination node in the network. Generally, a flow is represented by a 5-tuple of <source IP address, destination IP address, source port, destination port, protocol>. In particular, the protocol in the 5-tuple refers to a layer 4 (i.e., transport layer) protocol, such as TCP, UDP, ICMP, etc. Further, the terms “application” and/or “application class” refer to a layer 7 (i.e., application-layer) protocol with a distinct documented behavior in terms of communication exchanges, control packets, etc. Examples of such applications include HTTP, SMTP, MSN, BitTorrent, Gnutella, POP3, eDonkey, Telnet, Samba, Yahoo IM, etc. Moreover, the term “application” may be referred to as the label or the class of the flow depending on the context.
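By way of illustration only, the 5-tuple representation of a flow described above may be sketched as follows. This is a minimal sketch and not a definitive implementation: the names `FlowKey` and `group_into_flows`, the dictionary-based packet records, and the sample addresses are hypothetical and are assumed here purely for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical 5-tuple key identifying a flow (one direction only)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # layer 4 protocol, e.g. "TCP" or "UDP"


def group_into_flows(packets):
    """Group packet records (assumed to be dicts) by their 5-tuple."""
    flows = defaultdict(list)
    for pkt in packets:
        key = FlowKey(pkt["src_ip"], pkt["dst_ip"],
                      pkt["src_port"], pkt["dst_port"], pkt["protocol"])
        flows[key].append(pkt)
    return dict(flows)


# Illustrative packet records; addresses are from documentation ranges.
packets = [
    {"src_ip": "192.0.2.1", "dst_ip": "198.51.100.7",
     "src_port": 40000, "dst_port": 80, "protocol": "TCP", "size": 60},
    {"src_ip": "192.0.2.1", "dst_ip": "198.51.100.7",
     "src_port": 40000, "dst_port": 80, "protocol": "TCP", "size": 512},
    # Reverse direction: a different source/destination, hence a different flow.
    {"src_ip": "198.51.100.7", "dst_ip": "192.0.2.1",
     "src_port": 80, "dst_port": 40000, "protocol": "TCP", "size": 1500},
]

flows = group_into_flows(packets)
for key, pkts in flows.items():
    print(key, len(pkts))
```

In this sketch the key is directional, consistent with a flow being a sequence of packets from a source node to a destination node, so the reply packet is placed in a separate flow. An application label (e.g., HTTP) would be attached to each flow by a classifier, which is outside the scope of the sketch.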