Identifying the flows generated by different application-layer protocols is of major interest to network operators. Such identification enables quality of service (QoS) engineering for different types of traffic, such as voice and video traffic, and enables specific applications such as traffic forensics and network security. Moreover, it enables ISPs to limit and/or control the traffic and usage of applications that have the potential to consume large amounts of resources, such as peer-to-peer (P2P) applications. For enterprise networks, it is very important for administrators to know what activities take place on their network, such as the services that users are running and the applications that dominate network traffic.
Throughout this disclosure, the term “flow” refers to a sequence of packets exchanged between two network nodes, referred to as the source and the destination of the flow, where either the source or the destination may be the originator of the exchange. Generally, in an IP network, such as the Internet, a flow is identified by a 5-tuple of <source IP address, destination IP address, source port, destination port, protocol>. The payload of the flow may be represented by a string of alphanumeric characters and other sequences of bits.
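By way of illustration only, the flow definition above can be sketched in Python. The names `FlowKey`, `canonical`, and `group_into_flows` are hypothetical and do not appear in this disclosure. The sketch assumes that packets have already been parsed into a 5-tuple and a payload, and that both directions of an exchange belong to the same flow. The endpoint ordering used to merge the two directions is an assumption of the sketch, not a required implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The 5-tuple that identifies a flow in an IP network."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # e.g. "TCP" or "UDP"

    def canonical(self) -> "FlowKey":
        """Return a direction-independent key (assumption of this sketch).

        The two endpoints are ordered so that packets sent by either node
        map to the same key, whichever node originated the exchange.
        """
        a = (self.src_ip, self.src_port)
        b = (self.dst_ip, self.dst_port)
        if a <= b:
            return self
        return FlowKey(self.dst_ip, self.src_ip,
                       self.dst_port, self.src_port, self.protocol)


def group_into_flows(packets):
    """Group (FlowKey, payload) pairs into flows.

    Payloads of packets in the same flow are concatenated in arrival
    order, giving the flow payload as a single sequence of bytes.
    """
    flows = defaultdict(bytes)
    for key, payload in packets:
        flows[key.canonical()] += payload
    return dict(flows)
```

For example, a request from 192.0.2.10:51000 to 198.51.100.5:80 over TCP and the corresponding response in the opposite direction are grouped under one key, whereas a UDP packet between the same addresses and ports forms a separate flow because the protocol field differs.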