In general, network administrators and Internet Service Providers (ISPs) do not have access to the end-user devices or applications that communicate over their networks. Hence, for traffic characterization and classification, they generally rely on passively monitoring traffic at some intermediate node, and use the resulting classification for purposes such as better network resource provisioning and quality-of-service (QoS) prioritization.
Conventionally, application-layer characterization has relied on IP-layer packet statistics or, for unencrypted traffic, on deep packet inspection (DPI), which reads higher-layer protocol headers or even packet payloads to identify applications. However, with the increasing use of encryption (HTTP/1.1 or HTTP/2 over TLS), many DPI techniques can no longer be used. Newer approaches attempt to identify signatures in packet headers or inter-packet statistics, together with additional metadata such as IP addresses and DNS (Domain Name System) lookups. However, most of these approaches are expensive, both in computational resource demands and in maintenance, since new rules are needed for new applications and for any changes to existing ones. More recently, solutions have been proposed that use machine learning, applying supervised or semi-supervised approaches to flow statistics (for example, connection duration, bytes downloaded, and bytes uploaded) to classify connections. Finally, commercial products combine many of the above techniques. For example, Sandvine provides specialized hardware appliances capable of monitoring large volumes of traffic and quickly labeling connections with the appropriate applications or application classes. However, such products are expensive.
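To make the flow-statistics approach more concrete, the following is a minimal Python sketch, not a reproduction of any specific published classifier. It reduces each connection to (duration, bytes downloaded, bytes uploaded), log-scales these values, and labels new connections with a nearest-centroid classifier trained on labeled flows. The `Packet` record, the `synthetic_flow` helper, the class labels, and all byte counts are hypothetical illustrations, and nearest-centroid simply stands in for whatever supervised model a real system would use.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Packet:
    """One observed packet of a connection (hypothetical record format)."""
    timestamp: float   # seconds since capture start
    size: int          # bytes on the wire
    downstream: bool   # True if server -> client


Features = tuple[float, float, float]


def flow_features(packets: list[Packet]) -> Features:
    """Return log-scaled (duration, bytes downloaded, bytes uploaded)."""
    times = [p.timestamp for p in packets]
    duration = max(times) - min(times)
    down = sum(p.size for p in packets if p.downstream)
    up = sum(p.size for p in packets if not p.downstream)
    # log1p compresses values spanning orders of magnitude and maps 0 to 0.
    return (math.log1p(duration), math.log1p(down), math.log1p(up))


class NearestCentroid:
    """Supervised classifier: assign a flow to the label with the closest mean."""

    def fit(self, X: list[Features], y: list[str]) -> "NearestCentroid":
        groups: dict[str, list[Features]] = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        # Per-label centroid: the mean of each feature column.
        self.centroids = {
            label: tuple(mean(col) for col in zip(*rows))
            for label, rows in groups.items()
        }
        return self

    def predict(self, x: Features) -> str:
        # Pick the label whose centroid is nearest in Euclidean distance.
        return min(self.centroids, key=lambda lbl: math.dist(x, self.centroids[lbl]))


def synthetic_flow(duration: float, down: int, up: int) -> list[Packet]:
    """Hypothetical helper: a toy flow with one upstream and one downstream packet."""
    return [Packet(0.0, up, False), Packet(duration, down, True)]


# Hypothetical labeled training flows: (duration s, bytes down, bytes up).
training = [
    ("video", (300, 50_000_000, 500_000)),
    ("video", (600, 120_000_000, 1_000_000)),
    ("web", (2, 200_000, 20_000)),
    ("web", (5, 800_000, 50_000)),
]
clf = NearestCentroid().fit(
    [flow_features(synthetic_flow(*stats)) for _, stats in training],
    [label for label, _ in training],
)

if __name__ == "__main__":
    print(clf.predict(flow_features(synthetic_flow(400, 80_000_000, 700_000))))  # video
    print(clf.predict(flow_features(synthetic_flow(3, 300_000, 30_000))))  # web
```

Log-scaling keeps the byte counts, which span several orders of magnitude, from dominating the distance computation; a deployed classifier would typically use richer per-flow features and a stronger model.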