The present invention relates to systems for monitoring network traffic on a computer network such as the Internet and, in particular, to a system providing improved classification of network packets.
Networks, such as the Internet, communicate message data by means of discrete packets each having a “payload” (typically a portion of the message data to be communicated) coupled to packet control data such as a destination address. The packets are transmitted individually onto the network to be routed by multiple intermediate autonomous devices, such as routers, according to the attached destination address. The packets are received at the destination address to be reassembled into the message data. Packet communication protocols allow flexible and efficient high-speed data transmission over networks with complex and dynamic topologies.
The payload of a given packet may carry message data from a variety of “message categories” describing the type of message data and its use. For example, packets may carry message data of different types, including text, images, audio, and other data. The message data may be used in real-time communication, for example voice over Internet Protocol (VoIP) telephone conversations, streaming video, or the like, or may be used in a relatively time-insensitive data file transfer.
It would be desirable, for reasons of network management, security, and research, to be able to identify and sort packets according to their message categories. For example, it might be desirable to limit the proportion of a network's bandwidth devoted to time-insensitive transfers of large computer files, such as those made through peer-to-peer (P2P) applications, in favor of time-critical voice telephone communications. Identifying the message category of a packet could also be important for security, for instance to block malicious traffic. Deeper insight into the message categories of transmitted packets could also aid in the study of networks and thus provide a useful research tool.
Inspection of the packets themselves provides very little information about the message category. For example, the payload data of an individual packet carrying image data can be identical to, or indistinguishable from, the payload data of an individual packet carrying a portion of a telephone conversation. The possibility that payload data is encrypted or compressed makes any attempt to discern the message category from payload data even harder.
Packets transmitted under Internet protocols, such as the Transmission Control Protocol (TCP) or the User Datagram Protocol (UDP), may include in their packet control data a port number associated with a type of service. For example, specific ports are assigned to protocols developed for different message categories, including the File Transfer Protocol (FTP), the Simple Mail Transfer Protocol (SMTP), and the Hypertext Transfer Protocol (HTTP). Port numbers, and the protocols they identify, provide only a very coarse view of the message category. This is particularly true because many newly developed applications, spanning a wide range of message categories, simply use HTTP, defeating simple classification by port number. Port numbers are of even less value for malicious traffic, where there is a strong incentive to actively obfuscate any indication, by port number, of the classification of the payload.
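Port-based classification as described above amounts to a table lookup. The following is a minimal Python sketch, assuming only the standard IANA well-known port assignments for FTP (ports 20 and 21), SMTP (port 25), and HTTP (port 80); the names `WELL_KNOWN_PORTS` and `classify_by_port` are hypothetical and do not come from any particular product.

```python
# Well-known port assignments for the protocols named in the text.
# Hypothetical lookup table for illustration; a real registry is much larger.
WELL_KNOWN_PORTS = {
    20: "FTP",   # FTP data
    21: "FTP",   # FTP control
    25: "SMTP",
    80: "HTTP",
}


def classify_by_port(src_port: int, dst_port: int) -> str:
    """Return a coarse protocol label from either endpoint's port, else 'unknown'."""
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "unknown"


if __name__ == "__main__":
    print(classify_by_port(51234, 80))    # HTTP
    print(classify_by_port(40000, 6881))  # unknown
```

The sketch makes the limitation concrete: every application that runs over port 80 receives the same "HTTP" label, and traffic on an unregistered port receives no label at all.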
For this reason, fine-grained classification of packets by message category currently relies on one of two techniques. The first attempts to match the data of packet payloads against a library of signatures, each a unique byte sequence associated with a particular application. This technique is widely used in attempting to identify malicious traffic. The problem with this approach is that byte sequences are often not unique to a particular message category and, in the case of malicious traffic, evasion techniques are actively used to thwart signature matching.
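Signature matching can be sketched as a substring search of each payload against a signature library. This minimal Python example assumes a tiny, illustrative library (`SIGNATURES`) and a hypothetical helper `match_signatures`. It is not a vetted rule set; production systems use large curated libraries and optimized multi-pattern matchers.

```python
from typing import Dict, List

# Hypothetical signature library: application label -> characteristic byte sequence.
# Entries are illustrative only, not a curated rule set.
SIGNATURES: Dict[str, bytes] = {
    "HTTP": b"HTTP/1.1",
    "SMTP": b"EHLO ",
    "BitTorrent": b"\x13BitTorrent protocol",
}


def match_signatures(payload: bytes,
                     signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return every label whose signature occurs anywhere in the payload."""
    return [label for label, sig in signatures.items() if sig in payload]


if __name__ == "__main__":
    # A plain web request matches one signature.
    print(match_signatures(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"))  # ['HTTP']
    # An SMTP session whose message text mentions HTTP/1.1 matches two,
    # illustrating that byte sequences are not unique to one category.
    print(match_signatures(b"EHLO client.example\r\nDATA\r\nSee the HTTP/1.1 spec.\r\n"))  # ['HTTP', 'SMTP']
```

Encrypted or deliberately altered payloads simply match nothing, which is why this approach is fragile against the evasion described above.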
The second technique focuses on building statistical models of “transport layer metrics” such as connection duration and packet size. Statistical techniques such as cluster analysis and machine learning can then divide packets into message categories based on the similarity of their transport layer characteristics. Again, however, such statistical fingerprints may fail to distinguish many important packet classifications.
Ideally, any packet classification system should also operate at extremely high rates so as to provide comprehensive and timely analysis of network traffic.