A packet is a formatted unit of data carried by a packet-switched network. When data has to be transmitted, it is broken down into segments and formatted into packets for transmission over the network. When the packets reach their destination, the segments of data encapsulated within the packets can be retrieved and reassembled into the original data. The exact format of the packets depends upon the protocol used by the network. However, a packet typically has a header and a payload. The header usually contains information required to route the packet to its intended destination, and possibly information identifying the source of the packet. The payload or content of the packet contains the data segment carried by the packet. A packet flow or traffic flow is a sequence of packets sent from a particular source to a particular destination.
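The segmentation and reassembly described above can be illustrated with a minimal sketch. The `Packet` structure, its field names, and the `max_payload` parameter are assumptions made for illustration only and do not correspond to any particular protocol.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src: str        # header: source address
    dst: str        # header: destination address
    seq: int        # header: sequence number used for reassembly
    payload: bytes  # data segment carried by the packet


def packetize(data: bytes, src: str, dst: str, max_payload: int = 4) -> list[Packet]:
    """Break data into segments and wrap each segment in a packet."""
    return [
        Packet(src, dst, seq, data[offset:offset + max_payload])
        for seq, offset in enumerate(range(0, len(data), max_payload))
    ]


def reassemble(packets: list[Packet]) -> bytes:
    """Restore the original data, tolerating out-of-order arrival."""
    ordered = sorted(packets, key=lambda p: p.seq)
    return b"".join(p.payload for p in ordered)


packets = packetize(b"hello world", "10.0.0.1", "10.0.0.2")
original = reassemble(packets[::-1])  # packets delivered in reverse order
```

In this sketch the sequence of `Packet` objects sharing the same `src` and `dst` plays the role of a packet flow.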
Packet filtering involves parsing a packet header, and applying a pre-defined set of rules to the information contained within the header in an attempt to classify the type of network traffic to which the packet belongs. Traditionally, packet filtering has been used by firewalls to prevent unauthorized access to or from a particular network or computer, whilst permitting authorized communications. However, packet filters are stateless, as they examine packets on an individual basis and have no memory of previous packets, which makes them vulnerable to spoofing attacks. Spoofing involves an attacker gaining unauthorized access to a computer or network by making it appear that a malicious message has come from a trusted machine by faking the address of that machine in the packet header.
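As a rough illustration of why a stateless filter is exposed to spoofing, the following sketch applies a pre-defined rule list to a single header at a time. The `Header` and `Rule` types, the first-match-wins policy, and the addresses are hypothetical choices for this example, not features of any specific firewall.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Header:
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str


@dataclass(frozen=True)
class Rule:
    action: str                    # "allow" or "deny"
    src_ip: Optional[str] = None   # None matches any value
    dst_port: Optional[int] = None
    protocol: Optional[str] = None

    def matches(self, h: Header) -> bool:
        return ((self.src_ip is None or self.src_ip == h.src_ip)
                and (self.dst_port is None or self.dst_port == h.dst_port)
                and (self.protocol is None or self.protocol == h.protocol))


def filter_packet(header: Header, rules: list[Rule], default: str = "deny") -> str:
    """Return the action of the first matching rule; no state is kept between calls."""
    for rule in rules:
        if rule.matches(header):
            return rule.action
    return default


# Only the (assumed) trusted machine 192.0.2.10 may reach port 22.
rules = [Rule("allow", src_ip="192.0.2.10", dst_port=22, protocol="TCP")]

genuine = Header("192.0.2.10", "10.0.0.1", 22, "TCP")
spoofed = Header("192.0.2.10", "10.0.0.1", 22, "TCP")  # attacker faking the trusted address
stranger = Header("203.0.113.7", "10.0.0.1", 22, "TCP")
```

Because the decision depends only on the header fields of the packet at hand, `genuine` and `spoofed` are indistinguishable to the filter and both are allowed, while `stranger` is denied.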
As an advance on packet filtering, stateful packet inspection can be used to determine which network packets to allow through the firewall. Stateful packet inspection involves examining packet headers and remembering something about them. This information can then be used when processing later packets. For example, both incoming and outgoing packets can be examined over a period of time and outgoing packets that request specific types of incoming packets are tracked, with only those incoming packets constituting a proper response to an outgoing packet being allowed through the firewall.
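The request/response tracking described above might be sketched as follows. The `StatefulFirewall` class and its method names are illustrative assumptions; a practical implementation would also expire entries after a timeout and follow protocol state such as the TCP handshake, both of which are omitted here.

```python
class StatefulFirewall:
    """Allows an incoming packet only if it answers a previously seen outgoing packet."""

    def __init__(self) -> None:
        # Each entry describes an incoming packet that would be a proper response.
        self._expected: set[tuple[str, int, str, int, str]] = set()

    def outgoing(self, src: str, sport: int, dst: str, dport: int, proto: str) -> bool:
        # Remember the reverse direction of the outgoing packet.
        self._expected.add((dst, dport, src, sport, proto))
        return True

    def incoming(self, src: str, sport: int, dst: str, dport: int, proto: str) -> bool:
        return (src, sport, dst, dport, proto) in self._expected


fw = StatefulFirewall()
fw.outgoing("10.0.0.5", 50000, "198.51.100.7", 80, "TCP")

reply_ok = fw.incoming("198.51.100.7", 80, "10.0.0.5", 50000, "TCP")
unsolicited = fw.incoming("203.0.113.9", 80, "10.0.0.5", 50000, "TCP")
```

Here `reply_ok` is true because the packet matches a remembered outgoing request, whereas `unsolicited` is false because no outgoing packet requested it.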
Packet filtering and stateful packet inspection are known as Shallow Packet Inspection (SPI) techniques, as they rely solely on the information contained in the header of the packets to determine how a packet should be dealt with. For example, in an IP network, SPI involves inspecting IP packets up to layer 4 (TCP/UDP layer) of the OSI model, typically extracting a “5-tuple” consisting of the source IP address, destination IP address, source transport layer address (e.g. TCP/UDP port), destination transport layer address (e.g. TCP/UDP port), and the next level protocol used in the data portion of the packet (e.g. TCP, UDP, ICMP etc.). However, by only examining the information contained within the packet headers, these SPI techniques have their limitations.
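By way of illustration, the 5-tuple can be read from the fixed offsets of an IPv4 header followed by a TCP or UDP header. The sketch below assumes an unfragmented IPv4 packet supplied as raw bytes; the `FiveTuple` type and the function name are hypothetical, and the ports are left as `None` for protocols without ports, such as ICMP.

```python
import socket
import struct
from typing import NamedTuple, Optional

TCP, UDP = 6, 17  # IANA-assigned IP protocol numbers


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: Optional[int]
    dst_port: Optional[int]
    protocol: int


def extract_five_tuple(packet: bytes) -> FiveTuple:
    """Extract the 5-tuple from a raw IPv4 packet without touching the payload."""
    if len(packet) < 20:
        raise ValueError("too short for an IPv4 header")
    if packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (packet[0] & 0x0F) * 4  # IHL field is counted in 32-bit words
    if header_len < 20:
        raise ValueError("invalid IPv4 header length")
    protocol = packet[9]
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])
    src_port = dst_port = None
    if protocol in (TCP, UDP) and len(packet) >= header_len + 4:
        # Both TCP and UDP start with 16-bit source and destination ports.
        src_port, dst_port = struct.unpack_from("!HH", packet, header_len)
    return FiveTuple(src_ip, dst_ip, src_port, dst_port, protocol)
```

Nothing beyond the first four bytes of the transport header is read, which is what makes the inspection "shallow".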
In order to overcome some of the shortcomings of SPI, Deep Packet Inspection (DPI) involves looking beyond the header information, and inspecting the content or payload of packets, up to layer 7 of the OSI model. This thorough analysis of packets can be used for a variety of purposes, including, among others, network security, network management, traffic profiling and statistics collection, copyright enforcement, content regulation, and surveillance.
A DPI system analyzes the header and payload of packets flowing through it, and applies a set of packet classification rules or criteria to the information in the header and payload of packets in an attempt to identify the class of traffic and user session to which a traffic flow belongs. For example, a DPI system will parse the packets in a flow to determine the type of protocols (HTTP, SMTP, etc.) that the packets relate to, the metrics of the packets (size, ports, etc.), the packet or octet transfer rates, and the sequence(s) of exchanged packets. A DPI system will then apply the packet classification rules to all of this information in an attempt to determine the class of traffic. These packet classification rules can make use of a variety of techniques such as port analysis, string matching, statistical analysis, heuristic analysis, protocol header analysis, and packet payload analysis. As such, systems that implement DPI are required to thoroughly analyze packets in real-time and, in general, will be required to analyze at least a minimum number of packets at the beginning of almost every traffic flow sent and/or received by a user. DPI systems are therefore required to be capable of providing a significant amount of computing power.
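A much simplified sketch of two of the techniques named above, string matching on the payload and port analysis, is given below. The function name, the signatures, and the port table are assumptions chosen for illustration; a real DPI system would apply a far larger rule set together with statistical and heuristic analysis.

```python
HTTP_METHODS = (b"GET ", b"POST ", b"HEAD ", b"PUT ")
SMTP_COMMANDS = (b"EHLO ", b"HELO ")
WELL_KNOWN_PORTS = {80: "HTTP", 25: "SMTP"}  # illustrative subset only


def classify_flow(dst_port: int, payloads: list[bytes]) -> str:
    """Classify a flow from the first packets it carries.

    Payload signatures take precedence; the destination port is
    only a fallback when no signature matches.
    """
    for payload in payloads:
        # String matching on the payload.
        if payload.startswith(HTTP_METHODS) and b"HTTP/1." in payload:
            return "HTTP"
        if payload.upper().startswith(SMTP_COMMANDS):
            return "SMTP"
    # Port analysis.
    return WELL_KNOWN_PORTS.get(dst_port, "unknown")


request = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
label = classify_flow(8080, [request])  # HTTP on a non-standard port
```

The example shows why payload inspection adds information beyond the header: the flow to port 8080 is labelled as HTTP from its content, which port analysis alone would have left as unknown.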
Due to the highly demanding task performed by DPI systems when used to perform packet classification, and to the ever increasing amount of network traffic, the amount of data to be analyzed by DPI systems has reached a point at which optimisation mechanisms are mandatory. Currently, optimisation for DPI systems is achieved using either horizontal or vertical scaling. Horizontal scaling makes use of an increasing number of machines to perform the DPI analysis in parallel, whereas vertical scaling involves delegating individual steps of the DPI analysis to specialised hardware. However, both increasing the number of machines and using specialised hardware can be expensive, such that it may well be unfeasible, from a business perspective, to acquire and maintain the systems required to perform DPI. For example, DPI is likely to be unfeasible for the traffic flows generated by or for the users of a flat-fee mobile broadband service.
In addition, both horizontal scaling and vertical scaling are only capable of achieving a linear increase in performance. For example, in order to double the rate at which a DPI system can classify traffic flows, the system would require double the number of machines, or double the amount of resources. Given that all forecasts of Internet traffic, both mobile and fixed, predict an exponential increase in the amount of traffic consumed by users, this linear increase in DPI throughput will not be sufficient, at a reasonable cost, to keep up with the traffic that will have to be analyzed. It is therefore desirable to provide a mechanism for optimising the traffic classification performance of DPI systems to sufficiently increase their throughput at a minimal cost.
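The mismatch between linear scaling and exponential traffic growth can be made concrete with a small calculation. All figures used here (initial traffic, yearly growth factor, per-machine capacity) are hypothetical values chosen only to show the shape of the trend, not measurements or forecasts.

```python
import math


def machines_needed(traffic_gbps: float, per_machine_gbps: float) -> int:
    """Under linear scaling, capacity grows in proportion to the number of machines."""
    return math.ceil(traffic_gbps / per_machine_gbps)


def projected_machines(initial_gbps: float, yearly_growth: float,
                       per_machine_gbps: float, years: int) -> list[int]:
    """Machines required each year if traffic is multiplied by yearly_growth annually."""
    return [
        machines_needed(initial_gbps * yearly_growth ** year, per_machine_gbps)
        for year in range(years + 1)
    ]


# Hypothetical: 10 Gbit/s today, traffic doubling yearly, 10 Gbit/s per machine.
print(projected_machines(10, 2.0, 10, 3))  # → [1, 2, 4, 8]
```

Because each machine contributes a fixed amount of capacity, the number of machines, and hence the cost, grows at the same exponential rate as the traffic.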