Deep packet inspection (DPI) analyzes the application-layer content of a packet to detect whether it contains patterns taken from a signature database, such as content strings, regular expressions, or Snort-type modifiers. One issue with DPI, shown in FIG. 1, arises when a pattern spans multiple packets within the same flow. Conventionally, this problem is handled by (1) reconstructing the flow by reassembling consecutive packets P1-P6 so that they are in order; and (2) applying DPI to the reconstructed stream and looking for matches in the entire flow, also known as performing deep flow inspection (DFI). This decouples the layer in charge of reassembling the flow from the layer in charge of running the specific DPI technique implemented. Indeed, any DPI technique can be used as long as it can operate on the reconstructed flow provided by the reassembling layer. Conventional methods based on automata theory show that pattern matching can then be done efficiently, since the input is represented by a finite string of bytes.
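The cross-packet problem and the conventional two-step remedy can be illustrated with a minimal Python sketch. It is not taken from any particular DPI engine: the `Packet` layout (a byte offset plus a payload), the function names, and the use of a plain substring search in place of a real signature matcher are all assumptions made for illustration, and the reassembly step assumes a flow without gaps or overlapping segments.

```python
from typing import List, Tuple

# Hypothetical packet model: (byte offset within the flow, payload).
Packet = Tuple[int, bytes]


def match_per_packet(packets: List[Packet], signature: bytes) -> bool:
    """Inspect each packet on its own; misses signatures that span packets."""
    return any(signature in payload for _, payload in packets)


def reassemble(packets: List[Packet]) -> bytes:
    """Step (1): put packets in order by offset and concatenate the payloads.

    Assumes the flow is complete, with no gaps and no overlapping segments.
    """
    return b"".join(payload for _, payload in sorted(packets))


def match_on_flow(packets: List[Packet], signature: bytes) -> bool:
    """Step (2): run the matcher on the reconstructed stream (DFI)."""
    return signature in reassemble(packets)


# Three packets of one flow, delivered out of order; the signature
# b"index.html" straddles the boundary between offsets 0 and 8.
packets: List[Packet] = [
    (0, b"GET /ind"),
    (16, b" HTTP/1.1"),
    (8, b"ex.html?"),
]
signature = b"index.html"
```

With these inputs, `match_per_packet(packets, signature)` finds nothing because no single payload contains the whole signature, whereas `match_on_flow(packets, signature)` finds it in the reconstructed stream. The matcher itself is interchangeable, which mirrors the decoupling between the reassembling layer and the DPI layer.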
However, these conventional methods have two major disadvantages which in practice make their application to large traffic volumes infeasible. First, they require dedicating a flow reconstruction chain, e.g., a thread in software implementations, to every flow crossing the link, thus draining computational resources. Second, the reassembling layer must explicitly maintain state for each flow being reconstructed, thus also draining memory resources: the packets must be stored to complete the reconstruction phase even when they arrive in order. Yet the non-patent literature of Dharmapurikar, S., & Paxson, V. (2005, August), Robust TCP Stream Reassembly in the Presence of Adversaries, in USENIX Security, shows that the number of flows with out-of-order packets is rather small (from 2 to 13%) and that most of them (95-96%) have holes produced by a single out-of-order packet.
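The memory cost of per-flow state can be made concrete with a small Python sketch. The `FlowReassembler` class, its method names, and the use of byte offsets starting at zero are hypothetical choices for illustration. As a simplification, the sketch buffers only segments that sit behind a hole and releases in-order data immediately, yet it still keeps a table entry for every flow it has seen, including flows that are fully in order.

```python
from typing import Dict, Hashable


class FlowReassembler:
    """Illustrative per-flow reassembly state (hypothetical design)."""

    def __init__(self) -> None:
        # One entry per flow: the next byte offset expected in order.
        self.next_seq: Dict[Hashable, int] = {}
        # One entry per flow: out-of-order segments held behind a hole.
        self.pending: Dict[Hashable, Dict[int, bytes]] = {}

    def push(self, flow_id: Hashable, seq: int, payload: bytes) -> bytes:
        """Accept one segment; return the bytes that are now in order."""
        expected = self.next_seq.setdefault(flow_id, 0)
        held = self.pending.setdefault(flow_id, {})
        held[seq] = payload
        released = bytearray()
        # Drain every buffered segment that continues the in-order stream.
        while expected in held:
            chunk = held.pop(expected)
            released += chunk
            expected += len(chunk)
        self.next_seq[flow_id] = expected
        return bytes(released)

    def tracked_flows(self) -> int:
        """Number of flows for which state is being kept."""
        return len(self.next_seq)

    def buffered_bytes(self) -> int:
        """Payload bytes currently held while waiting for a hole to fill."""
        return sum(len(p) for held in self.pending.values() for p in held.values())
```

Pushing offsets 0, 6, and 3 of a single flow reproduces the common case reported above: one out-of-order packet opens a hole, the later segment is held, and both are released once the missing segment arrives. `tracked_flows()` grows with every distinct flow regardless of ordering, which is the per-flow overhead the paragraph describes.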