Pattern matching is widely used in communication systems to detect and manage transmitted content. Both gateway apparatus and bypass apparatus can implement upper-layer applications, such as network security, anti-virus (AV), bandwidth management, application recognition, security detection and wide area network acceleration, through pattern matching. Pattern matching is the basis for constructing a high-performance content detection engine for network apparatus, and its implementation is a technical foundation for constructing a manageable and operable secure intelligent network.
Currently, there are two main methods of pattern matching: single-pattern matching and multi-pattern matching. Single-pattern matching means that only one pattern string can be matched against a text string at a time; examples include the Boyer-Moore (BM) algorithm, the brute force (BF) algorithm and the Perl Compatible Regular Expressions (PCRE) algorithm. In particular, the BM algorithm, with relatively higher precision, can take both character matching and policy matching into account at the same time. Multi-pattern matching means matching multiple pattern strings at the same time, e.g. the Aho-Corasick (AC) algorithm.
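The difference between the two methods can be illustrated with a minimal Python sketch. It is an illustration only, not the implementation used by any particular apparatus: the single-pattern scan relies on Python's built-in `str.find` rather than on the BM algorithm, and the helper names `find_single`, `build_automaton` and `find_multi` are hypothetical. The multi-pattern part follows the usual textbook form of the AC automaton (trie, failure links, merged outputs).

```python
from collections import deque

def find_single(text, pattern):
    """Single-pattern matching: report every start offset of ONE pattern."""
    hits, start = [], text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits

def build_automaton(patterns):
    """Build an Aho-Corasick automaton: trie edges, failure links, outputs."""
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(p)
    # Breadth-first pass: depth-1 states keep failure link 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            # Inherit the patterns that end at the failure target.
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out

def find_multi(text, automaton):
    """Multi-pattern matching: ONE pass over the text reports all patterns."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for p in out[state]:
            hits.append((i - len(p) + 1, p))
    return hits

automaton = build_automaton(["he", "she", "his", "hers"])
print(find_single("ushers", "he"))      # one pattern per scan
print(find_multi("ushers", automaton))  # all patterns in a single scan
```

With single-pattern matching, checking N patterns requires N scans of the text; the automaton reports all of them in one scan, at the cost of building and storing the automaton.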
Currently, both single-pattern matching and multi-pattern matching must be performed on the continuous text of a transmission flow. In practical applications, however, each text is divided into a plurality of segments for transmission in the network, which makes pattern matching in the network difficult. A solution in common use is to perform flow-reassembly on the segments: for example, the transmission flow of a protocol such as the User Datagram Protocol (UDP), the Internet Control Message Protocol (ICMP) or the Transmission Control Protocol (TCP) is treated as a pseudo-flow, the segments are saved and recovered, and the pattern matching is then performed on the reassembled continuous text.
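The segmentation problem can be shown with a short Python sketch. The segment contents, the pattern and the function names `match_per_segment` and `match_after_reassembly` are hypothetical and chosen only for illustration; real flow-reassembly also has to handle ordering, retransmission and loss, which are omitted here.

```python
def match_per_segment(segments, pattern):
    """Match each segment independently, without any reassembly."""
    return any(pattern in seg for seg in segments)

def match_after_reassembly(segments, pattern):
    """Cache every segment, rebuild the continuous text, then match.

    Returns (found, cached_bytes) so the memory cost is visible.
    """
    cached = []
    for seg in segments:
        cached.append(seg)  # every segment is held until matching is done
    text = b"".join(cached)
    return pattern in text, len(text)

# Hypothetical flow whose pattern straddles a segment boundary.
segments = [b"GET /mal", b"ware.exe HTTP/1.1"]
pattern = b"malware"

print(match_per_segment(segments, pattern))       # False: pattern is split
print(match_after_reassembly(segments, pattern))  # (True, 25)
```

Matching each segment in isolation misses the pattern because it straddles the boundary; reassembly finds it, but only after all 25 bytes of the flow have been cached.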
This flow-reassembly method can provide continuous text required by the pattern matching, but also causes many problems.
First, flow-reassembly requires modifying the protocol stack of the network device, such as the TCP/IP protocol stack, which destroys the integrity of the protocol stack and increases the likelihood of faults in the network device.
Second, flow-reassembly requires caching the segments of each text until the pattern matching is finished, which occupies a great deal of system memory, decreases system performance and increases exposure to DoS/DDoS attacks. Moreover, because the system reserves memory for each text and only a limited amount of memory can be reserved, false negatives are inevitable. In addition, caching the segments leads to a longer delay. For delay-sensitive services, such as Voice over IP (VoIP) services and video services, the delay decreases the quality of service.
To control memory occupation, a solution has been proposed in which application protocols are divided into a row-mode and a length-mode before the flow-reassembly. For application protocols that organize packets in "rows", at most one row of packets is cached. For application protocols that do not organize packets in rows, the packets are cached according to a defined packet length so as to control memory occupation. Although this solution improves control over memory occupation, improves system performance to some extent and decreases the possibility of the system being attacked, it cannot fundamentally overcome the defect of high memory occupation. Nor can it avoid the delay caused by modifying the protocol stack and caching segments, or false negatives and false positives. In particular, for algorithms such as the BM algorithm that also consider policy matching, memory occupation is greater than for the ordinary AC algorithm.
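One possible reading of the row-mode and length-mode caching is sketched below in Python. The text does not specify the mechanism in detail, so the class `BoundedCache`, its parameters, the use of a newline as the row delimiter and the byte-level (rather than packet-level) cache are all assumptions made for illustration.

```python
class BoundedCache:
    """Hypothetical bounded cache for row-mode or length-mode protocols."""

    def __init__(self, mode, max_len=16):
        self.mode = mode        # "row" or "length"
        self.max_len = max_len  # bytes retained in length-mode
        self.buf = b""

    def feed(self, segment, pattern):
        """Add one segment; return True if the pattern is seen in the cache."""
        self.buf += segment
        if self.mode == "row":
            # Match every completed row, keep only the unfinished row.
            *rows, self.buf = self.buf.split(b"\n")
            return any(pattern in row for row in rows)
        # Length-mode: match the cached bytes, keep only the last max_len.
        found = pattern in self.buf
        self.buf = self.buf[-self.max_len:] if self.max_len else b""
        return found

# Row-mode: the pattern spans two segments but lies within one row.
row = BoundedCache("row")
print([row.feed(s, b"alice") for s in (b"USER al", b"ice\nPASS x\n")])

# Length-mode: a 4-byte tail is enough to carry the split pattern over...
ok = BoundedCache("length", max_len=4)
print([ok.feed(s, b"malware") for s in (b"xxmalw", b"are")])

# ...but a 2-byte tail is not, which yields a false negative.
small = BoundedCache("length", max_len=2)
print([small.feed(s, b"malware") for s in (b"xxmalw", b"are")])
```

Under these assumptions the cache stays small, but detection depends on the chosen bound: a retained tail shorter than the part of the pattern already received drops the match, which is one way the false negatives mentioned above can arise.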
Currently, there is no method of pattern matching that does not rely on flow-reassembly. In other words, there is currently no method that can perform pattern matching efficiently, accurately and intelligently on individual segments rather than on a reassembled flow, without modifying the protocol stack of the network device and without caching a large number of segments, and thereby avoid the problems caused by memory occupation and cache delay.