A regular expression (regex) describes a pattern for matching strings: it is used to search a set of strings for the parts that match the given pattern. Regexes are widely applicable. In the communications industry, they are applied to pattern matching on data traffic, for example, protocol parsing, virus detection, and service classification.
In the prior art, before regex matching, the regex needs to be converted into a Deterministic Finite Automaton (DFA) first, and then the logic chip performs the matching according to the compiled DFA and the strings in the input data stream. In practical applications, more than one check rule exists, and there may even be tens of thousands of check rules. It is impractical to run the to-be-matched traffic through a separate DFA tens of thousands of times. To avoid omitting any check rule, the tens of thousands of rules are compiled into one large DFA (generally several hundred megabytes, or even 1 GB, in size). In the matching process, the to-be-matched traffic is used as the input, and the output reported by the DFA indicates the matched rule.
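The compile-then-match flow described above can be illustrated with a minimal sketch. It rests on several assumptions that are not part of the prior art itself: each check rule is a plain literal keyword rather than a full regex, the per-rule DFAs are merged by a simple product construction, and the names `literal_dfa`, `combine`, and `scan` are hypothetical.

```python
def literal_dfa(word, alphabet):
    """Build a DFA that detects `word` as a substring of the input.

    State k means "the last k input characters equal word[:k]".
    Returns (delta, accept) where delta[state][ch] is the next state.
    Assumption: rules are literal keywords, not full regexes.
    """
    m = len(word)
    delta = [dict() for _ in range(m + 1)]
    for state in range(m + 1):
        for ch in alphabet:
            seen = word[:state] + ch
            # Longest prefix of `word` that is a suffix of what was seen.
            k = min(m, len(seen))
            while k > 0 and not seen.endswith(word[:k]):
                k -= 1
            delta[state][ch] = k
    return delta, m


def combine(dfas, alphabet):
    """Merge per-rule DFAs into one DFA whose states are tuples of rule states."""
    start = tuple(0 for _ in dfas)
    table = {}
    todo = [start]
    while todo:
        state = todo.pop()
        if state in table:
            continue
        table[state] = {}
        for ch in alphabet:
            nxt = tuple(delta[s][ch] for (delta, _), s in zip(dfas, state))
            table[state][ch] = nxt
            todo.append(nxt)
    return start, table


def scan(start, table, dfas, data):
    """Run the traffic through the combined DFA once; return matched rule ids."""
    state = start
    hits = set()
    for ch in data:
        # A character outside the alphabet resets every literal rule.
        state = table[state].get(ch, start)
        for rule_id, ((_, accept), s) in enumerate(zip(dfas, state)):
            if s == accept:
                hits.add(rule_id)
    return hits


rules = ["GET", "virus"]  # illustrative check rules
alphabet = sorted(set("".join(rules)))
dfas = [literal_dfa(w, alphabet) for w in rules]
start, table = combine(dfas, alphabet)
matched = scan(start, table, dfas, "xxGETxxvirus")
```

A single pass over the traffic reports every matching rule, which is the benefit of the combined DFA. The cost is that the number of combined states can grow quickly as rules are added, which is consistent with the large table sizes mentioned above.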
After analyzing the prior art, the inventor finds at least the following defects in the prior art:
A large DFA is several hundred megabytes in size. A memory of such capacity is too large to be integrated on an ordinary logic chip, so the DFA can only be stored in an external Static Random Access Memory (SRAM) or Synchronous Dynamic Random Access Memory (SDRAM). In the matching process, once a state is matched, the DFA fragment corresponding to this state is loaded into the cache in the logic chip. The data table entries related to the current state keep being loaded during matching, and the state-related data table entries are often loaded repeatedly because of state transitions. The more complex the DFA is, the more data table entries need to be loaded. Such a matching method consumes too much time and too many storage resources, and results in low matching performance.
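The repeated-loading problem can be made concrete with a small simulation. This is only a sketch under stated assumptions: the on-chip cache is modeled as a least-recently-used (LRU) store holding a fixed number of state rows, the external memory is an ordinary dictionary, and the function name `scan_with_cache` is hypothetical. The prior art does not specify a replacement policy.

```python
from collections import OrderedDict


def scan_with_cache(table, start, data, cache_size):
    """Walk a DFA whose rows live in 'external memory' (`table`).

    Only `cache_size` rows (cache_size >= 1) fit in the simulated on-chip
    cache. Returns (final_state, loads), where `loads` counts how many rows
    had to be fetched from external memory. Assumption: LRU replacement.
    """
    cache = OrderedDict()
    loads = 0
    state = start
    for ch in data:
        if state in cache:
            cache.move_to_end(state)       # mark the row as recently used
        else:
            loads += 1                     # external memory access
            cache[state] = table[state]
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used row
        state = cache[state].get(ch, start)
    return state, loads


# Illustrative three-state DFA that cycles 0 -> 1 -> 2 -> 0 on 'a'.
cycle = {0: {"a": 1}, 1: {"a": 2}, 2: {"a": 0}}
small_cache = scan_with_cache(cycle, 0, "aaaaaa", cache_size=2)
large_cache = scan_with_cache(cycle, 0, "aaaaaa", cache_size=3)
```

With room for only two rows, every transition in this example forces a reload of a row that was evicted shortly before, whereas a cache that holds all three rows loads each row once. A DFA with far more states than cache slots behaves like the first case.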