1. Field of the Invention
This invention relates to logic circuit technology, and more particularly, to a regular expression pattern matching circuit based on a pipeline architecture which is designed for integration into a data processing system, such as a computer platform, a firewall, or a network intrusion detection system (NIDS), for checking whether an input code sequence (such as a network data packet) matches specific patterns predefined by regular expressions.
2. Description of Related Art
In computer network systems, preventing intrusion by hackers or malicious virus programs is an important research effort in the information industry. Presently, firewalls and NIDS (network intrusion detection systems) are the most widely utilized technologies for this purpose. In operation, all incoming and outgoing network data packets are scanned to check whether their patterns match the patterns of known packets from hackers or malicious virus programs. If a match is found, the network data packet is blocked or discarded so that it cannot enter the network system.
Present network systems typically utilize regular expressions to describe the packet data patterns of known hackers or malicious virus programs. Presently, one practical implementation of regular expression pattern matching is a logic circuit composed of a comparator circuit module and a non-deterministic finite-state automata (NDFA) circuit module, which is described in more detail in the following with reference to FIG. 1 through FIGS. 3A-3B.
FIG. 1 shows the circuit architecture of a conventional regular expression pattern matching circuit 10 (hereinafter referred to as “prior art”). As shown, this prior art comprises: (A) a comparator circuit module 100; and (B) a non-deterministic finite-state automata (NDFA) circuit module 200.
The conventional regular expression pattern matching circuit 10 has an input interface and an output interface, wherein the input interface includes a data input port DATA_IN, an enable signal input port ENABLE, and a clock signal input port CLK; while the output interface includes an array of N output ports [OUT(1), OUT(2), . . . , OUT(N)]. In this example, the data input port DATA_IN is an 8-bit bus for sequentially transferring a series of 8-bit characters of an input code sequence; the enable signal input port ENABLE is used for reception of an enable signal for enabling the operation of the conventional regular expression pattern matching circuit 10; and the clock signal input port CLK is used for reception of a clock signal. The N output ports [OUT(1), OUT(2), . . . , OUT(N)] are each a 1-bit data line whose output signal is used for indicating which regular expression is matched to the input code sequence, i.e., if the (k)th regular expression is a match, then the (k)th output port OUT(k) will output a logic-HIGH signal (1) while all the other output ports remain at logic-LOW state (0).
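For illustration only, the behavior of the output interface described above can be modeled in software. The following is a minimal Python sketch, not the circuit itself: the function name match_outputs and the sample patterns are hypothetical, and the standard re module stands in for the comparator and NDFA circuit modules. Each list element models one 1-bit output port OUT(k).

```python
import re

def match_outputs(patterns, data):
    """Model OUT(1)..OUT(N): element k-1 is 1 if the (k)th pattern matches, else 0.

    `patterns` is a list of bytes regular expressions (hypothetical examples),
    and `data` is the input code sequence as a series of 8-bit characters.
    """
    return [1 if re.search(p, data) else 0 for p in patterns]

# Hypothetical patterns; real rule sets are not given in the text.
patterns = [rb"abc+", rb"\d{3}", rb"xyz"]
outputs = match_outputs(patterns, b"zzabccc")
print(outputs)
```

In this sketch only the first pattern matches the sample input, so only the element modeling OUT(1) is set. Unlike the description above, a software model of this kind would set several elements if several patterns matched the same input.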
As shown in FIG. 2, in one application example of the conventional regular expression pattern matching circuit 10, the comparator circuit module 100 includes a static processing unit 101 and a dynamic processing unit 102; wherein the static processing unit 101 has an output interface including P output ports: CODE(1), CODE(2), . . . , CODE(P), whose output values are used for indicating the meaning, role, or function of each character in the input code sequence; while the dynamic processing unit 102 includes Q output ports: CLASS(1), CLASS(2), . . . , CLASS(Q), whose output values are used for indicating the class of each character in the input code sequence. In one practical application, for example, P=290 and the 290 output signals are used respectively for indicating 256 ASCII characters, a set of predefined character ranges, a set of special symbols, a set of special characters (such as blank, non-blank, single word, non-single word, integer, and non-integer), and 26 case-insensitive English alphabetic letters; while the Q output ports are used for indicating predefined classes such as [\x90-\xFF] and [^\s].
FIG. 3A shows the internal circuit architecture of the above-mentioned static processing unit 101, which is composed of 4 layers of logic circuits, including a first-layer logic circuit 110a, a second-layer logic circuit 120a, a third-layer logic circuit 130a, and a fourth-layer logic circuit 140a. The first-layer logic circuit 110a is an array of digital comparators, including equal comparators (=), unequal comparators (≠), greater-than comparators (>), and less-than comparators (<). The second-layer logic circuit 120a and the third-layer logic circuit 130a are a plurality of AND gates and OR gates which are specifically arranged to operate in combination for checking whether the value of a character is within a predefined range. The fourth-layer logic circuit 140a is an array of multiplexers (MUX).
Further, FIG. 3B shows the internal circuit architecture of the dynamic processing unit 102, which is also composed of 4 layers of logic circuits, including a first-layer logic circuit 110b, a second-layer logic circuit 120b, a third-layer logic circuit 130b, and a fourth-layer logic circuit 140b. The first-layer logic circuit 110b is an array of digital comparators, including equal comparators (=), unequal comparators (≠), greater-than comparators (>), and less-than comparators (<). The second-layer logic circuit 120b and the third-layer logic circuit 130b are a plurality of AND gates and OR gates. The fourth-layer logic circuit 140b is an array of multiplexers (MUX).
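As a rough illustration of how comparators followed by AND gates and OR gates can check whether a character value lies within a predefined range, consider the following Python sketch. The function names and the particular arrangement (OR gates combining comparator outputs, then an AND gate) are assumptions made for this example; the exact gate wiring of FIG. 3A is not reproduced here, and the multiplexer layer is omitted.

```python
def comparator_layer(c, lo, hi):
    """First layer (assumed wiring): equal, greater-than, and less-than comparators."""
    return {
        "eq_lo": c == lo,
        "gt_lo": c > lo,
        "eq_hi": c == hi,
        "lt_hi": c < hi,
    }

def in_range(c, lo, hi):
    """Combine comparator outputs with OR and AND gates to test lo <= c <= hi."""
    s = comparator_layer(c, lo, hi)
    ge_lo = s["gt_lo"] or s["eq_lo"]   # OR gate: c >= lo
    le_hi = s["lt_hi"] or s["eq_hi"]   # OR gate: c <= hi
    return ge_lo and le_hi             # AND gate: both bounds satisfied

# Range taken from the class example [\x90-\xFF] mentioned in the text.
print(in_range(0x90, 0x90, 0xFF))  # lower bound is inside the range
print(in_range(0x8F, 0x90, 0xFF))  # one below the lower bound is outside
```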
One drawback of the circuit architecture of the conventional regular expression pattern matching circuit 10, however, is that the multi-layer architecture (i.e., 4-layer architecture) of the comparator circuit module 100 causes a time delay: after the NDFA circuit module 200 is enabled, it has to wait until the comparator circuit module 100 completes its logic operation before it can start operation. This time delay degrades the overall processing speed.
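The effect of this delay can be pictured with a simple arithmetic model. The Python sketch below assumes that the signal must pass through the four logic layers one after another, so that the waiting time of the NDFA circuit module 200 is the sum of the per-layer delays. The delay values and the function name are purely hypothetical and are not taken from any measurement.

```python
def comparator_latency(layer_delays_ns):
    """Total delay through the comparator layers, assuming they operate in series."""
    return sum(layer_delays_ns)

# Hypothetical per-layer delays in nanoseconds: comparators, AND/OR, AND/OR, MUX.
layer_delays_ns = [2, 1, 1, 2]
wait_ns = comparator_latency(layer_delays_ns)
print(f"NDFA module waits {wait_ns} ns for each character")
```

Under these assumed numbers the NDFA circuit module would wait 6 ns per input character before it could begin its own operation.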