Deep content inspection of network packets is driven, in large part, by the need for high-performance quality-of-service (QoS) and signature-based security systems. Typically, QoS systems are configured to implement intelligent management and deliver content-based services which, in turn, involve high-speed inspection of packet payloads. Likewise, signature-based security services, such as intrusion detection, virus scanning, content identification, network surveillance, and spam filtering, involve high-speed pattern matching on network data.
The signature databases used by these services are updated on a regular basis, such as when new viruses are found, or when operating system vulnerabilities are detected. This means that the device performing the pattern matching must be programmable.
As network speeds increase, QoS and signature-based security services find it increasingly challenging to keep up with the demands of matching packet contents. When they fall behind, these services are forced to miss packets, thereby sacrificing content delivery or network security. Currently, fast programmable pattern matching machines are implemented using finite state machines (FSMs). As is known, the process of mapping a regular expression, or signature database, to an FSM involves compiling the expression into a non-deterministic finite-state automaton (NFA), and then converting the NFA to a deterministic finite-state automaton (DFA).
An FSM typically starts in a given initial state, usually state zero. On receipt of each input symbol, the FSM advances to a new state determined by the current state, together with the input symbol. This operation is referred to as calculating the “next state” or “transition function” of the finite state machine. The calculation of the next state is often performed through a table lookup. The table, known as the “transition table”, is arranged so as to have the row number determined by the current state and the column number by the current input symbol. Each entry in the transition table contains the value for the next state given that current state, as defined by the row, and the input symbol, as defined by the column. The transition table is commonly stored using a RAM lookup table. Data symbols received from a digital network are usually encoded as 8-bit bytes, and the number of states is determined by the complexity of the given application. The following pseudo-code illustrates the FSM operation:
CURRENT_STATE = 0
for each INPUT_SYMBOL,
    NEXT_STATE = TRANSITION_TABLE[CURRENT_STATE][INPUT_SYMBOL]
    CURRENT_STATE = NEXT_STATE
next INPUT_SYMBOL
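The table-driven operation described above can be sketched in Python as follows. This is a minimal illustration rather than an implementation taken from any particular device: the function names `run_fsm` and `build_table_for_ab`, and the three-state example machine that tracks the byte sequence "ab", are assumptions introduced only for this sketch.

```python
def run_fsm(transition_table, symbols, initial_state=0):
    """Return the state reached after consuming every input symbol."""
    state = initial_state
    for symbol in symbols:
        # Row is selected by the current state, column by the input symbol.
        state = transition_table[state][symbol]
    return state


def build_table_for_ab():
    """Build a hypothetical 3-state table for 8-bit symbols.

    State 0: no progress, state 1: last byte was 'a',
    state 2: last two bytes were "ab".
    """
    table = [[0] * 256 for _ in range(3)]  # every entry defaults to state 0
    for state in range(3):
        table[state][ord("a")] = 1         # an 'a' always starts a new match
    table[1][ord("b")] = 2                 # 'b' directly after 'a' completes it
    return table


if __name__ == "__main__":
    table = build_table_for_ab()
    print(run_fsm(table, b"xxab"))
```

Each input byte costs exactly one table lookup regardless of the number of states, which is why this arrangement is attractive for high-speed matching.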
FIG. 1 shows a block diagram of a conventional finite state machine 10. The current state is encoded as an m-bit binary word, and the current input symbol as a k-bit binary word. These bits are concatenated together by logic block 12 to form an (m+k)-bit address to a RAM lookup table 14. RAM 14 contains the state transition table, that is, each RAM entry contains an m-bit word representing the next state given the current state and the input symbol. Look-up table 16 receives data from RAM look-up table 14 to define the action to take in each particular state. This is used to indicate terminal/accept states, etc. These actions are shown as being encoded as p-bit words.
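The address formation of FIG. 1 may be modeled in software as shown below. The sketch assumes that the state bits occupy the high-order part of the concatenated address and that the action table is indexed by the state read from the RAM; the source states only that the two words are concatenated, so the bit ordering, the function names, and the flat-list representation of the RAM are illustrative assumptions.

```python
def make_address(state, symbol, k):
    """Concatenate an m-bit state and a k-bit symbol into one address.

    Assumption: the state forms the high-order bits of the address.
    """
    return (state << k) | symbol


def build_ram(transition_table, m, k):
    """Flatten a 2-D transition table into a RAM image of 2**(m+k) words."""
    ram = [0] * (1 << (m + k))
    for state, row in enumerate(transition_table):
        for symbol, next_state in enumerate(row):
            ram[make_address(state, symbol, k)] = next_state
    return ram


def step(ram, action_table, state, symbol, k):
    """Perform one lookup: return the next state and its p-bit action word."""
    next_state = ram[make_address(state, symbol, k)]
    return next_state, action_table[next_state]
```

In this model the action table plays the role of look-up table 16: a non-zero entry could, for example, mark an accept state.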
Programmable FSMs are often expensive because of the size of the memory required to store the transition table. This problem is even more pronounced for fast FSMs, which are required to compute the next state within a small, fixed number of clock cycles. For example, the state machine implementation shown in FIG. 1, having an m-bit state vector and k-bit symbols, requires 2^(m+k) entries of m-bit words for storing the full transition table. Additional memory is required for the output look-up table. Moreover, for an application servicing 1 Gbps network traffic with 8-bit input symbols, the FSM is required to compute the next state every 8 ns. This poses a challenging task.
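The memory and timing figures above follow from simple arithmetic, which may be sketched as below. The helper names are hypothetical, and the 10-bit state width used in the example call is an assumed value chosen only for illustration; it does not appear in the source.

```python
def transition_table_bits(m, k):
    """Bits needed for a full table: 2**(m+k) entries of m bits each."""
    return (2 ** (m + k)) * m


def symbol_period_ns(line_rate_bps, k):
    """Time available per k-bit symbol, in nanoseconds, at the given line rate."""
    return k * 1_000_000_000 / line_rate_bps


if __name__ == "__main__":
    # Assumed example: 10-bit state vector (up to 1024 states), 8-bit symbols.
    print(transition_table_bits(10, 8), "bits for the transition table")
    # 1 Gbps traffic with 8-bit symbols, as in the text.
    print(symbol_period_ns(1_000_000_000, 8), "ns per symbol")
```

Because the entry count is exponential in m + k, each additional state bit doubles the number of entries in addition to widening every entry by one bit.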
U.S. Pat. No. 6,167,047 describes a technique in which memory optimization is achieved through the use of a stack memory, allowing the state machine to repeat common sub-expressions while calculating the next state within a single clock cycle. This technique uses a large memory, which limits the complexity of the FSM that can be implemented. It also suffers from the problem that the stack memory is limited in size.