This disclosure relates generally to verifying that a regular expression is equivalent to a transformation of the regular expression, where the transformation takes advantage of a post-processor to optimize storage efficiency, and more particularly to determining whether a finite state automaton representation of a regular expression is equivalent to a transformed finite state automaton representation of the regular expression that is coupled with a pre-verified and pre-optimized post-processor.
Packet content scanning is an essential part of network security and monitoring applications. Intrusion detection systems such as Snort™ rely heavily on regular expressions to express increasingly complex attack patterns. A typical way of matching regular expressions in a stream of input characters is to simulate the input on a Finite State Automaton (FSA) compiled from the regular expression, which may be a nondeterministic FSA (NFA) or a deterministic FSA (DFA). For example, FIG. 1 shows a DFA 100 that detects the regular expression “abc.*def*ghi” in an input data stream. The regular expression “abc.*def*ghi” is in Perl Compatible Regular Expressions (PCRE) format. The DFA 100 is modeled as a directed graph. The DFA states are shown in circles, the state transitions are shown as directed edges, and the sets of input characters that trigger the transitions (i.e., the transition rules) are given in the rectangular boxes. The initial state of the DFA is labeled as state 0, with intermediate states numbered 1 to 8, leading up to a match of the regular expression at state 9. The plurality of transition rules governs transitions between the states. Note that if the regular expression is non-anchored, additional transitions that point to state 0 and state 1 would be needed in FIG. 1. Similarly, if the regular expression is anchored, the DFA must include an explicit invalid state, together with additional transitions pointing to the invalid state for every state/input combination that has no valid next state.
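The simulation of an input on a DFA described above can be sketched as follows. This is a minimal illustration only, not the DFA 100 of FIG. 1: the pattern “ab*c”, the state numbering, and the names `RULES`, `INVALID`, and `run_dfa` are assumptions chosen for brevity. The sketch treats the pattern as anchored and requires the whole input to match, so any state/input combination without a transition rule leads to an explicit invalid state.

```python
# Hypothetical DFA for the anchored pattern "ab*c" (not the DFA of FIG. 1).
INVALID = -1   # explicit invalid state used for anchored matching
START = 0      # initial state
ACCEPT = {2}   # states that signal a match

# Transition rules: (current state, input character) -> next state.
RULES = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 2,
}


def run_dfa(data: str) -> bool:
    """Return True if the DFA ends in an accepting state after consuming data."""
    state = START
    for ch in data:
        # A state/input combination with no rule goes to the invalid state.
        state = RULES.get((state, ch), INVALID)
        if state == INVALID:
            return False
    return state in ACCEPT
```

For instance, `run_dfa("abbc")` follows states 0, 1, 1, 1, 2 and reports a match, whereas `run_dfa("xac")` enters the invalid state on the first character.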
An FSA architecture may be programmed to recognize one or more regular expressions in an input data stream by loading a set of state transition rules into off-chip or on-chip memories. The performance of such architectures depends on the storage efficiency of the compiled set of state transition rules, because on-chip memory resources are usually limited and off-chip memory accesses can be costly in terms of processing time.
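To make the storage concern concrete, the following sketch compares two hypothetical ways of storing transition rules: a dense table with one entry per state and input byte, and a sparse list that stores only the transitions differing from a default next state. The 256-symbol alphabet, the function names, and the example rule set are assumptions for illustration and do not describe any particular FSA architecture.

```python
ALPHABET_SIZE = 256  # assumption: one possible transition per input byte value


def dense_entries(num_states: int) -> int:
    """Entries needed when every state stores a next state for every byte."""
    return num_states * ALPHABET_SIZE


def sparse_entries(rules: dict, default_state: int) -> int:
    """Entries needed when only non-default transitions are stored."""
    return sum(1 for nxt in rules.values() if nxt != default_state)


# Hypothetical three-state rule set: (current state, input character) -> next state.
EXAMPLE_RULES = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 2,
}
```

Under these assumptions, `dense_entries(3)` is 768, while `sparse_entries(EXAMPLE_RULES, -1)` is 3, which illustrates why the encoding of the rule set determines how much of it fits in limited on-chip memory.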