Regular expressions provide a concise and formal way of describing a set of strings over an alphabet. Regular expression search and/or match operations are employed in various applications including, for example, intrusion detection systems (IDS), anti-virus products, policy-based routing functions, internet and text search operations, document comparisons, and so on. A regular expression can simply be a word, a phrase, or a string of characters. For example, a regular expression including the string “gauss” would match data containing gauss, gaussian, degauss, etc. More complex regular expressions include metacharacters that provide certain rules for performing the match. Some common metacharacters are the wildcard “.”, the alternation symbol “|”, and the character class symbol “[ ]”. Regular expressions can also include quantifiers such as “*” to match 0 or more times, “+” to match 1 or more times, “?” to match 0 or 1 times, “{n}” to match exactly n times, “{n,}” to match at least n times, and “{n,m}” to match at least n times but no more than m times. For example, the regular expression “a.{2}b” will match any input string that includes the character “a” followed by exactly 2 instances of any character followed by the character “b” including, for example, the input strings “abbb,” “adgb,” “a7yb,” “aaab,” and so on.
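By way of illustration only, the examples above can be reproduced with Python's standard `re` module. This is a minimal sketch: the variable names and the additional non-matching inputs (“gaus,” “ab,” “axb”) are assumptions introduced here for contrast and do not appear in the description above.

```python
import re

# A plain string pattern matches any data that contains it.
gauss = re.compile(r"gauss")
words = ["gauss", "gaussian", "degauss", "gaus"]
gauss_hits = [w for w in words if gauss.search(w)]

# "a", then exactly two instances of any character, then "b".
pattern = re.compile(r"a.{2}b")
inputs = ["abbb", "adgb", "a7yb", "aaab", "ab", "axb"]
hits = [s for s in inputs if pattern.search(s)]

# Quantifiers applied to the single character "b" (whole-string matching).
star_ok = re.fullmatch(r"ab*c", "ac") is not None          # "*": zero or more
plus_ok = re.fullmatch(r"ab+c", "ac") is not None          # "+": one or more
opt_ok = re.fullmatch(r"ab?c", "abbc") is not None         # "?": zero or one
range_ok = re.fullmatch(r"ab{2,3}c", "abbbc") is not None  # "{n,m}": n to m

print(gauss_hits)
print(hits)
print(star_ok, plus_ok, opt_ok, range_ok)
```

Here `search` reports a match anywhere in the input, which corresponds to the “data containing” sense used above, whereas `fullmatch` requires the entire input to satisfy the expression.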
Given a regular expression and a string, the regular expression matches the string if the string belongs to the set described by the regular expression. Regular expression matching may be used, for example, by command shells, programming languages, text editors, and search engines to search for text within a document. Known techniques for regular expression matching can have long worst-case matching times. In addition, regular expression matching techniques typically consume a significant amount of memory space to store the states and transitions associated with one or more automata representative of the regular expression.
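The memory concern can be illustrated with a small sketch. It assumes one classical approach, namely converting a nondeterministic automaton for an unanchored search of “a.{n}b” into a deterministic one by subset construction; this is not asserted to be the technique of any particular known system. The function names and the three-symbol alphabet (“a,” “b,” and “x” standing in for every other character) are hypothetical.

```python
def nfa_step(states, ch, n):
    """Advance the NFA for an unanchored search of a.{n}b by one character.

    State 0 is the start state, states 1..n+1 track progress after the "a",
    and state n+2 is the accepting state.
    """
    nxt = {0}  # state 0 loops on every character (unanchored search)
    for s in states:
        if s == 0 and ch == "a":
            nxt.add(1)          # saw the leading "a"
        elif 1 <= s <= n:
            nxt.add(s + 1)      # "." consumes any character
        elif s == n + 1 and ch == "b":
            nxt.add(n + 2)      # saw the trailing "b": accept
    return frozenset(nxt)


def count_dfa_states(n, alphabet="abx"):
    """Count the DFA states reachable by subset construction."""
    start = frozenset({0})
    seen = {start}
    work = [start]
    while work:
        cur = work.pop()
        for ch in alphabet:
            nxt = nfa_step(cur, ch, n)
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return len(seen)


for n in range(1, 7):
    # The NFA has n + 3 states; the DFA count is printed alongside it.
    print(n, n + 3, count_dfa_states(n))
```

Under these assumptions the nondeterministic automaton has n + 3 states, while the deterministic automaton must record which of the last n + 1 input characters were “a” and therefore has at least 2^(n+1) states, so its storage grows exponentially with the repetition count.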