Regular expression search operations are employed in various applications including, for example, intrusion detection systems (IDS), virus protections, policy-based routing functions, internet and text search operations, document comparisons, and so on. A regular expression can simply be a word, a phrase or a string of characters. For example, a regular expression including the string “gauss” would match data containing gauss, gaussian, degauss, etc. More complex regular expressions include metacharacters that provide certain rules for performing the match. Some common metacharacters are the wildcard “.”, the alternation symbol “|”, and the character class symbol “[ ].” Regular expressions can also include quantifiers such as “*” to match 0 or more times, “+” to match 1 or more times, “?” to match 0 or 1 times, {n} to match exactly n times, {n,} to match at least n times, and {n,m} to match at least n times but no more than m times. For example, the regular expression “a.{2}b” will match any input string that includes the character “a” followed by exactly 2 instances of any character followed by the character “b” including, for example, the input strings “abbb,” “adgb,” “a7yb,” “aaab,” and so on.
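The matching rules described above can be illustrated with a minimal software sketch. It assumes Python's standard `re` module as a stand-in software matcher; it is not the hardware search engine discussed in this document, and the helper names `contains_match` and `whole_match` are hypothetical names introduced only for this illustration.

```python
import re


def contains_match(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere inside the text."""
    return re.search(pattern, text) is not None


def whole_match(pattern: str, text: str) -> bool:
    """Return True only if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# A plain string matches any data that contains it.
gauss_hits = [w for w in ("gauss", "gaussian", "degauss", "gaus")
              if contains_match("gauss", w)]

# "a.{2}b": "a", then exactly two arbitrary characters, then "b".
two_wildcards = {s: contains_match(r"a.{2}b", s)
                 for s in ("abbb", "adgb", "a7yb", "aaab", "ab", "axb")}

# Quantifiers applied to the single character "b", tested against whole strings.
quantifier_checks = {
    "ab* vs a": whole_match(r"ab*", "a"),                # 0 or more
    "ab+ vs a": whole_match(r"ab+", "a"),                # 1 or more
    "ab? vs abb": whole_match(r"ab?", "abb"),            # 0 or 1
    "ab{2} vs abb": whole_match(r"ab{2}", "abb"),        # exactly 2
    "ab{2,} vs abbbb": whole_match(r"ab{2,}", "abbbb"),  # at least 2
    "ab{2,3} vs abbbb": whole_match(r"ab{2,3}", "abbbb"),  # 2 to 3
}

if __name__ == "__main__":
    print(gauss_hits)
    print(two_wildcards)
    print(quantifier_checks)
```

The sketch separates "match anywhere in the input" (`re.search`) from "match the whole input" (`re.fullmatch`) because the examples in the text describe input strings that include a match, whereas the quantifier bounds are easiest to observe when the entire string must be consumed.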
Traditionally, regular expression searches have been performed using software programs executed by one or more processors, for example, associated with a network search engine. However, as both the number and complexity of regular expressions increase for applications such as intrusion detection systems, software solutions are less able to achieve desired search speeds and throughput. As a result, hardware solutions such as ternary content addressable memory (TCAM) based search engines are being developed that can implement and perform regular expression search operations at faster speeds than software solutions typically allow.
To program a hardware-based search engine to implement regular expression search operations, a compiler is needed to translate the regular expression into bit groups that can be loaded into various programmable circuits of the search engine. Indeed, there is a need for compiling a human-readable architecture-independent regular expression into computer-readable architecture-dependent bit groups for controlling CAM-based regular expression search engines.