The communications industry is rapidly changing to adjust to emerging technologies and ever-increasing customer demand. This customer demand for new applications and increased performance of existing applications is driving communications network and system providers to employ networks and systems having greater speed and capacity (e.g., greater bandwidth). In trying to achieve these goals, a common approach taken by many communications providers is to use packet switching technology. Increasingly, public and private communications networks are being built and expanded using various packet technologies, such as Internet Protocol (IP).
Regular expression matching is becoming a common operation that must be performed at high speeds. For example, URLs located in Layer 7 (L7) packet headers may need to be matched against a set of regular expressions so that sessions can be classified appropriately. Similarly, regular expression matching is used for intrusion detection, security screening (e.g., determining whether an email or other message contains certain patterns of keywords), load balancing of traffic across multiple servers, and an array of other applications.
A problem, especially for high-speed applications, is the rate at which matching can be performed, as well as the space required to store the match identification data structure. A common method of matching regular expressions is to convert them to a deterministic finite automaton (DFA). The use of DFAs for regular expression matching, in which a set of matched regular expressions is produced upon reaching a final state, is well known. From one perspective, a DFA is a state machine that processes the characters of an input string and, upon reaching a final state, generates a list of one or more matched regular expressions. If multiple regular expressions are to be matched simultaneously, then either the DFA for each of the different regular expressions is traversed, or the DFAs are multiplied together to form a single combined DFA, which is traversed to identify the matching regular expression or expressions. However, when a regular expression contains a closure (e.g., a Kleene star, as in .*), the number of states required for a DFA, and for a combined DFA, can explode (i.e., greatly increase), thus consuming a lot of resources. Also, the memory requirements of these DFAs and the speed at which they may be traversed may not meet the needs of certain applications, especially some high-speed applications.
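The following Python is a minimal illustrative sketch of the conventional techniques just described, not an implementation of any particular system. It assumes a tiny two-letter alphabet and restricts patterns to unanchored literal suffixes of the form .*P; all helper names (suffix_dfa, scan, product_dfa, nth_from_last_dfa_size) are hypothetical. It shows a DFA reporting regular expression IDs whenever it reaches a final state, two DFAs multiplied into one combined DFA by the standard product construction, and the textbook pattern .*a.{n-1} ("the n-th symbol from the end is a"), where the leading closure drives subset construction to 2^n DFA states.

```python
from collections import deque

ALPHABET = "ab"  # assumption: a two-letter alphabet keeps the tables small


def suffix_dfa(pattern, rid):
    """Hypothetical helper: complete DFA for the unanchored pattern '.*<pattern>'.

    State q means the longest suffix of the input read so far that is also a
    prefix of `pattern` has length q. State len(pattern) is final and reports `rid`.
    """
    delta = {}
    for q in range(len(pattern) + 1):
        for c in ALPHABET:
            s = pattern[:q] + c
            k = min(len(pattern), len(s))
            while not s.endswith(pattern[:k]):  # endswith("") is True, so this stops at k == 0
                k -= 1
            delta[(q, c)] = k
    return {"start": 0, "delta": delta,
            "accept": {len(pattern): frozenset({rid})},
            "states": set(range(len(pattern) + 1))}


def scan(dfa, text):
    """Traverse the DFA one character at a time, collecting IDs at each final state reached."""
    state, matched = dfa["start"], set()
    for c in text:
        state = dfa["delta"][(state, c)]
        matched |= dfa["accept"].get(state, frozenset())
    return matched


def product_dfa(d1, d2):
    """Multiply two DFAs (product construction), keeping only reachable state pairs."""
    start = (d1["start"], d2["start"])
    delta, accept = {}, {}
    seen, queue = {start}, deque([start])
    while queue:
        pair = queue.popleft()
        p, q = pair
        # A combined state reports the union of the IDs of its component states.
        ids = d1["accept"].get(p, frozenset()) | d2["accept"].get(q, frozenset())
        if ids:
            accept[pair] = ids
        for c in ALPHABET:
            nxt = (d1["delta"][(p, c)], d2["delta"][(q, c)])
            delta[(pair, c)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return {"start": start, "delta": delta, "accept": accept, "states": seen}


def nth_from_last_dfa_size(n):
    """Count reachable DFA states from subset construction on the NFA for '.*a.{n-1}'.

    NFA state 0 loops on every symbol (the closure) and also moves to 1 on 'a';
    states 1..n-1 advance on any symbol; state n is final.
    """
    def step(subset, c):
        out = {0}  # the closure keeps NFA state 0 active forever
        if c == "a":
            out.add(1)
        out |= {i + 1 for i in subset if 1 <= i < n}
        return frozenset(out)

    start = frozenset({0})
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        for c in ALPHABET:
            t = step(s, c)
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return len(seen)


if __name__ == "__main__":
    d1, d2 = suffix_dfa("ab", 1), suffix_dfa("ba", 2)
    combined = product_dfa(d1, d2)
    print(scan(combined, "aab"))    # {1}
    print(scan(combined, "abab"))   # {1, 2}
    print(len(combined["states"]))  # 5 reachable pairs out of 3 * 3 = 9
    print([nth_from_last_dfa_size(n) for n in range(1, 6)])  # [2, 4, 8, 16, 32]
```

Traversing the single combined DFA yields the same match set as traversing each DFA separately and taking the union, but with one state lookup per character instead of one per regular expression. The last line illustrates the cost: the combined or closure-bearing DFA can grow multiplicatively or exponentially in the number of states that must be stored.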