With the continued proliferation of networked and distributed computer systems, and of the applications that run on those systems, comes an ever-increasing flow and variety of message traffic between and among computer devices. For example, the Internet and the World Wide Web (the “Web”) provide a global, open-access means of exchanging message traffic. Networked and/or distributed systems are composed of a wide variety of communication links, network and application servers, sub-networks, and internetworking elements such as repeaters, switches, bridges, routers, and gateways.
Communication between and among devices occurs in accordance with defined communication protocols understood by the communicating devices. Such protocols may be proprietary or non-proprietary. Examples of non-proprietary protocols include X.25 for packet-switched data networks (PSDNs), TCP/IP for the Internet, the Manufacturing Automation Protocol (MAP), and the Technical and Office Protocol (TOP). Proprietary protocols may be defined as well. For the most part, messages are composed of packets, each containing a certain number of bytes of information. The most common example is the Internet Protocol (IP) packet, used among various Web- and Internet-enabled devices.
A primary function of many network servers and other network devices (or nodes), such as switches, gateways, routers, and load balancers, is to direct or process messages as a function of content within the messages' packets. In a simple, rigid form, a receiving node (e.g., a switch) knows exactly where in the message (or its packets) to find a predetermined type of content (e.g., an IP address), as determined by the protocol used. Typically, hardware such as switches and routers can perform its functions only on the basis of fixed-position headers, such as TCP or IP headers, and no deep packet examination is done. Software, which cannot operate at wire speed, is sometimes used to examine packet payloads. Such software typically offers little flexibility in how patterns are specified and runs orders of magnitude slower than wire rate. It is therefore highly desirable to examine and recognize patterns, described by regular expressions, in both packet headers and payloads. For example, such packet content may include address information or file type information, either of which may be useful in determining how to direct or process the message and/or its contents. The content may be described by a “regular expression”, i.e., a sequence of characters that often conforms to certain expression paradigms. As used herein, the term “regular expression” is to be interpreted broadly, as is known in the art, and is not limited to any particular language or operating system. Regular expressions may be better understood with reference to Mastering Regular Expressions, J. E. F. Friedl, O'Reilly, Cambridge, 1997.
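To make the contrast between fixed-position header lookup and payload pattern matching concrete, the following Python sketch reads the IPv4 destination address from its fixed offset (bytes 16 to 19 of the header), the kind of lookup fixed-function hardware performs, and then searches the payload with Python's standard `re` module, a software matcher of the slower kind described above. The function names are hypothetical, chosen for illustration. The sample packet is also made up: it uses documentation-range addresses, omits the TCP header so that the IP payload is plain application text, and leaves the length and checksum fields unset.

```python
import ipaddress
import re

def ipv4_destination(packet: bytes) -> ipaddress.IPv4Address:
    """Read the destination address from its fixed position in an IPv4 header.

    Bytes 16-19 of every IPv4 header hold the destination address, so no
    searching is needed: this is the "fixed-position header" case.
    """
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return ipaddress.IPv4Address(packet[16:20])

def ipv4_payload(packet: bytes) -> bytes:
    """Return the bytes after the IPv4 header (IHL gives its length in 32-bit words)."""
    header_len = (packet[0] & 0x0F) * 4
    return packet[header_len:]

# Payload examination: the pattern may occur anywhere, so no fixed offset works.
JPG_REQUEST = re.compile(rb"binky.*\.jpg", re.DOTALL)

def payload_matches(packet: bytes) -> bool:
    return JPG_REQUEST.search(ipv4_payload(packet)) is not None

# Hypothetical sample packet: minimal 20-byte header, TCP header omitted.
header = bytes([0x45, 0x00, 0x00, 0x00,   # version 4, IHL 5; TOS; total length (unset)
                0x00, 0x00, 0x00, 0x00,   # identification; flags/fragment offset
                0x40, 0x06, 0x00, 0x00,   # TTL 64; protocol 6 (TCP); checksum (unset)
                192, 0, 2, 1,             # source address
                198, 51, 100, 7])         # destination address
packet = header + b"GET /pics/binky_at_the_beach.jpg HTTP/1.1"
print(ipv4_destination(packet))  # 198.51.100.7
print(payload_matches(packet))   # True
```

The header lookup costs the same regardless of packet contents, whereas the payload search must, in general, look at every byte, which is why payload examination is the harder problem at wire speed.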
The ability to match regular expressions would clearly be useful for content-based routing. Either a deterministic finite state automaton (DFA) or a non-deterministic finite state automaton (NFA) may be used for this purpose; the approach used here follows a DFA approach. A conventional DFA requires that a state machine be created before it is used on a data (or character) stream. Generally, the DFA processes an input character stream sequentially, making a state transition based on the current character and the current state. This is a conventional, brute-force approach that processes a single byte at a time. By definition, a DFA's transition to a next state is uniquely determined by the current state and the input character. For example, prior art FIG. 1A shows a DFA state machine 100 that implements the regular expression “binky.*\.jpg”. DFA state machine 100 includes states 0 through 9, and the occurrence of the characters 110 of the regular expression effects the iterative transition from state to state through DFA state machine 100. The start state of the DFA state machine is denoted by the double-line circle having the state number “0”. An ‘accepting’ state, indicating a successful match, is denoted by the double-line circle having the state number “9”. As an example, to transition from state 0 to state 1, the character “b” must be found in the character stream. Given “b”, to transition from state 1 to state 2, the next character must be “i”.
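The byte-at-a-time operation described above can be sketched in Python as a transition function plus a scanning loop. This is an illustrative reconstruction, not the circuit of FIG. 1A. The text describes only the forward transitions, so the behavior on a mismatch inside “\.jpg” (fall back to state 6 on a '.', otherwise to state 5, where “.*” absorbs the byte) is an assumption chosen to make the automaton agree with the regular expression, as is encoding the failure state as -1. The sketch also assumes the match is anchored at the start of the stream and that '.' matches any byte.

```python
# A minimal sketch of a DFA for "binky.*\.jpg", consuming one byte per step.
# State numbers follow FIG. 1A (0 = start, 9 = accepting).
# ASSUMPTIONS (not from the text): FAIL = -1 encodes the failure state;
# "." in ".*" matches any byte; states 5-8 fall back on a mismatch as below.
PREFIX = b"binky"   # states 0-4 consume these literal bytes
SUFFIX = b".jpg"    # states 5-8 consume the literal "\.jpg"
ACCEPT = 9
FAIL = -1

def next_state(state: int, ch: int) -> int:
    """Return the unique next state for (current state, input byte)."""
    if state in (FAIL, ACCEPT):
        return state                       # both are treated as terminal here
    if state < len(PREFIX):                # states 0-4: any other byte is a failure
        return state + 1 if ch == PREFIX[state] else FAIL
    if ch == SUFFIX[state - len(PREFIX)]:  # states 5-8: advance through "\.jpg"
        return state + 1
    # Mismatch inside "\.jpg": a '.' may begin a new "\.jpg"; any other byte
    # is absorbed by ".*" and the DFA falls back to state 5.
    return len(PREFIX) + 1 if ch == ord(".") else len(PREFIX)

def dfa_match(stream: bytes) -> bool:
    """Scan the stream byte by byte and report whether state 9 is reached."""
    state = 0
    for ch in stream:                      # iterating over bytes yields ints
        state = next_state(state, ch)
        if state in (FAIL, ACCEPT):        # stop as soon as the outcome is known
            break
    return state == ACCEPT

print(dfa_match(b"binky_at_the_beach.jpg"))  # True
print(dfa_match(b"binkx.jpg"))               # False
```

Each input byte triggers exactly one call to `next_state`, so the scan time grows linearly with the stream length no matter how the bytes are arranged.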
Not shown explicitly in FIG. 1A are the transitions taken when the input character does not match the character needed to transition to the next state. For example, if the DFA is in state 1 and the next character is “x”, a failure has occurred and the DFA transitions to a terminal failure state. FIG. 1B shows part 150 of FIG. 1A drawn with failure state transitions, in which the failure state is labeled “Fail”. In FIG. 1B, the tilde indicates “not”; for example, the symbol “~b” means the current character is “not b”. In this case, once the DFA is in the failure state, every character causes a transition back to the failure state.
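Making the “~c” transitions explicit amounts to completing the drawn transitions into a total table. The Python sketch below shows that completion step only for the literal prefix “binky” (states 0 through 5, with state 5 treated as accepting for illustration): it builds a dense 256-column table in which every undrawn (state, byte) pair leads to an extra Fail state that loops back to itself on every byte. The helper names and the numbering of Fail as state 6 are assumptions, not taken from FIG. 1B.

```python
NUM_BYTES = 256  # one table column per possible input byte

def complete_with_fail(num_states: int, edges: dict[tuple[int, int], int]) -> list[list[int]]:
    """Turn the drawn (partial) transitions into a dense table.

    Every (state, byte) pair with no drawn edge, i.e. every "~c" case, goes to an
    extra Fail state; the Fail row points back to Fail for all 256 bytes.
    """
    fail = num_states  # hypothetical numbering: Fail takes the next free state number
    table = [[fail] * NUM_BYTES for _ in range(num_states + 1)]
    for (state, byte), target in edges.items():
        table[state][byte] = target
    return table

# Forward edges for the literal prefix "binky": state i --PREFIX[i]--> state i+1.
PREFIX = b"binky"
EDGES = {(i, ch): i + 1 for i, ch in enumerate(PREFIX)}
TABLE = complete_with_fail(len(PREFIX) + 1, EDGES)  # states 0-5 plus Fail
FAIL = len(PREFIX) + 1                              # = 6 in this sketch

def run(table: list[list[int]], stream: bytes, accept: int) -> int:
    """Return the state reached, stopping early if the accepting state is hit."""
    state = 0
    for ch in stream:
        state = table[state][ch]  # exactly one table lookup per byte
        if state == accept:
            break
    return state

print(run(TABLE, b"binky", accept=5))  # 5
print(run(TABLE, b"bix", accept=5))    # 6 (Fail)
```

A dense table like this trades memory for a single lookup per byte; that lookup is the memory reference whose cycle time bounds the speed of a conventional DFA.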
Once the DFA is in the accepting state, i.e., the character stream has matched “binky.*\.jpg”, the receiving node takes the next predetermined action. In this example, where the character stream indicates a certain file type (e.g., “.jpg”), the next predetermined action may be to send the corresponding file to a certain server, processor, or system.
While such DFAs are useful, they are limited with respect to speed. The speed of a conventional DFA is limited by the cycle time of the memory used in its implementation. For example, a device capable of processing the data stream from an OC-192 source must handle 10 billion bits per second (i.e., 10 gigabits per second (Gbps)). At this rate a byte must be processed every 0.8 nanoseconds (ns), which is shorter than the cycle time of state-of-the-art memory. For comparison, high-speed SDRAM chips implementing a conventional DFA operate with a 7.5 ns cycle time, roughly ten times slower than OC-192 requires. In addition, more than a single memory reference per byte is typically needed, making these estimates optimistic. As a result, messages or packets must be queued for processing, causing unavoidable delays.
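The per-byte time budget follows from simple arithmetic, worked here with the nominal 10 Gbps figure (the exact OC-192 line rate is 9.95328 Gbps, which gives about 0.804 ns per byte and does not change the conclusion):

```latex
t_{\text{byte}} = \frac{8\ \text{bits}}{10 \times 10^{9}\ \text{bits/s}}
               = 0.8 \times 10^{-9}\ \text{s} = 0.8\ \text{ns},
\qquad
\frac{t_{\text{SDRAM}}}{t_{\text{byte}}} = \frac{7.5\ \text{ns}}{0.8\ \text{ns}} \approx 9.4
```

With k memory references per byte, the shortfall grows to roughly 9.4k, which is why a single reference per byte is the optimistic case.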