The present invention relates to the design and implementation of state machine engines in data processing systems.
A finite state machine (FSM) is a model of behaviour composed of states, transitions and actions. A state stores information about the past, i.e., it reflects the input changes from the start to the present moment. A transition indicates a state change and is described by a condition that must be fulfilled to enable the transition. An action is a description of an activity that is to be performed at a given moment. A specific input action is executed when certain input conditions are fulfilled in a given present state. For example, an FSM can provide a specific output (e.g., a string of binary characters) as an input action.
An FSM can be represented using a set of (state) transition rules that describe a state transition function. State transition diagrams are used to represent FSMs graphically. Classic forms of state transition diagrams are directed graphs, in which each vertex is a state and each edge is a transition between two states. Each edge is labeled with the input that triggers the transition.
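As an illustration only, the sketch below encodes a transition function as a table of `(current_state, input_symbol) -> next_state` rules, one entry per labeled edge of the state transition diagram. The names `run_fsm`, `EDGES` and the "ab"-detector example are hypothetical and not taken from the source.

```python
from typing import Dict, Iterable, List, Tuple

# Transition rules as (current_state, input_symbol) -> next_state.
# Each entry corresponds to one labeled edge of the state transition diagram.
TransitionTable = Dict[Tuple[str, str], str]


def run_fsm(table: TransitionTable, start: str, inputs: Iterable[str]) -> List[str]:
    """Apply the transition function to each input symbol and return the
    sequence of states visited, beginning with `start`."""
    state = start
    trace = [state]
    for symbol in inputs:
        state = table[(state, symbol)]  # KeyError if no rule covers this input
        trace.append(state)
    return trace


# Hypothetical example: detect the substring "ab" over the alphabet {a, b};
# S2 means "the last two symbols were ab".
EDGES: TransitionTable = {
    ("S0", "a"): "S1", ("S0", "b"): "S0",
    ("S1", "a"): "S1", ("S1", "b"): "S2",
    ("S2", "a"): "S1", ("S2", "b"): "S0",
}

print(run_fsm(EDGES, "S0", "abab"))  # ['S0', 'S1', 'S2', 'S1', 'S2']
```

A fully enumerated table like this needs one rule per (state, input) pair, which is what the rule-based encodings discussed below try to avoid.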
Controllers in a broad spectrum of devices and systems are often based on state machine engines that implement an FSM. Emerging trends, such as programmable accelerators, require the operation of these devices, and consequently also the operation of their controllers, to be configurable and/or programmable. Programmable state machine engines are used for this purpose.
An example of such a programmable accelerator is the ZuXA accelerator concept described in a paper co-authored by one of the inventors: Jan van Lunteren et al., “XML Accelerator Engine”, Proc. of First International Workshop on High Performance XML Processing, 2004. ZuXA is based on the BaRT-based FSM (B-FSM) technology. BaRT (Balanced Routing-Table Search) is a specific hash table lookup algorithm described in a paper by one of the inventors: Jan van Lunteren, “Searching Very Large Routing Tables in Wide Embedded Memory”, Proc. of GLOBECOM '01, pp. 1615-1619.
A ZuXA controller can be used to improve the processing of XML (eXtensible Markup Language) code. It is fully programmable and provides high performance in combination with low storage requirements and fast incremental updates. In particular, it offers a processing model optimized for conditional execution in combination with dedicated instructions for character- and string-processing functions. The B-FSM technology describes a state transition function using a small number of state transition rules, which involve match and wildcard operators for the current state and input symbol values, and a next-state value. The transition rules are assigned priorities to resolve situations in which multiple transition rules match simultaneously.
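The effect of wildcard operators combined with priorities can be shown with a small sketch. It is not the B-FSM encoding itself: it assumes a rule is a plain `(state, symbol, next_state)` record in which `None` stands for a wildcard, and that rules are listed in order of decreasing priority. `Rule`, `next_state` and `TAG_RULES` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    state: Optional[int]   # None acts as a wildcard: matches any current state
    symbol: Optional[str]  # None acts as a wildcard: matches any input symbol
    next_state: int


def next_state(rules: List[Rule], state: int, symbol: str) -> int:
    """Return the next state of the highest-priority matching rule.
    `rules` must be ordered by decreasing priority."""
    for rule in rules:
        if rule.state in (None, state) and rule.symbol in (None, symbol):
            return rule.next_state
    raise LookupError(f"no rule matches state={state}, symbol={symbol!r}")


# Hypothetical example: track whether the input is inside an XML-like tag.
TAG_RULES = [
    Rule(0, "<", 1),      # '<' opens a tag
    Rule(1, ">", 0),      # '>' closes it
    Rule(1, None, 1),     # any other character stays inside the tag
    Rule(None, None, 0),  # default: everything else returns to state 0
]

print([next_state(TAG_RULES, 0, "<"), next_state(TAG_RULES, 1, "a"),
       next_state(TAG_RULES, 1, ">"), next_state(TAG_RULES, 0, "x")])  # [1, 1, 0, 0]
```

Four prioritized rules describe the complete transition function here, whereas a fully enumerated table over 8-bit input characters would need 2 × 256 = 512 entries. The priority order matters: rule `Rule(1, ">", 0)` must precede the wildcard rule `Rule(1, None, 1)`, otherwise it could never be selected.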
FIG. 1 shows a block diagram of a subsystem of a controller comprising a state machine engine that implements a B-FSM (an FSM based on the BaRT hash table lookup operation). The transition rules are stored in a transition rule memory 10. A rule selector 11 reads rules from the rule memory 10 based on a given input vector and a current state stored in a state register 12. The transition rules stored in the rule memory 10 are encoded in the transition rule vector format shown in FIG. 2. A transition rule vector comprises a test part 20 and a result part 21. The test part 20 comprises fields for a current state 22, an input character 23 and a condition 24. The result part 21 comprises fields for a mask 25, a next state 26, an output 27, and a table address 28.
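The transition rule vector of FIG. 2 can be modeled as a fixed-width bit vector. The sketch below packs the seven fields, test part first, into one integer and unpacks it again. The field widths in `FIELDS` are hypothetical assumptions chosen only for illustration; the source does not specify them.

```python
from dataclasses import dataclass, astuple

# Hypothetical field widths in bits (not specified in the source), most
# significant field first.
FIELDS = [
    ("current_state", 5), ("input_char", 8), ("condition", 2),   # test part 20
    ("mask", 8), ("next_state", 5), ("output", 8),               # result part 21
    ("table_address", 10),
]


@dataclass
class TransitionRuleVector:
    current_state: int
    input_char: int
    condition: int
    mask: int
    next_state: int
    output: int
    table_address: int

    def pack(self) -> int:
        """Concatenate all fields into one integer, first field in the MSBs."""
        word = 0
        for (name, width), value in zip(FIELDS, astuple(self)):
            if not 0 <= value < (1 << width):
                raise ValueError(f"{name}={value} does not fit in {width} bits")
            word = (word << width) | value
        return word

    @classmethod
    def unpack(cls, word: int) -> "TransitionRuleVector":
        """Split a packed word back into fields, starting from the LSBs."""
        values = []
        for _name, width in reversed(FIELDS):
            values.append(word & ((1 << width) - 1))
            word >>= width
        return cls(*reversed(values))


rule = TransitionRuleVector(3, ord("a"), 1, 0xF0, 7, 42, 100)
print(TransitionRuleVector.unpack(rule.pack()) == rule)  # True
```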
In a ZuXA controller, the input to the rule selector 11 consists of a result vector provided by a component called the instruction handler, in combination with a general-purpose input value obtained, for example, from an input port. In each cycle, the rule selector 11 selects the highest-priority transition rule that matches the current state stored in the state register 12 and the input vector. The result part 21 of the transition rule vector selected from the transition rule memory 10 is then used to update the state register 12 and to generate an output value. The output value includes instructions that are dispatched for execution by the instruction handler component. The execution results are provided back to the rule selector 11 and used to select the subsequent instructions to be executed by the instruction handler component, as described above.
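The feedback loop between rule selector and instruction handler can be sketched as a cycle-by-cycle simulation. This is a simplified model under stated assumptions: the input vector consists only of the handler's result (the general-purpose input port is omitted), rules are `(state, input_or_None, next_state, instruction)` tuples in decreasing priority order, and `CountingHandler` is a hypothetical instruction handler, not ZuXA's instruction set.

```python
from typing import List, Optional, Sequence, Tuple

# (current_state, input value or None for "any", next_state, instruction)
Rule = Tuple[int, Optional[int], int, str]


def select_rule(rules: Sequence[Rule], state: int, input_vector: int) -> Tuple[int, str]:
    """Rule selector: return (next_state, instruction) of the first matching rule."""
    for r_state, r_input, nxt, instruction in rules:
        if r_state == state and r_input in (None, input_vector):
            return nxt, instruction
    raise LookupError(f"no rule for state={state}, input={input_vector}")


class CountingHandler:
    """Hypothetical instruction handler: 'inc' bumps a counter; the result
    vector is 1 once the counter has reached `limit`, else 0."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.count = 0

    def execute(self, instruction: str) -> int:
        if instruction == "inc":
            self.count += 1
        return 1 if self.count >= self.limit else 0


def run_controller(rules: Sequence[Rule], handler: CountingHandler,
                   start_state: int, cycles: int) -> Tuple[int, List[str]]:
    state = start_state   # models the state register 12
    result = 0            # result vector returned by the instruction handler
    dispatched: List[str] = []
    for _ in range(cycles):
        state, instruction = select_rule(rules, state, result)
        dispatched.append(instruction)
        result = handler.execute(instruction)  # fed back as the next input vector
    return state, dispatched


RULES: List[Rule] = [
    (0, 0, 0, "inc"),     # keep counting while the result is 0
    (0, 1, 1, "done"),    # counter reached its limit: move to state 1
    (1, None, 1, "nop"),  # idle in state 1 for any input
]

print(run_controller(RULES, CountingHandler(3), 0, 5))
# (1, ['inc', 'inc', 'inc', 'done', 'nop'])
```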
FIG. 3 shows a more detailed block diagram of the state machine engine of FIG. 1. The transition rule memory 10 contains a transition rule table 13 that is implemented as a hash table. Each hash table entry of the transition rule table 13 comprises several transition rules that are mapped to the hash index of this hash table entry. Within a hash table entry, the transition rules are ordered by decreasing priority. An address generator 14 extracts a hash index from those bit positions of the state stored in the state register 12 and of the input vector that are selected by a mask stored in a mask register 15. To obtain the address of the memory location containing the selected hash table entry in the transition rule memory 10, this index value is added to the start address of the transition rule table in this memory. This start address is stored in a table address register 16.
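The address generation step can be sketched as a mask-controlled bit gather followed by an addition. The sketch assumes, hypothetically, that the search bits are the state concatenated above an 8-bit input vector, that the mask bits select which of these bits form the index (gathered like an x86 `PEXT` instruction), and that addresses count hash table entries. `extract_bits` and `rule_address` are hypothetical names.

```python
def extract_bits(value: int, mask: int) -> int:
    """Gather the bits of `value` at the positions set in `mask` into a
    contiguous index (lowest mask bit -> index bit 0)."""
    index, out_pos = 0, 0
    while mask:
        lowest = mask & -mask  # isolate the lowest set bit of the mask
        if value & lowest:
            index |= 1 << out_pos
        out_pos += 1
        mask &= mask - 1       # clear that mask bit
    return index


def rule_address(state: int, input_vector: int, mask: int,
                 table_address: int, input_width: int = 8) -> int:
    """Hash index from mask-selected state/input bits, added to the table
    start address (models address generator 14 and registers 15, 16)."""
    key = (state << input_width) | input_vector
    return table_address + extract_bits(key, mask)


# State 0b101, input 'A' (0x41); the mask selects the lowest state bit (bit 8)
# and the lowest input bit (bit 0), giving index 0b11 = 3.
print(hex(rule_address(0b101, 0x41, 0x101, 0x200)))  # 0x203
```

Because only the masked bits contribute to the index, the table size is governed by the number of mask bits rather than by the full width of the state and input vector.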
The function of the rule selector 11 is based on the BaRT algorithm, which is a scheme for exact-, prefix- and ternary-match searches. The BaRT search operation compares the N=4 transition rule entries 30, 31, 32, 33 contained in a hash table entry (such as hash table entry 0 or 1) with the search key in parallel. The search key is built from the actual values of the state register 12 and the input vector, taking into account any “don't care” conditions indicated by the condition field 24 of the transition rule entries. The first matching transition rule vector is then selected, and its result part 21 becomes the search result.
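A software model of this ternary comparison within one hash table entry might look as follows. It assumes, hypothetically, that the "don't care" information of each rule can be expressed as a per-bit care mask (1 = bit must match, 0 = don't care) and that search keys are `(state << 8) | input`; the actual encoding of condition field 24 is not given here. `Entry`, `bart_select` and `BUCKET` are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    key: int     # state and input bits stored in the rule's test part
    care: int    # 1 = bit must match, 0 = "don't care" (assumed encoding)
    result: str  # stands in for the rule's result part


def bart_select(bucket: List[Entry], search_key: int) -> Optional[str]:
    """Compare the search key with every rule of one hash table entry and
    return the result of the first (highest-priority) match, or None."""
    # Hardware evaluates all N match flags simultaneously; the list models that.
    hits = [((search_key ^ e.key) & e.care) == 0 for e in bucket]
    for entry, hit in zip(bucket, hits):
        if hit:
            return entry.result
    return None


# Hypothetical hash table entry with N = 4 rules, ordered by decreasing priority.
BUCKET = [
    Entry(0x541, 0xFFF, "state 5, input 'A'"),
    Entry(0x341, 0xFFF, "state 3, input 'A'"),
    Entry(0x500, 0xF00, "state 5, any input"),
    Entry(0x000, 0x000, "default"),
]

for key in (0x541, 0x341, 0x542, 0x300):
    print(hex(key), "->", bart_select(BUCKET, key))
```

Exact, prefix and ternary matches all reduce to the same masked comparison; they differ only in which bits the care mask clears.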
In particular, in a ZuXA controller the search result can be used to generate an instruction vector for the instruction handler component, which provides processing results back to the state machine engine as part of an input vector. The instructions contained in the instruction vector can be used for simple (and fast-to-implement) functions that run under tight control of the state machine engine. Examples are character- and string-processing functions, encoding, conversion, searching, filtering, and general output-generating functions.
Compared to other applications in which state machine engines are used, controllers embedded in larger systems often involve a much wider input vector to the state machine engine, composed of “status” and result information from a multitude of logic functions and components that are controlled by the state machine engine. For example, such embedded controllers are used in computer systems to perform parsing and pattern-matching operations on a given stream of network data in order to offload these tasks from the central processors. U.S. Pat. No. 7,480,312 describes such a network traffic accelerator system and method.
For usual pattern-matching and parsing applications, on the other hand, the input to the state machine engine often consists of only a single character in each clock cycle, i.e., a single byte in the case of standard encodings such as ASCII (American Standard Code for Information Interchange). Efficiently supporting the wider input vectors needed for a network traffic accelerator system, for example 32 bits, at high processing rates is much harder than implementing a state machine engine that processes input vectors of only 8 bits, mainly because of the much larger set of possible input values that can occur. Given the high clock frequencies of today's processors, it is therefore a challenging task to provide a ZuXA controller implementation for use as a network traffic accelerator in computer systems with such high-speed processors.
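The size of the input value space can be made concrete with a short calculation. The sketch assumes a naive, fully enumerated transition table with one slot per possible input value in every state; `direct_table_entries` is a hypothetical helper, and the 64-state figure in the note below is likewise only an example.

```python
def direct_table_entries(input_bits: int, num_states: int = 1) -> int:
    """Entries needed if each state had one slot per possible input value."""
    return num_states << input_bits  # num_states * 2**input_bits


for bits in (8, 32):
    print(bits, direct_table_entries(bits))
# 8 256
# 32 4294967296
print(direct_table_entries(32) // direct_table_entries(8))  # 16777216
```

Widening the input from 8 to 32 bits multiplies the number of possible input values per state by 2^24. For a hypothetical 64-state machine this means 2^38 (about 2.7 × 10^11) slots instead of 16,384, which is why a practical engine cannot simply enumerate all inputs.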
In practice, however, often only a subset of the entire set of possible input values is used, and consequently the state machine engine design can be optimized for that subset. One example is to use a hash function for selecting state transitions that considers only certain groups of bits of the input value. Another example is to assume that most states (e.g., 95%) have at most a certain number of outgoing transitions (e.g., 4), each labeled with a specific input value.
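Whether a given FSM fits the second kind of constraint can be checked mechanically. The sketch below counts the distinct input labels on each state's outgoing transitions and reports the states that exceed a limit of 4; the transition format `(state, input_value, next_state)` and the example FSM are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence, Set, Tuple

Transition = Tuple[int, int, int]  # (state, input_value, next_state)


def states_exceeding(transitions: Sequence[Transition], limit: int = 4) -> Set[int]:
    """Return the states with more than `limit` distinct labeled outgoing
    transitions, i.e. the states that violate the implementation constraint."""
    labels: Dict[int, Set[int]] = defaultdict(set)
    for state, value, _next in transitions:
        labels[state].add(value)
    return {state for state, values in labels.items() if len(values) > limit}


def fraction_within_limit(transitions: Sequence[Transition], limit: int = 4) -> float:
    """Fraction of source states that satisfy the at-most-`limit` constraint."""
    sources = {state for state, _value, _next in transitions}
    return (len(sources) - len(states_exceeding(transitions, limit))) / len(sources)


# Hypothetical FSM: state 0 branches on six input values, the others on two.
TRANSITIONS = [(0, value, 1) for value in range(6)] + [
    (1, 0, 2), (1, 1, 0), (2, 0, 3), (2, 1, 0), (3, 0, 0), (3, 1, 1),
]
print(states_exceeding(TRANSITIONS), fraction_within_limit(TRANSITIONS))  # {0} 0.75
```

States reported by `states_exceeding` are exactly the cases that an engine optimized for the constraint would not support directly.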
A similar approach related to logic synthesis methods is described in a technical disclosure published as IPCOM121980D. Logic synthesis is a process by which an abstract description of the desired behaviour of a hardware logic circuit (typically at the so-called register transfer level or behavioural level) is turned into a circuit design implementation in terms of logic gates. Common examples of this process include the synthesis of designs written in hardware description languages (e.g., VHDL or Verilog). In a logic synthesis tool chain, an FSM compiler processes a state transition table (or another supported input format) and derives a sum-of-products equation for each output and for each bit of the storage elements (e.g., latches) used to represent the state of an FSM.
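To illustrate what deriving a sum-of-products equation per state bit means, the sketch below builds the canonical (unminimized) form from a complete state transition table: every table row that sets a next-state bit contributes one product term. Real FSM compilers also minimize these equations and handle outputs; that is omitted here. `sop_equations`, the `s<k>`/`x<k>` variable naming and the toggle example are hypothetical.

```python
from typing import Dict, List, Tuple


def sop_equations(table: Dict[Tuple[int, int], int],
                  state_bits: int, input_bits: int) -> List[str]:
    """Derive a canonical (unminimized) sum-of-products equation for each
    next-state bit from a state transition table.
    Variables: s<k> = current-state bit k, x<k> = input bit k; a trailing
    apostrophe denotes the complement. Missing rows are treated as 0."""
    names = ([f"s{k}" for k in reversed(range(state_bits))] +
             [f"x{k}" for k in reversed(range(input_bits))])
    width = len(names)
    equations = []
    for bit in range(state_bits):
        terms = []
        for (state, inp), nxt in sorted(table.items()):
            if (nxt >> bit) & 1:  # this row sets next-state bit `bit`
                row = (state << input_bits) | inp
                literals = [name if (row >> (width - 1 - pos)) & 1 else name + "'"
                            for pos, name in enumerate(names)]
                terms.append("".join(literals))
        equations.append(f"s{bit}+ = " + (" + ".join(terms) if terms else "0"))
    return equations


# Example: a 1-bit state that toggles when the input bit is 1.
TOGGLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
print(sop_equations(TOGGLE, state_bits=1, input_bits=1))  # ["s0+ = s0'x0 + s0x0'"]
```

The order of the product terms is arbitrary from a logic point of view, which is what leaves room for the reordering described next.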
Because a simple FSM compiler cannot determine by itself whether a particular FSM contains sub-paths of a timing-critical path in the circuit design implementation, whereas the logic designer usually knows this, the logic designer can provide this information to the FSM compiler. The FSM compiler can then use this information to reorder the sum-of-products equations so as to reduce the delay of the critical sub-path, based on the designer's “coaching”.
Many other such examples exist, corresponding to the variety of techniques that can be used to implement a state machine engine. In all cases, the subset of possible input values is specified by certain constraints on the set of possible input values. It is therefore beneficial to optimize the state machine engine implementation for that subset of input values, enabling an efficient and fast implementation, rather than trying to cover all possible input values, which results in an expensive and slow implementation. The problem that arises in this case, however, is that in rare cases input values, or combinations of input values, can occur that the implementation does not support.