Exemplary embodiments relate generally to pattern matching in a data processing system, and more specifically to transition rule sharing based on short state tags.
A clear trend that can be observed in the Internet is the increasing amount of packet data that is being inspected before a packet is delivered to its destination. In the early days, packets were solely routed based on their destination address. Later, firewall and quality-of-service (QoS) applications emerged that examined multiple fields in the packet header, for example, the popular 5-tuple consisting of addresses, port numbers and protocol byte. More recently, network intrusion detection systems (NIDS), virus scanners, filters and other “content-aware” applications go one step further by also performing scans on the packet payload. Although the latter type of application tends to reside closer to the end user, thus involving link speeds that are only a fraction of the speeds in the backbone, the ongoing performance improvements throughout the Internet make it very challenging to perform the required packet processing at full wirespeed.
Pattern matching functions may be utilized for intrusion detection and virus scanning applications. Many pattern matching algorithms are based on finite state machines (FSMs). An FSM is a model of behavior composed of states, transitions, and actions. A state stores information about the past, i.e., it reflects the input changes from the start to the present moment. A transition indicates a state change and is described by a condition that must be fulfilled to enable the transition. An action is a description of an activity that is to be performed at a given moment. A specific input action is executed when certain input conditions are fulfilled in a given present state. For example, an FSM can provide a specific output (e.g., a string of binary characters) as an input action.
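By way of a non-limiting illustration, the relationship between states, transitions, and actions described above may be sketched as a small table-driven FSM. The pattern "ab", the state names, the `TRANSITIONS` table, and the functions `report_match` and `run_fsm` are hypothetical and chosen only for this example; they are not taken from any particular embodiment.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical example: an FSM that detects the two-character pattern "ab".
# Each entry maps (present state, input character) to (next state, action).
Action = Callable[[int], str]


def report_match(position: int) -> str:
    """Input action: produce an output when the pattern has been seen."""
    return f"match ending at {position}"


TRANSITIONS: Dict[Tuple[str, str], Tuple[str, Optional[Action]]] = {
    ("start", "a"): ("seen_a", None),           # first character of the pattern
    ("seen_a", "a"): ("seen_a", None),          # still one "a" behind us
    ("seen_a", "b"): ("start", report_match),   # pattern complete, run action
}


def run_fsm(data: str) -> List[str]:
    """Feed the input through the FSM and collect the action outputs."""
    state = "start"
    outputs: List[str] = []
    for position, char in enumerate(data):
        # Any (state, input) pair without an explicit rule returns to "start".
        next_state, action = TRANSITIONS.get((state, char), ("start", None))
        if action is not None:
            outputs.append(action(position))
        state = next_state
    return outputs
```

In this sketch, the state records how much of the pattern has been seen so far, each table entry is a transition whose condition is the pair of present state and input character, and the action is executed only when that condition is fulfilled.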
A hash table is a data structure that can be used to associate keys with values: in a hash table lookup operation, the value corresponding to a given search key is retrieved. For example, a person's phone number in a telephone book could be found via a hash table search, where the person's name serves as the search key and the person's phone number as the value. Caches, associative arrays, and sets are often implemented using hash tables. Hash tables are very common in data processing and are implemented in many software applications and many data processing hardware implementations.
Hash tables are typically implemented using arrays, where a hash function determines the array index for a given key. The key and its associated value (or a pointer to their location in a computer memory) are then stored in the array entry with this array index. This array index is called the hash index. When different keys that are associated with different values have the same hash index, this collision is resolved by an additional search operation (e.g., using chaining) and/or by probing.
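As a minimal sketch of the array-based organization described above, the following example stores key-value pairs in an array of buckets and resolves collisions by chaining. The class name `ChainedHashTable`, the bucket count, and the byte-sum hash function are assumptions made for illustration only; the byte-sum function was chosen because it makes collisions easy to reproduce, not because it is a good hash function.

```python
from typing import List, Optional, Tuple


class ChainedHashTable:
    """Hypothetical hash table using an array of chains (lists of pairs)."""

    def __init__(self, num_buckets: int = 8) -> None:
        # One chain per array entry; each chain holds (key, value) pairs.
        self.buckets: List[List[Tuple[str, str]]] = [
            [] for _ in range(num_buckets)
        ]

    def hash_index(self, key: str) -> int:
        # Illustrative hash function: sum of the key's bytes modulo the
        # array size. Keys with the same characters in a different order
        # collide, which is convenient for demonstrating chaining.
        return sum(key.encode("utf-8")) % len(self.buckets)

    def put(self, key: str, value: str) -> None:
        chain = self.buckets[self.hash_index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # overwrite the existing entry
                return
        chain.append((key, value))       # new key, possibly a collision

    def get(self, key: str) -> Optional[str]:
        # The additional search operation: walk the chain at the hash index.
        for existing_key, value in self.buckets[self.hash_index(key)]:
            if existing_key == key:
                return value
        return None


table = ChainedHashTable()
table.put("ab", "555-0100")
table.put("ba", "555-0199")  # same hash index as "ab", stored in the same chain
```

Here the keys "ab" and "ba" map to the same hash index, so both pairs reside in one chain and a lookup compares the stored keys to select the correct value.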
A balanced routing table search (BaRT) FSM (B-FSM) is a programmable state machine, suitable for implementation in hardware and software. A B-FSM is able to process wide input vectors and generate wide output vectors in combination with high performance and storage efficiency. B-FSM technology may be utilized for pattern matching for intrusion detection and other related applications. The B-FSM employs a special hash function, referred to as “BaRT”, to select in each cycle one state transition out of multiple possible transitions in order to determine the next state and to generate an output vector. More details about the operation of a B-FSM are described in a paper authored by inventor Jan Van Lunteren, entitled “High-Performance Pattern-Matching for Intrusion Detection”, Proceedings of IEEE INFOCOM '06, Barcelona, Spain, April 2006, which is herein incorporated by reference.