Significant trends in computing and communications are leading to the emergence of environments that abound in content analytics and processing. These environments require high performance as well as programmability on a certain class of functions, namely searching, parsing, analysis, interpretation, and transformation of content in messages, documents, or packets. Notable fields that stress such rich content analytics and processing include content-aware networking, content-based security systems, surveillance, distributed computing, wireless communication, human interfaces to computers, information storage and retrieval systems, content search on the semantic web, bio-informatics, and others.
The field of content-aware networking requires searching and inspecting the content inside packets or messages in order to determine where to route or forward such packets and messages. Such inspection has to be performed on in-flight messages at “wire speed”, which is the data rate of the network connection. Given that wire rates in contemporary networks range from 100 Mbits/second all the way to 40 Gbits/second, there is tremendous pressure on the speed at which the content inspection function needs to be performed.
Content-based security systems and surveillance and monitoring systems are required to analyze the content of messages or packets and apply a set of rules to determine whether there is a security breach or the possibility of an intrusion. Typically, on modern network intrusion detection systems (NIDS), a large number of patterns, rules, and expressions have to be applied to the input payload at wire speed to ensure that all potential system vulnerabilities are uncovered. Given that the network and computing infrastructure is continuously evolving, fresh vulnerabilities continue to arise. Moreover, increasingly sophisticated attacks are employed by intruders in order to evade detection. Intrusion detection systems need to be able to detect all known attacks on the system, and also be intelligent enough to detect unusual and suspicious behavior that is indicative of new attacks. All these factors lead to a requirement for both programmability as well as extremely high performance on content analysis and processing.
With the advent of distributed and clustered computing, tasks are now distributed to multiple computers or servers that collaborate and communicate with one another to complete the composite job. This distribution leads to a rapid increase in computer communication, requiring high performance on such message processing. With the emergence of XML (Extensible Markup Language) as the new standard for universal data interchange, applications communicate with one another using XML as the “application layer data transport”. Messages and documents are now embedded in XML markup. All message processing first requires that the XML document be parsed and the relevant content extracted and interpreted, followed by any required transformation and filtering. Since these functions need to be performed at a high message rate, they become computationally very demanding.
With the growth of untethered communication and wireless networks, information is increasingly accessed from wireless devices. Given the light form factor of the client device, it is important that data delivered to this device be filtered and the payload be kept small. Environments of the future will filter and transform XML content from the wireline infrastructure into lightweight content (using the Wireless Markup Language or WML) on the wireless infrastructure. With the increasing use of wireless networks, this content transformation function will be so common that an efficient solution for its handling will be needed.
Another important emerging need is the ability to communicate and interact with computers using human interfaces such as speech. Speech processing and natural language processing are extremely intensive in content searching, lexical analysis, content parsing, and grammar processing. Once a voice stream has been transduced into text, speech systems need to apply large vocabularies as well as syntactic and semantic rules on the incoming text stream to understand the speech.
The emergence and growth of the worldwide web has placed tremendous computational load on information retrieval (IR) systems. Information continues to be added to the web at a high rate. This information typically gets fully indexed against an exhaustive vocabulary of words and is added to databases of search engines and IR systems. Since information is continuously being created and added, indexers need to be “always-on”. In order to provide efficient real-time contextual search, it is necessary that there be a high performance pattern-matching system for the indexing function.
Another field that stresses rich content analytics and processing is the field of bio-informatics. Gene analytics and proteomics entail the application of complex search and analysis algorithms on gene sequences and structures. Once again, such computation requires high performance search, analysis, and interpretation capability.
Thus, emerging computer and communications environments of the future will stress rich analysis and processing of content. Such environments will need efficient and programmable solutions for the following functions—searching, lexical analysis, parsing, characterization, interpretation, filtering and transformation of content in documents, messages, or packets.
Central to these rich content processing functions are (1) operations to perform contextual and content-based search, lookup, navigation, and rich associative lookup, and (2) the capability to efficiently evaluate state machines against an input data stream.
In the prior art, search and lookup processing has typically been performed in one of two ways. First, such processing has been performed using fixed application-specific integrated circuit (ASIC) solutions that combine content-addressable memories (CAMs), comparator hardware, and dedicated logic. For example, search rules are stored in a content-addressable memory, and the data is streamed across the structure, shifting it 1 byte or 1 word at a time. Alternatively, specific comparators are arranged at fixed locations to recognize specific values in the incoming data. Incidences of matches are recorded and consumed by the dedicated logic as per the requirements of the target application. Although the fixed ASIC approach can increase performance, it lacks easy programmability, and hence its application is severely restricted. Furthermore, the expense associated with designing and tailoring specific chips for each targeted solution is prohibitive.
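The CAM-based scheme described above can be modeled in software to make its data flow concrete. The following is a minimal Python sketch, not a hardware description: the function name `cam_style_scan`, the fixed entry `width`, and the prefix comparison used for rules shorter than an entry are illustrative assumptions. In a CAM, the comparisons of one window position against all stored rules happen in parallel in a single cycle; the inner loop here serializes them, which is the cost a general-purpose processor would pay.

```python
from typing import List, Tuple


def cam_style_scan(data: bytes, rules: List[bytes], width: int) -> List[Tuple[int, int]]:
    """Return (offset, rule_index) for every rule matching at each data offset.

    Hypothetical software model of a CAM with `width`-byte entries: the data
    window is shifted one byte at a time and compared against every stored rule.
    """
    for rule in rules:
        if len(rule) > width:
            raise ValueError("rule longer than CAM entry width")
    matches = []
    for offset in range(len(data)):  # shift the stream 1 byte at a time
        window = data[offset:offset + width]
        # In hardware these comparisons are simultaneous; here they are a loop.
        for idx, rule in enumerate(rules):
            # Shorter rules compare only their own prefix of the window
            # (a simplification of "don't care" bits). Near the end of the
            # data the window is truncated, so long rules cannot falsely match.
            if window[:len(rule)] == rule:
                matches.append((offset, idx))
    return matches
```

For example, `cam_style_scan(b"GET /etc/passwd", [b"/etc", b"passwd"], 8)` reports rule 0 at offset 4 and rule 1 at offset 9.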
Second, traditional general-purpose microprocessors with general-purpose execution datapaths have been used to handle rich search and lookup functions and associated content processing. Microprocessors are fully programmable devices and can address the evolving needs of problems, since new functionality can be deployed simply by reprogramming the software. However, the traditional microprocessor is limited in the performance level it can offer to rich content analytics and processing.
The limitation in performance on content analytics is inherent in the design and evolution of the microprocessor architecture. The microprocessor originated as a computing unit, performing arithmetic operations on 1-, 2-, 4-, or 8-byte words. Subsequently, as the field of computing evolved, more functionality was progressively added to the microprocessor to address emerging fields. As a result, the general-purpose microprocessor is functional across a very wide range of applications, but not very well tuned for any one in particular. Fundamentally, as it applies to the needs of content analytics, the microprocessor architecture has two key limitations: (1) it lacks the capability to simultaneously perform massively parallel and fine-grain pattern-matching and comparison operations on large datasets, and (2) it lacks the capability to make rapid and multiple state transitions and efficient multi-directional control flow changes based on input data.
A number of search and pattern matching algorithms have evolved to make best use of the microprocessor. The Boyer-Moore algorithm is widely regarded as one of the best-known techniques employed on a microprocessor to find occurrences of patterns in a given data set. The algorithm processes only one pattern at a time and must be repeatedly invoked if more than one pattern is to be searched in a data set. For each pattern to be searched, it advances sequentially through the data set making selective comparisons based on observations obtained from pre-characterizing the pattern. This algorithm provides superior performance relative to other pattern matching algorithms by reducing the total number of comparisons within a given data set. However, due to the sequential nature of the algorithm, the performance is limited by fundamental constraints of microprocessor architecture, namely the scalar instruction set and the penalty incurred on branching.
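A minimal Python sketch of this style of search follows. It uses the Boyer-Moore-Horspool simplification (bad-character shift only), not the full Boyer-Moore algorithm with its good-suffix rule, and the helper names are hypothetical. It illustrates the two properties relied on above: the pattern is pre-characterized into a shift table, and each additional pattern requires another complete pass over the data.

```python
from typing import Dict, List


def horspool_search(text: bytes, pattern: bytes) -> List[int]:
    """Return all start offsets of `pattern` in `text` (Boyer-Moore-Horspool)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Pre-characterize the pattern: how far the window may shift when the text
    # byte aligned with the pattern's last position is b. Bytes absent from
    # pattern[:-1] allow a shift of the full pattern length.
    shift: Dict[int, int] = {}
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i
    hits = []
    pos = 0
    while pos <= n - m:
        # Compare right to left, stopping at the first mismatch
        # (a data-dependent branch on every byte compared).
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)
        pos += shift.get(text[pos + m - 1], m)
    return hits


def search_all(text: bytes, patterns: List[bytes]) -> Dict[bytes, List[int]]:
    """One full, independent pass over the data per pattern."""
    return {p: horspool_search(text, p) for p in patterns}
```

For example, `search_all(b"GET /etc/passwd", [b"/etc", b"passwd"])` makes two independent passes and returns `{b"/etc": [4], b"passwd": [9]}`. Both the inner comparison loop and the table-driven skip are data-dependent branches, which is where the branching penalty mentioned above is incurred.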
Owing to the aforementioned architectural limitations of the microprocessor, the efficiency and capability of conventional microprocessors are severely challenged by the emerging computing and communications environments described earlier. Several data points can be provided to support these arguments. For example, in a Network Intrusion Detection System (NIDS) such as Snort, it is already desirable to apply signature detection of hundreds of strings on incoming packets. Performing this workload with signatures of 8-byte patterns on a 3 GHz Pentium IV processor in a commercial microprocessor-based system that employs an improved version of the Boyer-Moore pattern matching algorithm limits throughput to less than 50 Mbps. Likewise, parsing of XML documents on such a platform is limited to the 10 MB/s range, and speech processing is limited to 1 real-time stream on restricted grammars and vocabularies. These data points indicate that the conventional microprocessor of 2003 or 2004 will be able to deliver rich content analytics and processing at rates around the 100 Mbps range. However, by that timeframe, data rates of between 1 Gbps and 10 Gbps will not be uncommon in enterprise networks and environments. Clearly, there is a severe mismatch of one to two orders of magnitude between the performance that can be delivered by the conventional microprocessor and that which is demanded by the environment. While it is possible to employ multiple parallel microprocessor systems to execute some of the desired functions at the target rate, this greatly increases the cost of the system. There is clearly a need for a more efficient solution for these target functions.
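The size of this gap can be checked with simple arithmetic. The sketch below is an illustration under stated assumptions (a single 3 GHz core, 8 bits per byte, and every cycle available for content inspection with no protocol or per-packet overhead); it computes the number of processor cycles available per payload byte at a given line rate. The names are hypothetical.

```python
CPU_HZ = 3.0e9  # assumed single 3 GHz core, matching the processor cited above


def cycles_per_byte(line_rate_bps: float, cpu_hz: float = CPU_HZ) -> float:
    """Cycles available per byte if the processor must keep up with line_rate_bps."""
    bytes_per_second = line_rate_bps / 8  # 8 bits per byte; framing overhead ignored
    return cpu_hz / bytes_per_second


if __name__ == "__main__":
    for label, rate in [("50 Mbps", 50e6), ("100 Mbps", 100e6),
                        ("1 Gbps", 1e9), ("10 Gbps", 10e9)]:
        print(f"{label:>8}: {cycles_per_byte(rate):6.1f} cycles/byte")
```

Under these assumptions, 50 Mbps corresponds to about 480 cycles per byte and 100 Mbps to about 240, while 1 Gbps and 10 Gbps leave only 24 and 2.4 cycles per byte respectively, consistent with the one-to-two orders of magnitude mismatch described above.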
A similar parallel exists in the case of state machine evaluation. The history of state machines dates back to early computer science. In their simplest formulation, state machines are formal models that consist of states, transitions amongst states, and an input representation. Starting with Turing's model of algorithmic computation (1936), state machines have been central to the theory of computation. In the 1950s, the regular expression was developed by Kleene as a formal notation to describe and characterize sets of strings. The finite state automaton was developed as a state machine model that was found to be equivalent to the regular expression. Non-deterministic automata were subsequently developed and proven to be equivalent to deterministic automata. Subsequent work by Thompson and others led to a body of algorithms for constructing finite state automata that evaluate regular expressions. A large number of references are available for descriptions of regular expressions and finite state automata. For a reference text on the material, see “Speech and Language Processing” (by Daniel Jurafsky and James H. Martin, Prentice-Hall Inc, 2000).
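Thompson's construction can be illustrated compactly. The Python sketch below builds a non-deterministic automaton with epsilon transitions from a regular expression and then simulates it by tracking the set of active states. To keep it short, it assumes the expression is already in postfix form with an explicit concatenation operator `.` (so `a(b|c)*` is written `abc|*.`), supports only literals, `.`, `|`, and `*`, and matches the whole input; these conventions and all names are illustrative assumptions.

```python
from typing import List, Optional, Set, Tuple


class State:
    def __init__(self) -> None:
        # Outgoing edges as (symbol, target); symbol None marks an epsilon edge.
        self.edges: List[Tuple[Optional[str], "State"]] = []


def thompson(postfix: str) -> Tuple[State, State]:
    """Build an NFA fragment (start, accept) from a postfix regular expression."""
    stack: List[Tuple[State, State]] = []
    for tok in postfix:
        if tok == ".":    # concatenation: link first fragment's accept to second's start
            s2, a2 = stack.pop()
            s1, a1 = stack.pop()
            a1.edges.append((None, s2))
            stack.append((s1, a2))
        elif tok == "|":  # alternation: new start branches into both fragments
            s2, a2 = stack.pop()
            s1, a1 = stack.pop()
            start, accept = State(), State()
            start.edges += [(None, s1), (None, s2)]
            a1.edges.append((None, accept))
            a2.edges.append((None, accept))
            stack.append((start, accept))
        elif tok == "*":  # Kleene star: skip, or loop back through the fragment
            s1, a1 = stack.pop()
            start, accept = State(), State()
            start.edges += [(None, s1), (None, accept)]
            a1.edges += [(None, s1), (None, accept)]
            stack.append((start, accept))
        else:             # literal symbol
            start, accept = State(), State()
            start.edges.append((tok, accept))
            stack.append((start, accept))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack.pop()


def eps_closure(states: Set[State]) -> Set[State]:
    """All states reachable from `states` through epsilon edges alone."""
    seen, todo = set(states), list(states)
    while todo:
        for sym, target in todo.pop().edges:
            if sym is None and target not in seen:
                seen.add(target)
                todo.append(target)
    return seen


def matches(postfix: str, text: str) -> bool:
    """Simulate the NFA on `text`, advancing the whole set of active states per symbol."""
    start, accept = thompson(postfix)
    current = eps_closure({start})
    for ch in text:
        current = eps_closure({t for s in current for sym, t in s.edges if sym == ch})
    return accept in current
```

For example, `matches("abc|*.", "abcb")` returns `True` and `matches("abc|*.", "ba")` returns `False`.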
Using techniques available in the prior art, state machine and finite state automata processing can be performed in one of three ways. First, such processing has been performed using fixed application-specific integrated circuit (ASIC) solutions that directly implement a fixed and chosen state machine that is known a priori. Although the fixed ASIC approach can increase performance, it lacks programmability, and hence its application is severely restricted. Furthermore, the expense associated with designing and tailoring specific chips for each targeted solution is prohibitive.
Second, Field Programmable Gate Arrays (FPGA) can be used to realize state machines in a programmable manner. Essentially, the FPGA architecture provides generalized programmable logic that can be configured for a broad range of applications, rather than being specially optimized for the implementation of state machines. Using this approach, one can only accommodate a small number of state machines on a chip, and furthermore the rate at which evaluation can progress is limited. The density and performance characteristics of the implementations make this choice of solution inadequate for the broad range of emerging applications.
Third, traditional general-purpose microprocessors have been used to implement a variety of state machines. Microprocessors are fully programmable devices and can address the evolving needs of problems, since new functionality can be deployed simply by reprogramming the software. However, the traditional microprocessor is limited in the efficiency with which it can implement and evaluate state machines.
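One way to see this limit is in how a microprocessor typically evaluates a deterministic automaton: one table lookup per input byte, where the address of each lookup depends on the result of the previous one. The Python sketch below is illustrative only. It builds the string-matching automaton for a single pattern (the DFA associated with the Knuth-Morris-Pratt algorithm, chosen here as a simple example rather than taken from the text) and evaluates it over a byte stream; the names `build_match_dfa` and `scan_stream` are hypothetical.

```python
from typing import List


def build_match_dfa(pattern: bytes) -> List[List[int]]:
    """Transition table delta[state][byte]; state k means the last k bytes equal pattern[:k]."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    delta = [[0] * 256 for _ in range(m + 1)]
    delta[0][pattern[0]] = 1
    restart = 0  # state the automaton reaches after reading pattern[1:k]
    for k in range(1, m + 1):
        # On any byte that does not extend the match, behave like the restart state.
        delta[k] = list(delta[restart])
        if k < m:
            delta[k][pattern[k]] = k + 1  # the matching byte advances the state
            restart = delta[restart][pattern[k]]
    return delta


def scan_stream(delta: List[List[int]], data: bytes) -> List[int]:
    """Return start offsets of every (possibly overlapping) match in data."""
    accept = len(delta) - 1
    state = 0
    hits = []
    for i, b in enumerate(data):
        state = delta[state][b]  # each lookup depends on the previous lookup's result
        if state == accept:
            hits.append(i - accept + 1)
    return hits
```

For example, `scan_stream(build_match_dfa(b"aba"), b"ababa")` reports matches starting at offsets 0 and 2. The loop advances by exactly one transition per byte, and the chain of dependent table loads limits how much a scalar processor can overlap successive iterations, in line with the limitation described above.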
There is a need for a new solution for a programmable processing apparatus that is more suitable for content analytics and processing, and that is efficient on a set of functions that include state machine evaluation as well as the execution of operations for contextual search, lexical analysis, parsing, interpretation, and transformation of content on messages, packets, or documents.