Significant trends in computing and communications are leading to the emergence of environments that abound in content analytics and processing. These environments require high performance as well as programmability on a certain class of functions, namely searching, parsing, analysis, interpretation, and transformation of content in messages, documents, or packets. Notable fields that stress such rich content analytics and processing include content-aware networking, content-based security systems, surveillance, distributed computing, wireless communication, human interfaces to computers, information storage and retrieval systems, content search on the semantic web, bio-informatics, and others.
The field of content-aware networking requires searching and inspection of the content inside packets or messages in order to determine where to route or forward such packets and messages. Such inspection has to be performed on in-flight messages at “wire-speed”, which is the data-rate of the network connection. Given that wire rates in contemporary networks range from 100 Mbits/second all the way to 40 Gbits/second, there is tremendous pressure on the speed at which the content inspection function needs to be performed.
Content-based security systems and surveillance and monitoring systems are required to analyze the content of messages or packets and apply a set of rules to determine whether there is a security breach or the possibility of an intrusion. Typically, on modern network intrusion detection systems (NIDS), a large number of patterns, rules, and expressions have to be applied to the input payload at wire speed to ensure that all potential system vulnerabilities are uncovered. Given that the network and computing infrastructure is continuously evolving, fresh vulnerabilities continue to arise. Moreover, increasingly sophisticated attacks are employed by intruders in order to evade detection. Intrusion detection systems need to be able to detect all known attacks on the system, and also be intelligent enough to detect unusual and suspicious behavior that is indicative of new attacks. All these factors lead to a requirement for both programmability as well as extremely high performance on content analysis and processing.
With the advent of distributed and clustered computing, tasks are now distributed to multiple computers or servers that collaborate and communicate with one another to complete the composite job. This distribution leads to a rapid increase in computer communication, requiring high performance on such message processing. With the emergence of XML (Extensible Markup Language) as the new standard for universal data interchange, applications communicate with one another using XML as the “application layer data transport”. Messages and documents are now embedded in XML markup. All message processing first requires that the XML document be parsed and the relevant content extracted and interpreted, followed by any required transformation and filtering. Since these functions need to be performed at a high message rate, they become computationally very demanding.
With the growth of untethered communication and wireless networks, there is an increase in the access of information from the wireless device. Given the light form factor of the client device, it is important that data delivered to this device be filtered and the payload be kept small. Environments of the future will filter and transform XML content from the wireline infrastructure into lightweight content (using the Wireless Markup Language or WML) on the wireless infrastructure. With the increasing use of wireless networks, this content transformation function will be so common that an efficient solution for its handling will be needed.
Another important emerging need is the ability to communicate and interact with computers using human interfaces such as speech. Speech processing and natural language processing are extremely intensive in content searching, lexical analysis, content parsing, and grammar processing. Once a voice stream has been transduced into text, speech systems need to apply large vocabularies as well as syntactic and semantic rules on the incoming text stream to understand the speech.
The emergence and growth of the worldwide web has placed tremendous computational load on information retrieval (IR) systems. Information continues to be added to the web at a high rate. This information typically gets fully indexed against an exhaustive vocabulary of words and is added to databases of search engines and IR systems. Since information is continuously being created and added, indexers need to be “always-on”. In order to provide efficient real-time contextual search, it is necessary that there be a high performance pattern-matching system for the indexing function.
Another field that stresses rich content analytics and processing is the field of bio-informatics. Gene analytics and proteomics entail the application of complex search and analysis algorithms on gene sequences and structures. Once again, such computation requires high performance search, analysis, and interpretation capability.
Thus, emerging computer and communications environments of the future will stress rich analysis and processing of content. Such environments will need efficient and programmable solutions for the following functions—searching, lexical analysis, parsing, characterization, interpretation, filtering and transformation of content in documents, messages, or packets.
Central to these rich content processing functions are operations to perform contextual and content-based search and navigation, and rich associative lookup.
In the prior art, search and lookup processing has typically been performed in one of two ways. First, such processing has been performed using fixed application-specific integrated circuit (ASIC) solutions using a combination of content addressable memories (CAMs), comparator hardware and dedicated logic. For example, search rules are stored in a content-addressable memory, and the data is streamed across the structure, shifting it 1 byte or 1 word at a time. Alternatively, specific comparators are arranged at fixed locations to recognize specific values in the incoming data. Incidences of matches are recorded and consumed by the dedicated logic as per the requirements of the target application. Although the fixed ASIC approach can increase performance, it lacks easy programmability, and hence its application is severely restricted. Furthermore, the expense associated with designing and tailoring specific chips for each targeted solution is prohibitive.
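By way of illustration only, the streaming CAM arrangement described above can be modeled in software as follows. This is a minimal sketch under stated assumptions, not a description of any particular ASIC: the function name `cam_stream_match`, the equal-width rule restriction, and the (offset, rule index) match record are all hypothetical choices made for the example. The inner loop stands in for a comparison that the hardware would perform across all CAM entries at once.

```python
from typing import List, Tuple


def cam_stream_match(rules: List[bytes], data: bytes) -> List[Tuple[int, int]]:
    """Model streaming data past a CAM that holds fixed-width search rules.

    Returns a list of (offset, rule_index) pairs, one per match.
    Assumption for this sketch: every rule has the same width.
    """
    if not rules:
        return []
    width = len(rules[0])
    if any(len(rule) != width for rule in rules):
        raise ValueError("this model assumes all rules have the same width")

    matches = []
    # Shift the data past the structure one byte at a time.
    for offset in range(len(data) - width + 1):
        window = data[offset:offset + width]
        # Hardware compares every CAM entry in the same cycle;
        # software has to emulate that with a loop over the rules.
        for index, rule in enumerate(rules):
            if window == rule:
                matches.append((offset, index))
    return matches


if __name__ == "__main__":
    print(cam_stream_match([b"abc", b"bcd"], b"xabcdabc"))
```

The recorded matches correspond to the incidences that the dedicated logic would consume. Changing the rule width or the match semantics in this model is a one-line edit, whereas in a fixed ASIC the equivalent change requires a new design, which is the programmability limitation noted above.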
Second, traditional general-purpose microprocessors have been used to handle rich search and lookup functions and associated content processing. Microprocessors are fully programmable devices and are able to address the evolving needs of problems: by simply reprogramming the software, new functionality can be deployed. However, the traditional microprocessor is limited in the performance level it can offer to rich content analytics and processing.
The limitation in performance on content analytics is inherent in the design and evolution of the microprocessor architecture. The microprocessor originated as a computing unit, performing arithmetic operations on 1-, 2-, 4-, or 8-byte words. Subsequently, as the field of computing evolved, more functionality was progressively added to the microprocessor to address emerging fields. As a result, the general purpose microprocessor is functional across a very wide range of applications, but not very well tuned for any one in particular. Fundamentally, as it applies to the needs of content analytics, the microprocessor architecture has two key limitations: (1) it lacks the capability to simultaneously perform massively parallel and fine-grain pattern-matching and comparison operations on large datasets, and (2) it lacks the capability to make rapid and multiple state transitions and efficient multi-directional control flow changes based on input data.
The instruction set of the microprocessor is a scalar instruction set, such that instructions need to be executed in a single ordered sequence. The instruction sets of typical microprocessors enable the comparison of a single 64-bit quantity stored in a register with another 64-bit quantity stored in a different register. The comparison is performed with the two operands aligned. If the comparison is being performed for the purpose of a pattern search, then it needs to be invoked repeatedly after shifting one or both of the operands by a variable number of bytes each time. Often, such repeated shifting is performed in a loop with a control flow change that transfers control from the code at the bottom of the loop to the code at the top of the loop on each iteration. Control flow changes in the microprocessor are accomplished by branching to a fresh sequence of code. Since modern microprocessors are highly pipelined (of the order of 20-30 stages in products like the Pentium III and Pentium IV processors from Intel Corporation of Santa Clara, Calif.), the performance penalty incurred due to branching is significant. The entire microprocessor pipeline needs to be flushed on a taken branch. Sophisticated branch prediction techniques hence need to be applied on such processors to keep the pipeline sufficiently fed with instructions from the desired path in the wake of control flow changes. However, most branch prediction techniques provide only empirical and statistical performance improvements, such that control flow changes for the most part introduce a slowdown as well as non-determinism in the performance level that can be delivered.
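The repeated shift-and-compare loop described above can be sketched as follows. This is an illustrative model only: the function name `shift_compare_search` and the returned comparison count are assumptions introduced for the example, and the slice comparison stands in for the aligned register-to-register compare. Each loop iteration corresponds to one aligned comparison followed by one control flow change back to the top of the loop.

```python
from typing import List, Tuple


def shift_compare_search(data: bytes, pattern: bytes) -> Tuple[List[int], int]:
    """Find every offset of `pattern` in `data` by shifting one byte at a time.

    Returns (match_offsets, aligned_comparisons).
    """
    n, m = len(data), len(pattern)
    if m == 0 or m > n:
        return [], 0

    matches = []
    comparisons = 0
    for shift in range(n - m + 1):
        # One aligned comparison of the pattern against the shifted operand.
        comparisons += 1
        if data[shift:shift + m] == pattern:
            matches.append(shift)
        # Falling through here models the branch from the bottom of the
        # loop back to the top, taken once per shift position.
    return matches, comparisons


if __name__ == "__main__":
    print(shift_compare_search(b"abcabcab", b"cab"))
```

In this model the number of comparisons, and hence the number of loop-back branches, grows with the length of the data regardless of how few matches are present, which is why the branch cost discussed above bears directly on search throughput.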
A number of search and pattern matching algorithms have evolved to make best use of the microprocessor. The Boyer-Moore algorithm is widely regarded as one of the best-known techniques employed on a microprocessor to find occurrences of patterns in a given data set. The algorithm processes only one pattern at a time and must be repeatedly invoked if more than one pattern is to be searched in a data set. For each pattern to be searched, it advances sequentially through the data set making selective comparisons based on observations obtained from pre-characterizing the pattern. This algorithm provides superior performance relative to other pattern matching algorithms by reducing the total number of comparisons within a given data set. However, due to the sequential nature of the algorithm, the performance is limited by fundamental constraints of microprocessor architecture, namely the scalar instruction set and the penalty incurred on branching.
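For illustration, the following sketch uses the Horspool simplification of the Boyer-Moore algorithm, which applies only a bad-character shift table; it is not the full Boyer-Moore algorithm, nor the improved variant used in any particular system. The names `horspool_search` and `search_all` are hypothetical. The sketch shows the two properties noted above: the pattern is pre-characterized into a shift table that lets the search skip positions, and multiple patterns require one full pass over the data per pattern.

```python
from typing import Dict, List, Tuple


def horspool_search(data: bytes, pattern: bytes) -> Tuple[List[int], int]:
    """Search `data` for `pattern` using the Horspool bad-character rule.

    Returns (match_offsets, windows_examined).
    """
    n, m = len(data), len(pattern)
    if m == 0 or m > n:
        return [], 0

    # Pre-characterize the pattern: for each byte in all but the last
    # position, record its distance from the end of the pattern.
    # Later (rightmost) occurrences overwrite earlier ones.
    skip = {byte: m - 1 - i for i, byte in enumerate(pattern[:-1])}

    matches = []
    windows = 0
    pos = 0
    while pos <= n - m:
        windows += 1
        if data[pos:pos + m] == pattern:
            matches.append(pos)
        # Advance by the shift for the byte under the window's last
        # position; bytes absent from the table allow a full-length skip.
        pos += skip.get(data[pos + m - 1], m)
    return matches, windows


def search_all(data: bytes, patterns: List[bytes]) -> Dict[bytes, List[int]]:
    """Search for several patterns by invoking the search once per pattern."""
    return {p: horspool_search(data, p)[0] for p in patterns}


if __name__ == "__main__":
    print(horspool_search(b"x" * 16 + b"needle", b"needle"))
    print(search_all(b"abcabcab", [b"cab", b"bc"]))
```

In the first usage example the search examines far fewer windows than there are byte offsets, which reflects the reduction in comparisons described above. The second example makes the sequential cost visible: the data is traversed once for each pattern in the list.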
Owing to the aforementioned architectural limitations of the microprocessor, the efficiency and capability of conventional microprocessors are severely challenged by the emerging computing and communications environments described earlier. Several data points can be provided to support these arguments. For example, in a Network Intrusion Detection System (NIDS) such as Snort, it is already desirable to apply signature detection of hundreds of strings on incoming packets. Performing this workload with signatures of 8-byte patterns on a 3 GHz Pentium IV processor in a commercial microprocessor-based system that employs an improved version of the Boyer-Moore pattern matching algorithm limits the packet rate to less than 50 Mbps. Likewise, parsing of XML documents on such a platform is limited to the 10 MB/s range, and speech processing is limited to 1 real-time stream on restricted grammars and vocabularies. These data points indicate that the conventional microprocessor of 2003 or 2004 will be able to deliver rich content analytics and processing at rates around the 100 Mbps range. However, by that timeframe, data rates of between 1 Gbps and 10 Gbps will not be uncommon in enterprise networks and environments. Clearly, there is a severe mismatch of one to two orders of magnitude between the performance that can be delivered by the conventional microprocessor and that which is demanded by the environment. While it is possible to employ multiple parallel microprocessor systems to execute some of the desired functions at the target rate, this greatly increases the cost of the system. There is clearly a need for a more efficient solution for these target functions.
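The mismatch stated above follows from simple arithmetic on the quoted rates, which can be sketched as below. The function names are hypothetical, and the processor count assumes ideal linear scaling with no coordination overhead, which is an optimistic assumption made only to bound the cost of the parallel approach from below.

```python
import math


def orders_of_magnitude_gap(delivered_mbps: float, required_mbps: float) -> float:
    """Base-10 logarithm of the ratio between required and delivered rates."""
    return math.log10(required_mbps / delivered_mbps)


def processors_needed(delivered_mbps: float, required_mbps: float) -> int:
    """Minimum processor count, assuming ideal linear scaling (an assumption)."""
    return math.ceil(required_mbps / delivered_mbps)


if __name__ == "__main__":
    delivered = 100.0  # approximate deliverable rate in Mbps, from the text
    for required in (1_000.0, 10_000.0):  # 1 Gbps and 10 Gbps
        print(required,
              orders_of_magnitude_gap(delivered, required),
              processors_needed(delivered, required))
```

Under these assumptions, closing the gap by replication alone would take on the order of ten to one hundred processors, which illustrates why the parallel-microprocessor approach greatly increases system cost.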
There is a need for a new solution for a programmable processing apparatus that is more suitable for content analytics and processing, and that is efficient on a set of functions that include contextual search, lexical analysis, parsing, interpretation, and transformation of content on messages, packets, or documents.