This invention relates generally to content search, storage and networking semiconductors and in particular to high performance content search, network storage and security processors that can be used within networking, storage, security, bioinformatics, chipsets, servers, search engines and the like.
Many modern applications depend on fast information search and retrieval. With the advent of the world-wide-web and the phenomenal growth in its usage, content search has become a critical capability. A large number of servers get deployed in web search applications due to the performance limitations of the state of the art microprocessors for regular expression driven search.
There have been significant research and development resources devoted to the topic of searching of lexical information or patterns in strings. Regular expressions have been used extensively since the mid-1950s to describe the patterns in strings for content search, lexical analysis, information retrieval systems and the like. Regular expressions were first studied by S. C. Kleene in the mid-1950s to describe the events of nervous activity. It is well understood in the industry that a regular expression (RE) can also be represented using finite state automata (FSA). The non-deterministic finite state automaton (NFA) and the deterministic finite state automaton (DFA) are two types of FSA that have been used extensively over the history of computing. In 1959, Rabin and Scott were the first to show the equivalence of DFAs and NFAs in their ability to recognize languages. In general, a significant body of research exists on regular expressions. The theory of regular expressions can be found in “Introduction to Automata Theory, Languages and Computation” by Hopcroft and Ullman, and a significant discussion of the topic can also be found in the book “Compilers: Principles, Techniques and Tools” by Aho, Sethi and Ullman.
Internet protocol (IP) is the most prevalent networking protocol deployed across various networks like local area networks (LANs), metro area networks (MANs) and wide area networks (WANs). Storage area networks (SANs) are predominantly based on Fibre Channel (FC) technology. There is a need to create IP based storage networks.
When transporting block storage traffic on IP designed to transport data streams, the data streams are transported using Transmission Control Protocol (TCP), which is layered to run on top of IP. TCP/IP is a reliable connection/session oriented protocol implemented in software within the operating systems. The TCP/IP software stack is too slow to handle the high line rates that will be deployed in the future. Currently, a 1 GHz processor based server running a TCP/IP stack, with a 1 Gbps network connection, would use 50-70% or more of the processor cycles, leaving minimal cycles available for the processor to allocate to the applications that run on the server. This overhead is not tolerable when transporting storage data over TCP/IP or for high performance IP networks. Hence, new hardware solutions would accelerate the TCP/IP stack to carry storage and network data traffic and be competitive with FC based solutions. In addition to TCP, other protocols such as SCTP and UDP can be used, as well as other protocols appropriate for transporting data streams.
Computers are increasingly networked within enterprises and around the world. These networked computers are changing the paradigm of information management and security. Vast amounts of information, including highly confidential, personal and sensitive information, are now being generated, accessed and stored over the network, and this information needs to be protected from unauthorized access. Further, there is a continuous onslaught of spam, viruses, and other inappropriate content on the users through email, web access, instant messaging, web download and other means, resulting in significant loss of productivity and resources.
Enterprise and service provider networks are rapidly evolving from 10/100 Mbps line rates to 1 Gbps, 10 Gbps and higher line rates. The traditional model of perimeter security to protect information systems poses many issues due to the blurring boundary of an organization's perimeter. Today, as employees, contractors, remote users, partners and customers require access to enterprise networks from outside, a perimeter security model is inadequate. This usage model poses serious security vulnerabilities to critical information and computing resources for these organizations. Thus the traditional model of perimeter security has to be bolstered with security at the core of the network. Further, the convergence of new sources of threats and high line rate networks is making software based perimeter security inadequate to stop external and internal attacks. There is a clear need for enabling security processing in hardware inside core or end systems, besides a perimeter firewall, as one of the prominent means of security to thwart ever increasing security breaches and attacks.
The FBI and other leading research institutions have reported in recent years that over 70% of intrusions in organizations have been internal. Hence a perimeter defense relying on protecting an organization from external attacks is not sufficient, as discussed above. Organizations are also required to screen outbound traffic to prevent accidental or malicious disclosure of proprietary and confidential information as well as to prevent their network resources from being used to proliferate spam, viruses, worms and other malware. There is a clear need to inspect the data payloads of the network traffic to protect and secure an organization's network for inbound and outbound security.
Data transported using TCP/IP or other protocols is processed at the source, the destination or intermediate systems in the network or a combination thereof to provide data security or other services like secure sockets layer (SSL) for socket layer security, transport layer security (TLS), encryption/decryption, RDMA, RDMA security, application layer security, virtualization or higher application layer processing, which may further involve application level protocol processing (for example, protocol processing for HTTP, HTTPS, XML, SGML, Secure XML, other XML derivatives, Telnet, FTP, IP Storage, NFS, CIFS, DAFS, and the like). Many of these processing tasks put a significant burden on the host processor that can have a direct impact on the performance of applications and the hardware system. Hence, some of these tasks need to be accelerated using dedicated hardware, for example SSL or TLS acceleration. As the usage of XML increases for web applications, XML processing is expected to put a significant performance burden on the host processor and would also benefit significantly from hardware acceleration. Detection of spam, viruses and other inappropriate content requires deep packet inspection and analysis. Such tasks can put a huge processing burden on the host processor and can substantially lower the network line rate. Hence, deep packet content search and analysis hardware is also required.
The Internet has become an essential tool for doing business at small to large organizations. The HTML based static web has been transformed into a dynamic environment over the last several years with the deployment of XML based services. XML is becoming the lingua franca of the web and its usage is expected to increase substantially. XML is a descriptive language that offers many advantages by making documents self-describing for automated processing, but it is also known to cause huge performance overhead for best of class server processors. Decisions can be made by processing the intelligence embedded in XML documents to enable business to business transactions as well as other information exchange. However, due to the performance overload on the best of class server processors from analyzing XML documents, they cannot be used in systems that require network line rate XML processing to provide intelligent networking. There is a clear need for acceleration solutions for XML document parsing and content inspection at network line rates, which are approaching 1 Gbps and 10 Gbps, to realize the benefits of a dynamic web based on XML services.
Regular expressions can be used to represent the content search strings for a variety of applications like those discussed above. A set of regular expressions can then form a rule set for searching for a specific application and can be applied to any document or stream of data for examination of the same. Regular expressions are used in describing anti-spam rules, anti-virus rules, XML document search constructs and the like. These expressions get converted into NFAs or DFAs for evaluation on a general purpose processor. However, significant performance and storage limitations arise for each type of representation. For example, an N character regular expression can take up to the order of 2^N memory for the states of a DFA, while the same for an NFA is in the order of N. On the other hand, the performance of DFA evaluation for an M byte input data stream is in the order of M memory accesses, while the NFA representation takes in the order of (N*M) processor cycles on modern microprocessors.
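The storage gap between the two representations can be illustrated with a small sketch. It assumes the textbook pattern (a|b)*a(a|b)^(n-1), which is not taken from this document, builds its (n+1)-state NFA by hand, and counts the DFA states produced by the standard subset construction. The function names are hypothetical and chosen only for this illustration.

```python
from collections import deque


def nth_from_end_nfa(n):
    """Hand-built NFA for (a|b)*a(a|b)^(n-1); states 0..n, state n accepts."""
    delta = {
        (0, "a"): {0, 1},  # stay in the loop, or guess this 'a' is n-th from the end
        (0, "b"): {0},
    }
    for s in range(1, n):
        delta[(s, "a")] = {s + 1}
        delta[(s, "b")] = {s + 1}
    return delta, {0}, {n}


def subset_construction(delta, start, alphabet="ab"):
    """Return the set of reachable DFA states (each a frozenset of NFA states)."""
    start = frozenset(start)
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for ch in alphabet:
            nxt = frozenset(t for s in current for t in delta.get((s, ch), ()))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def dfa_state_count(n):
    delta, start, _accept = nth_from_end_nfa(n)
    return len(subset_construction(delta, start))


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(f"n={n}: NFA states={n + 1}, DFA states={dfa_state_count(n)}")
```

For this particular pattern the DFA state count doubles with each added character while the NFA grows by one state, which is the worst-case behavior the paragraph above refers to. Many practical rules do not exhibit this growth.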
When the number of regular expressions increases, the performance deteriorates as well. For example, in an application like anti-spam, there may be hundreds of regular expression rules. These regular expressions can be evaluated on the server processors using individual NFAs or DFAs. It may also be possible to create a composite DFA to represent the rules. Assuming that there are X REs for an application, a DFA based representation of each individual RE would result in up to the order of (X*2^N) states, while the evaluation time would grow up to the order of (X*N) memory cycles. Generally, due to the potential expansion in the number of states for a DFA, the states would need to be stored in off chip memories. Using a typical access time latency of main memory systems of 100 ns, it would require about (X*100 ns*N*M) time to process an X RE DFA with N states over an M byte data stream. This can result in tens of Mbps performance for modest sizes of X, N and M. Such performance is significantly below the needs of today's network line rates of 1 Gbps to 10 Gbps. On the other hand, if a composite DFA is created, it can result in an upper bound of storage in the order of 2^(N*X), which may not be within physical limits of memory size for typical commercial computing systems even for a few hundred REs. Thus the upper bound in memory expansion for DFAs can be a significant issue. NFAs, on the other hand, are non-deterministic in nature and can result in multiple state transitions that can happen simultaneously. NFAs can only be processed on a state of the art microprocessor in a scalar fashion, resulting in multiple executions of the NFA for each of the enabled paths. X REs with N characters on average can be represented in the upper bound of (X*N) states as NFAs. However, each NFA would require M iterations for an M-byte stream, causing an upper bound of (X*N*M*processor cycles per loop).
Assuming the number of processor cycles per loop is in the order of 10, then for a best of class processor at 4 GHz, the processing time can be around (X*N*M*2.5 ns), which for a nominal N of 8 and X in the tens can result in below 100 Mbps performance. There is a clear need to create high performance regular expression based content search processors which can provide performance in line with the network rates, which are going to 1 Gbps and 10 Gbps.
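The estimates above can be worked through numerically. The sketch below simply evaluates the stated formulas per input byte and converts the result to a line rate. The function names are hypothetical, and X = 10 is an assumed value standing in for "X in the tens".

```python
def nfa_seconds_per_byte(num_res, chars_per_re, cycles_per_loop, clock_hz):
    """Scalar NFA cost per input byte: X * N * (cycles per loop) / clock rate."""
    return num_res * chars_per_re * cycles_per_loop / clock_hz


def dfa_seconds_per_byte(num_res, states_per_re, mem_latency_s):
    """Off-chip DFA cost per input byte, using the X * latency * N estimate."""
    return num_res * mem_latency_s * states_per_re


def throughput_mbps(seconds_per_byte):
    """Convert a per-byte processing time into megabits per second."""
    return 8 / seconds_per_byte / 1e6


# Assumed values: X = 10 rules, N = 8, 10 cycles per loop at 4 GHz, 100 ns memory.
nfa_rate = throughput_mbps(nfa_seconds_per_byte(10, 8, 10, 4e9))
dfa_rate = throughput_mbps(dfa_seconds_per_byte(10, 8, 100e-9))

if __name__ == "__main__":
    print(f"NFA estimate: {nfa_rate:.1f} Mbps")
    print(f"DFA estimate: {dfa_rate:.1f} Mbps")
```

With these assumed values the NFA estimate comes to about 40 Mbps, consistent with the below-100 Mbps figure, and the off-chip DFA estimate is lower still. Both are far short of 1 Gbps.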
The methods for converting a regular expression to an NFA and a DFA are well known. The resulting automata are able to distinguish whether a string belongs to the language defined by the regular expression. However, they are not very efficient at determining whether a specific sub-expression of a regular expression is in a matching string, or the extent of that sub-expression within the string. Tagged NFAs enable such queries to be conducted efficiently without having to scan the matching string again. For a discussion on Tagged NFAs please refer to the paper “NFAs with Tagged Transitions, their Conversion to Deterministic Automata and Application to Regular Expressions”, by Ville Laurikari, Helsinki University of Technology, Finland.
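The idea of recording sub-expression extents during a single scan can be sketched as follows. This is a simplified illustration only, not the algorithm of the cited paper: it hand-builds an NFA for the assumed pattern [ab]*(ab)c, attaches hypothetical "start" and "end" tags to two transitions, and resolves competing paths with a naive first-path-wins rule instead of proper tag priorities.

```python
def run_tagged_nfa(transitions, start, accept, text):
    """Simulate an NFA in one pass, carrying tag positions with each active state.

    transitions maps (state, char) to a list of (target, tag) pairs; when tag is
    not None, the current input position is recorded under that tag name.
    Returns the tag dictionary of an accepting path, or None if no match.
    """
    threads = {start: {}}
    for pos, ch in enumerate(text):
        next_threads = {}
        for state, tags in threads.items():
            for target, tag in transitions.get((state, ch), ()):
                new_tags = dict(tags)
                if tag is not None:
                    new_tags[tag] = pos
                # Simplification: keep the first path that reaches a state.
                next_threads.setdefault(target, new_tags)
        threads = next_threads
        if not threads:
            return None
    return threads.get(accept)


# Hand-built tagged NFA for [ab]*(ab)c; state 3 accepts.
TRANSITIONS = {
    (0, "a"): [(0, None), (1, "start")],  # loop, or guess the group begins here
    (0, "b"): [(0, None)],
    (1, "b"): [(2, None)],
    (2, "c"): [(3, "end")],               # group ended just before this 'c'
}

if __name__ == "__main__":
    text = "abbabc"
    tags = run_tagged_nfa(TRANSITIONS, 0, 3, text)
    print(tags, text[tags["start"]:tags["end"]])
```

For the input "abbabc" the run reports start 3 and end 5, which delimit the sub-expression "ab", and the input is read only once.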
U.S. Patent Applications 20040059443 and 20050012521 describe a method and apparatus for efficient implementation and evaluation of state machines and programmable finite state automata. These applications show an apparatus that is used to evaluate regular expressions using an array of NFAs to create high performance processing of regular expressions. The applications recognize the upper bound in the storage issues for DFAs as a reason to implement regular expressions using NFAs. However, the applications fail to recognize that even though the DFA worst case storage requirement is substantially higher compared to NFAs, many DFAs have lower storage needs than NFAs. DFAs for many regular expressions can result in a lower number of states compared to an NFA. For example, in an anti-spam application based on the open source tool SpamAssassin, a large number of the regular expression rules result in DFAs which are smaller than NFAs. Hence, it is important not to ignore DFA implementation based only on the worst case scenario. These patent applications also create NFA engines that process a single RE per NFA block. Thus if an RE uses fewer states than the minimum states of the NFA block, there is no provision to be able to use multiple REs simultaneously in the same block. In my invention, I describe a content search processor which uses an array of runtime adaptable search engines, where the search engines may be runtime adaptable DFA search engines or runtime adaptable NFA search engines or a combination thereof to evaluate regular expressions. The content search engine of my search processor also provides the flexibility of using multiple REs per NFA or DFA engine. My invention also provides capabilities to support Tagged NFA implementations, which are not supported or discussed in these applications.
Further, these applications do not address the need of dynamically configuring the hardware or the rules being applied based on the transported data being sent to or received from a network. The processors of my invention can be dynamically adapted to apply hardware based rule sets dependent on the transported data which is not described in the above applications. Further, my invention shows that certain DFAs can be more hardware resource efficient to implement compared to NFAs and can enable today's state of the art FPGAs to implement a large number of regular expressions without having to devote large investments in creating application specific integrated circuits using advanced process technologies. This is also specifically discussed as not feasible to do in the above applications. My invention also shows content search acceleration can be used to improve application acceleration through content search application programmer interface (API) and the search processor of this invention.
Hardware acceleration for each type of network data payload can be expensive when a specialized accelerator is deployed for each individual type of network data. There is a clear need for a processor architecture that can adapt itself to the needs of the network data providing the necessary acceleration and thereby reduce the impact on the host performance. This patent describes such a novel architecture which adapts itself to needs of the network data. The processor of this patent can be reused and adapted for differing needs of the different types of the payload and still offer the benefits of hardware acceleration. This can have a significant reduction in the cost of the acceleration solutions deployment compared to dedicated application-specific accelerators.
Dynamically reconfigurable computing has been an area that has received significant research and development interest to address the need of reconfiguring hardware resources to suit application needs. The primary focus of the research has been towards creating general purpose microprocessor alternatives that can be adapted with new instruction execution resources to suit application needs.
Field programmable gate arrays (FPGA) have evolved from simple AND-OR logic blocks to more complex elements that provide a large number of programmable logic blocks and programmable routing resources to connect these together or to Input/Output blocks. U.S. Pat. No. 5,600,845 describes an integrated circuit computing device comprising a dynamically configurable FPGA. The gate array is configured to create a RISC processor with a configurable instruction execution unit. This dynamic re-configurability allows the dynamically reconfigurable instruction execution unit to be changed to implement operations in hardware which may be time consuming to run in software. Such an arrangement requires a preconfigured instruction set to execute the incoming instruction and if an instruction is not present it has to be treated as an exception which then has a significant processing overhead. The invention in U.S. Pat. No. 5,600,845 addresses the limitation of general purpose microprocessors but does not address the need of dynamically configuring the hardware based on the transported data being sent to or received from a network.
U.S. Patent Application number 20030097546 describes a reconfigurable processor which receives an instruction stream that is inspected by an instruction test module to decide if the instruction is supported by existing non-reconfigurable hardware or by the reconfigurable hardware configured by a software routine, and executes the instruction stream based on the test result. If the instruction is not supported, then the processor decides a course of action to be taken, including executing the instruction stream in software. Patent application number 20030097546 also does not address the need of dynamically configuring the hardware based on the transported data being sent to or received from a network.
U.S. Patent Application number 20040019765 describes a pipelined reconfigurable dynamic instruction set processor. In that application, dynamically reconfigurable pipeline stages under control of a microcontroller are described. This is yet another dynamically reconfigurable processor that can adapt its pipeline stages and their interconnections based on the instructions being processed as an alternative to general purpose microprocessors.
The field of reconfigurable computing has been rife with research towards creating dynamically reconfigurable logic devices, either as FPGAs or as reconfigurable processors as described above, primarily addressing the limitations of general purpose processors by adding reconfigurable execution units or reconfigurable coprocessors. For example, “Reconfigurable FPGA processor”, a diploma thesis paper by Andreas Romer from the Swiss Federal Institute of Technology, targets the need of creating ASIC-like performance and area, but with general purpose processor level flexibility, by dynamically creating execution functional units in a reconfigurable part of a reconfigurable FPGA like Xilinx Virtex and XC6200 devices. Similarly, the paper by J. R. Hauser and J. Wawrzynek entitled “Garp: A MIPS Processor With a Reconfigurable Coprocessor”, published in Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines (FCCM '97), targets the need for creating custom co-processing support for a MIPS processor, addressing the limitations of the general purpose processing capabilities of the MIPS processor.
Published research or patent applications have not addressed the need of dynamically configuring the hardware based on transported data as well as actions to be taken and applications/services to be deployed for that specific data being sent to or received from a network. This patent describes a novel architecture which adapts itself to the needs of the network data and is run-time adaptable to perform time consuming security policy operations or application/services or other data processing needs of the transported data and defined policies of the system incorporating this invention. The architecture also comprises a deep packet inspection engine that may be used for detecting spam, viruses, digital rights management information, instant message inspection, URL matching, application detection, malicious content, and other content and applying specific rules which may enable anti-spam, anti-virus and the like capabilities.