This invention relates generally to memory technology and in particular to a new high performance intelligent content search memory.
Many modern applications depend on fast information search and retrieval. With the advent of the world-wide-web and the phenomenal growth in its usage, content search has become a critical capability. A large number of servers get deployed in web search applications due to the performance limitations of the state of the art microprocessors for regular expression driven search.
There have been significant research and development resources devoted to the topic of searching of lexical information or patterns in strings. Regular expressions have been used extensively since the mid 1950s to describe the patterns in strings for content search, lexical analysis, information retrieval systems and the like. Regular expressions were first studied by S. C. Kleene in mid-1950s to describe the events of nervous activity. It is well understood in the industry that regular expression (RE) can also be represented using finite state automata (FSA). Non-deterministic FSA (NFA) and deterministic FSA (DFA) are two types of FSAs that have been used extensively over the history of computing. Rabin and Scott were the first to show the equivalence of DFA and NFA as far as their ability to recognize languages in 1959. In general a significant body of research exists on regular expressions. Theory of regular expressions can be found in “Introduction to Automata Theory, Languages and Computation” by Hoperoft and Ullman and a significant discussion of the topics can also be found in book “Compilers: Principles, Techniques and Tools” by Aho, Sethi and Ullman.
Regular expressions (RE) can be used to represent the content search strings for a variety of applications. A set of regular expressions can then form a rule set for searching for a specific application and can be applied to any document, file, message, packet or stream of data for examination of the same. Regular expressions are used in describing anti-spam rules, anti-virus rules, intrusion detection and intrusion prevention rules, anti-spyware rules, anti-phishing rules, extrusion detection rules, digital rights management rules, legal compliance rules, worm detection rules, instant message inspection rules, VOIP security rules, XML document security and search constructs, genetics, proteomics, XML based protocols like XMPP, web search, database search, bioinformatics, signature recognition, speech recognition, web indexing and the like. These expressions get converted into NFAs or DFAs for evaluation on a general purpose processor. However, significant performance and storage limitations arise for each type of the representation. For example an N character regular expression can take up to the order of 2n memory for the states of a DFA, while the same for an NFA is in the order of N. On the other hand the performance for the DFA evaluation for an M byte input data stream is in the order of M memory accesses and the order of (N*M) processor cycles for the NFA representation on modern microprocessors.