This invention relates in general to digital processing and more specifically to data lookup and indexing in a digital processing system.
A Deterministic Finite Automaton (DFA), or Finite State Machine (FSM), is a useful approach to solving many data processing tasks. For example, searching for a sequence, or “string,” of characters for purposes such as word searching in a document or password matching is often implemented with a DFA. Common string-searching algorithms associated with this approach include Knuth-Morris-Pratt (KMP) and Boyer-Moore (BM). Essentially, these algorithms scan the input one character at a time and enter a “state” that depends on the character presently input and on the past input characters. The scan results in a match if the string is detected before the input characters are exhausted; otherwise, a non-match is indicated.
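By way of illustration only, the character-at-a-time scan described above can be sketched as follows. This is a minimal sketch of a KMP-style automaton, not an implementation taken from any particular system; the function names build_dfa and search, and the choice to represent each state as a dictionary keyed by character, are assumptions made for this example.

```python
def build_dfa(pattern):
    """Build a transition table for a non-empty pattern.

    dfa[state][ch] gives the next state. State j means "the last j input
    characters match the first j pattern characters". Hypothetical helper.
    """
    alphabet = set(pattern)  # any other character resets the scan to state 0
    m = len(pattern)
    dfa = [dict.fromkeys(alphabet, 0) for _ in range(m)]
    dfa[0][pattern[0]] = 1
    restart = 0  # state the automaton would be in after a mismatch
    for j in range(1, m):
        for ch in alphabet:
            dfa[j][ch] = dfa[restart][ch]      # mismatch: act like restart state
        dfa[j][pattern[j]] = j + 1             # match: advance one state
        restart = dfa[restart][pattern[j]]
    return dfa


def search(pattern, text):
    """Return the index of the first match of pattern in text, or -1."""
    if not pattern:
        return 0
    dfa = build_dfa(pattern)
    state = 0
    for i, ch in enumerate(text):
        # One table lookup per input character; unknown characters go to 0.
        state = dfa[state].get(ch, 0)
        if state == len(pattern):              # accepting state reached
            return i - len(pattern) + 1
    return -1                                  # input exhausted: non-match
```

In this sketch, search("ABABAC", "BCBAABACAABABACAA") returns 9, and a pattern that never appears returns -1. Each input character costs a single table lookup, which is the property that makes the DFA approach attractive when speed is critical.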
Computer processes or tasks that use the DFA approach often must operate under very demanding conditions. Speed is usually critical, especially where large amounts of text or other data must be scanned, so the DFA must operate quickly. In other applications, limitations on memory, power, bandwidth or other processing resources can require the DFA to operate with as little storage as possible. This usually means that the data structures used by the DFA must be made as compact as possible.
Usually the two goals of compactness and speed work against each other. For example, one approach to reducing the size of DFA data structures is to use multiple levels of indirection or lookups, such as looking up an entry in a table or indexing into an array. However, each lookup requires processing cycles and results in a slower DFA. Data structures can often be compressed, but the DFA may then need to perform decompression, decoding or other computation on-the-fly in order to use the data.
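The tradeoff described above can be illustrated with a simple sketch. It assumes a dense transition table stored as a list of rows of integer next states, and a compacted form that stores, for each state, one default next state plus only the entries that differ from it. The names compress, next_dense and next_compact are hypothetical, and this default-plus-exceptions layout is a simplified illustration rather than the specific scheme of any cited reference.

```python
def compress(dense):
    """Split each row into a default next state and a map of exceptions."""
    defaults, exceptions = [], []
    for row in dense:
        default = max(set(row), key=row.count)   # most common next state
        defaults.append(default)
        exceptions.append(
            {sym: nxt for sym, nxt in enumerate(row) if nxt != default}
        )
    return defaults, exceptions


def next_dense(dense, state, sym):
    """Fast path: a single indexed lookup, but one cell per (state, symbol)."""
    return dense[state][sym]


def next_compact(defaults, exceptions, state, sym):
    """Compact path: probe the exceptions first, then fall back to the default."""
    return exceptions[state].get(sym, defaults[state])


def stored_entries(defaults, exceptions):
    """Count the entries kept by the compact form (defaults plus exceptions)."""
    return len(defaults) + sum(len(e) for e in exceptions)
```

For a table whose rows are mostly a single repeated value, the compact form keeps far fewer entries than the dense form, but every transition now involves an extra probe and a fallback. That additional work per input character is the speed cost that accompanies the reduction in size.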
One attempt at optimizing DFA data structures and processing is described, for example, in Aho, A. V., Sethi, R., Ullman, J. D., Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1986 (pp. 144-146). Many different approaches exist for such optimizations. The nature of computing is such that even a marginal reduction in the size of a data structure or datum, or a small increase in the speed of one or a few operations, can provide a very significant overall improvement in the operation of a process using a DFA approach.