This specification relates to pattern recognition using finite automata.
Pattern recognition, or pattern matching, is used in a variety of applications, including network security. For example, network security problems can be identified by analyzing network traffic for patterns matching malicious traffic. The particular protocols used between ports of devices on a network can be determined by analyzing network traffic for patterns matching known port protocols. Pattern matching can also be used in file security analysis by matching the data of a file to patterns indicating malicious file contents.
Pattern matching can be done using finite automata (e.g., finite state machines). A finite automaton includes a number of states, transitions between the states, and particular actions corresponding to the states (e.g., determine that the input matches a pattern, determine that the input does not match a pattern, etc.). For example, the Aho-Corasick finite automaton algorithm is used to match patterns in input text strings.
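As an illustration only, the following minimal Python sketch shows one common way an Aho-Corasick style automaton can be built and run over an input string. The function names `build_automaton` and `find_matches` and the sample patterns are hypothetical and are not taken from this specification. For brevity, the sketch stores only the transitions that appear in the patterns and follows failure links for all other inputs, instead of storing a fully expanded transition table.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of string patterns."""
    goto = [{}]      # goto[s][ch] -> next state; state 0 is the start state
    fail = [0]       # fail[s] -> state to fall back to when no transition exists
    out = [set()]    # out[s] -> patterns recognized on entering state s

    # Phase 1: insert every pattern into a trie of states.
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(pat)

    # Phase 2: compute failure links breadth-first, shallowest states first.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            # A state also reports every pattern its failure state reports.
            out[t] |= out[fail[t]]
    return goto, fail, out


def find_matches(text, automaton):
    """Return (start_index, pattern) for every pattern occurrence in text."""
    goto, fail, out = automaton
    s = 0
    matches = []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for pat in out[s]:
            matches.append((i - len(pat) + 1, pat))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
print(sorted(find_matches("ushers", automaton)))
```

In this sketch, the action associated with a state is the set of patterns reported when that state is entered. A state with an empty set corresponds to "no match yet".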
Implementing a finite automaton requires storing a transition for every possible next input element that could be received in every possible current state of the finite automaton. A large amount of storage space is therefore needed to store the transitions: the storage requirements are on the order of the number of states multiplied by the number of possible input elements. This can make storing and using finite automata inefficient.