A beneficial application of a memory circuit for a character recognition automaton relates to forming a database for an Aho-Corasick type deterministic finite-state character recognition automaton for implementing multi-pattern recognition (MPR). For example, but not exclusively, character recognition may be used in the field of computing to recognize computer virus signatures, or in intrusion detection systems in which known attack signatures are detected.
Referring to FIG. 1, a character recognition automaton MPR is based on the use of a database B in which a list of words or, more generally, a list of byte sequences to be recognized in an incoming file F is stored. The patterns are stored in the database B in the form of a tree of nodes in which each node corresponds both to a sequence of bytes of a pattern to be recognized and to a state of the automaton.
The structure and implementation of an Aho-Corasick type automaton are well known to the person skilled in the art, and are therefore not described in detail below. In this regard, reference may be made to the article A. Aho and M. Corasick, “Efficient String Matching: An Aid to Bibliographic Search,” Communications of the ACM, 18 (6): 333-340, 1975.
Construction of an Aho-Corasick automaton first requires devising the database B by providing, for each pattern to be recognized, the states and the direct transitions which lead to the recognition of the pattern. A transition function is then calculated which, for a given state, points to the next node corresponding to the recognition of a character. The search for a pattern in a file F to be analyzed, such as a video file or a text file, is thereafter performed by traversing the constructed graph. When a final state is attained, the corresponding pattern is declared retrieved.
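The construction and search steps described above can be illustrated by the following minimal software sketch. It follows the usual textbook formulation of the Aho-Corasick algorithm (direct transitions, failure links, and per-state outputs) and is not the memory circuit discussed here. The names `build_automaton`, `search`, `goto`, `fail`, and `output` are hypothetical and chosen only for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build transition, failure, and output tables for byte patterns."""
    goto = [{}]    # goto[state][byte] -> next state (direct transitions)
    fail = [0]     # fail[state] -> state to fall back to on a mismatch
    output = [[]]  # output[state] -> patterns recognized in this state

    # Phase 1: create the states and direct transitions for each pattern.
    for pattern in patterns:
        state = 0
        for byte in pattern:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        output[state].append(pattern)  # final state for this pattern

    # Phase 2: compute failure links breadth-first from the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            fallback = fail[state]
            while fallback and byte not in goto[fallback]:
                fallback = fail[fallback]
            fail[nxt] = goto[fallback].get(byte, 0)
            # A state also reports the patterns ending at its fallback state.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def search(data, automaton):
    """Return (start_offset, pattern) for every occurrence found in data."""
    goto, fail, output = automaton
    state = 0
    matches = []
    for position, byte in enumerate(data):
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        for pattern in output[state]:
            matches.append((position - len(pattern) + 1, pattern))
    return matches


automaton = build_automaton([b"he", b"she", b"his", b"hers"])
print(search(b"ushers", automaton))
# [(1, b'she'), (2, b'he'), (2, b'hers')]
```

In this sketch each state stores its transitions in a dictionary, which keeps the example short; a hardware or table-driven implementation would typically lay the transitions out differently.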
Devising the database is a very complex step to implement and requires substantial hardware resources, particularly in terms of memory. Thus, in an application for searching for computer virus signatures, the size of the memory required to implement an Aho-Corasick type automaton may reach, or even exceed, 100 Mb.
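As a rough illustration of why the memory requirement grows so quickly, the following sketch estimates the size of a dense transition table. It assumes, purely for illustration, that every state stores one next-state pointer per possible byte value; the state count, the 256-entry alphabet, and the 4-byte pointer width are hypothetical parameters and are not taken from any particular implementation.

```python
def dense_table_bytes(num_states, alphabet_size=256, pointer_bytes=4):
    """Size in bytes of a table holding one pointer per (state, byte) pair."""
    return num_states * alphabet_size * pointer_bytes


# Hypothetical example: 100,000 states over a byte alphabet.
print(dense_table_bytes(100_000))  # 102400000
```

Under these assumptions the table size is proportional to the number of states, so a signature database with many long patterns quickly demands a large memory.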
Furthermore, Aho-Corasick type shape and character recognition automata are generally implemented in software, so that the transfer of data to and from the database is performed over buses, thereby requiring a considerable transfer time.