The present invention generally relates to variable length character string detection apparatuses, and more particularly to a variable length character string detection apparatus which reads a document file stored in a secondary storage and collates character strings in the document file (text) with a registered character string so as to detect the registered character string and erroneous character strings in the document file.
Conventional character string detection methods include (1) the sort/search method, (2) the associative memory method, (3) the cellular array method, (4) the finite state automaton method, (5) the dynamic programming method and the like. However, these methods suffer from the following disadvantages. That is, the methods (2) and (3) cannot process long character strings, the methods (1), (2), (3) and (5) cannot perform non-anchor matching of variable length character strings, the method (5) has a slow processing speed, and the methods (3), (4) and (5) cannot be realized with a small hardware size.
On the other hand, there is a character string search large scale integrated circuit (LSI) proposed by Nippon Electric Co., Ltd. of Japan which combines the methods (2) and (4). This character string search LSI (intelligent string search processor or ISSP) is discussed in Takahashi et al., "Architecture of String Matching Hardware", Denshi Tsushin Gakkai Kenkyu Hokoku (computer system), CPSY 86-57, July 1986. Even in this case, however, the scale of the apparatus is proportional to the tolerable number of erroneous characters. For this reason, the character string search LSI as realized can tolerate only on the order of one erroneous character, due to the restriction of the hardware structure.
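For illustration only, the notion of a tolerable number of erroneous characters may be made concrete with a naive software sketch. The sketch below is not any of the conventional methods (1) through (5), nor the ISSP; it assumes that an erroneous character means a substituted character (insertions and deletions are not handled), and the function name `find_with_errors` and its parameters are hypothetical. It slides a window over the text so that the match may begin at any position (non-anchor matching) and reports each window that differs from the registered character string in at most `max_errors` positions.

```python
def find_with_errors(text, registered, max_errors):
    """Return a list of (position, error_count) pairs, one for every window
    of `text` that differs from `registered` in at most `max_errors`
    character positions. Substitution errors only (an assumption)."""
    hits = []
    n = len(registered)
    # Try every starting position: the match is not anchored to the start.
    for start in range(len(text) - n + 1):
        errors = 0
        for a, b in zip(text[start:start + n], registered):
            if a != b:
                errors += 1
                if errors > max_errors:
                    break  # too many erroneous characters in this window
        if errors <= max_errors:
            hits.append((start, errors))
    return hits


# Hypothetical usage: one erroneous character is tolerated.
print(find_with_errors("xxabcdxxabzdxx", "abcd", 1))  # [(2, 0), (8, 1)]
```

In this sketch the permitted number of erroneous characters is merely a parameter of the comparison loop, whereas in the hardware discussed above the scale of the apparatus grows in proportion to that number.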