The present invention relates to a symbol string collating apparatus and, more particularly, to a symbol string collating apparatus for finding a specific symbol string or data from a long symbol string or data in a text retrieval system or line control system for a communication line.
A symbol string collating apparatus is used to extract a feature series in a pattern recognition system, to extract a key word from a text file input by using a word processor or the like, to support language translation, to perform protocol control or data sorting control in a communication line, and to create a non-structured database using graphic patterns, images, texts, and the like. The symbol string collating apparatus is therefore essential for making such information processing systems and communication systems intelligent.
A conventional symbol string collating apparatus operates by sequential software processing on a general-purpose computer and therefore requires a long processing time. For this reason, a symbol string which can be collated is limited to a short symbol string or a structured symbol string which is delimited in units of words.
An operation of finding the location of a pattern consisting of m symbols in a text consisting of n symbols will be described below as an example.
In this case, collating of the m symbols must be performed (n-m+1) times, once for each candidate position in the text. For example, in order to find a pattern of m=10^3 characters in a text of n=10^9 characters, symbol collating processing must be performed about 10^12 times. Retrieval directly from large-capacity source information such as texts, images, graphic patterns, or sounds is therefore impractical. Instead, a key word is added to the source information beforehand and used for retrieval, or data structured into a table format is retrieved.
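The count given above can be checked with a short sketch. The following Python code is only an illustration of sequential software collating, not the processing of any particular conventional apparatus; the function names naive_collate and worst_case_comparisons are hypothetical and do not appear in the cited references.

```python
def naive_collate(text, pattern):
    """Brute-force collating: try every candidate position in the text.

    Returns (start positions of matches, number of symbol comparisons made).
    """
    n, m = len(text), len(pattern)
    positions = []
    comparisons = 0
    for start in range(n - m + 1):          # (n-m+1) candidate positions
        matched = True
        for offset in range(m):             # up to m symbol comparisons each
            comparisons += 1
            if text[start + offset] != pattern[offset]:
                matched = False
                break
        if matched:
            positions.append(start)
    return positions, comparisons


def worst_case_comparisons(n, m):
    """Upper bound on symbol comparisons: every position compares all m symbols."""
    return (n - m + 1) * m


if __name__ == "__main__":
    print(naive_collate("abracadabra", "abra")[0])
    # Worst case for the example in the text: n = 10^9, m = 10^3
    print(worst_case_comparisons(10**9, 10**3))
```

For n=10^9 and m=10^3 the bound evaluates to 999,999,001,000, which is the "about 10^12" figure stated above. The early exit on a mismatch reduces the count for typical texts but not for the worst case.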
In order to solve the above problem, a method of directly collating a symbol string by using an associative memory (Japanese Patent Laid-Open Nos. 61-28132 and 61-28133) and a symbol string collating apparatus (Japanese Patent Laid-Open No. 61-95442) have been proposed.
A typical arrangement of these symbol string collating apparatuses will be described below.
These symbol string collating apparatuses store collating symbol strings in an associative memory and compare the collating symbol strings with symbol strings to be collated, which are input externally and sequentially. When an input symbol coincides with a symbol of the collating symbol string, a coincidence signal is output for the corresponding bit of the collating symbol string. A register array is constituted by registers, each of which stores the collated state of one bit. An input terminal of each collated state memory register of the register array is connected to the corresponding coincidence signal output terminal of the associative memory via a logical operator. The collated state memory register corresponding to a given bit stores "coincidence" only when the immediately preceding input symbol coincided with the symbol corresponding to the adjacent upper bit of the collating symbol string and the current input symbol coincides with the symbol corresponding to the given bit. Therefore, when the input symbol string to be collated perfectly coincides with the collating symbol string, the collated state memory register of the register array corresponding to the last bit of the collating symbol string stores "coincidence".
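The behavior of the associative memory and the register array described above can be imitated in software as follows. This is a minimal sketch under stated assumptions, not the circuit of the cited references: the per-symbol coincidence signals are modeled as an integer bitmask, the register array as a second integer, and the logical operator as a bitwise AND. The names build_coincidence_table and collate_stream are hypothetical. Readers may recognize the update rule as the bit-parallel technique commonly called Shift-And.

```python
def build_coincidence_table(pattern):
    """Stand-in for the associative memory.

    For each symbol, bit i of the mask is set when pattern[i] equals that
    symbol, i.e. the coincidence signal for bit i of the collating string.
    """
    table = {}
    for i, symbol in enumerate(pattern):
        table[symbol] = table.get(symbol, 0) | (1 << i)
    return table


def collate_stream(text, pattern):
    """Feed symbols one at a time; return positions where a match ends."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must contain at least one symbol")
    table = build_coincidence_table(pattern)
    last_bit = 1 << (m - 1)
    state = 0          # register array: bit i is the collated state of bit i
    end_positions = []
    for position, symbol in enumerate(text):
        coincidence = table.get(symbol, 0)
        # Register i stores "coincidence" only if register i-1 held it for
        # the preceding symbol AND the current symbol coincides with bit i.
        # The first bit has no upper neighbor, so it is always enabled (| 1).
        state = ((state << 1) | 1) & coincidence
        if state & last_bit:
            end_positions.append(position)
    return end_positions


if __name__ == "__main__":
    print(collate_stream("abracadabra", "abra"))
```

Each input symbol costs one table lookup and one shift-and-mask step regardless of the pattern length, which mirrors the way the hardware updates all collated state registers at once.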
In such a symbol string collating apparatus, collating processing can be performed simply by sequentially supplying the symbol strings to be collated to an address input of the associative memory. Therefore, high-speed symbol string collating processing can be realized. In addition, since the connections between the registers constituting the register array can be changed in accordance with the length and structure of the collating symbol string, collating can be flexibly performed for various symbol strings.
The above conventional symbol string collating apparatus requires registers for storing collated states of symbol strings, registers representing the lengths of the collating symbol strings, and a large number of logical gates for connecting the registers. These registers and logical gates require an area 10 to 20 times larger than that of memory cells. That is, the conventional symbol string collating apparatus requires a large number of large-area elements. Therefore, when the conventional apparatus is implemented as an LSI, the chip size increases, and as a result cost increases and reliability is degraded.
In addition, in the method of directly performing symbol string collating by using an associative memory, it is difficult to collate a variable-length symbol string.