This invention relates to a character identification device or a character string identification device for identifying an input character or a sequence of input characters such as text information or a communication message to produce an identified code indicative of a result of identification.
The character identification device or the character string identification device is for use in an address conversion or translation system, an expert system, an information retrieval system, a pattern recognition system, a local area network system, a machine translation system, or the like.
The most typical method for retrieving contents of an input character string, such as text information, has been performed in computers by software called a matching process program. Various character strings, such as keywords or destinations of messages, are preliminarily memorized in a memory as memorized character strings. Responsive to the input character string, the matching process program matches the input character string with one of the memorized character strings. The matching process program produces a match signal when the input character string coincides with that one of the memorized character strings. Otherwise, the matching process program produces a mismatch signal and matches the input character string with the other memorized character strings.
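The matching process described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the function name `match_sequentially`, the example keywords, and the convention of returning an index or `None` are hypothetical and are not taken from any actual matching process program.

```python
from typing import Optional, Sequence


def match_sequentially(input_string: str, memorized: Sequence[str]) -> Optional[int]:
    """Compare the input with each memorized string in turn.

    Returns the index of the first memorized string that coincides with the
    input (standing in for the match signal), or None when every comparison
    ends in a mismatch.
    """
    for index, candidate in enumerate(memorized):
        if input_string == candidate:
            return index  # match signal
        # mismatch signal: continue with the next memorized string
    return None


keywords = ["TOKYO", "OSAKA", "KYOTO"]  # hypothetical memorized strings
```

Because the strings are tried one after another, the work per input string grows with the number of memorized strings.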
Provided that an eight-bit unit is called a byte, the retrieval processing rate per memorized character string is equal to one megabyte/sec when each memorized character string has an eight-character length and when the time for comparison of the character string is equal to one microsecond.
The above-mentioned conventional method has been disadvantageous in that the retrieval processing rate falls in inverse proportion to the number of the memorized character strings. For example, the retrieval processing rate is equal to four kilobytes/sec when the number of the memorized character strings is equal to 256.
On the other hand, the retrieval processing rate is decided by a maximum length of the memorized character string when the memorized character strings have various character lengths. For example, the retrieval processing rate is 256 bytes/sec when the maximum character length is 128.
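The figures quoted above can be approximately reproduced with a short calculation. The sketch below assumes, as the quoted numbers suggest but the text does not state explicitly, that one comparison pass advances the input by one byte, that the comparison time grows linearly with the string length, and that the memorized strings are compared one after another. The function name and its parameters are illustrative only.

```python
def retrieval_rate(num_strings: int, max_length: int = 8,
                   base_length: int = 8, base_time_us: float = 1.0) -> float:
    """Approximate retrieval rate in bytes/sec for sequential matching.

    Assumptions (not stated in the text): each comparison pass consumes one
    input byte, and comparing a string of max_length characters takes
    base_time_us * max_length / base_length microseconds.
    """
    time_per_string_us = base_time_us * max_length / base_length
    time_per_byte_us = time_per_string_us * num_strings
    return 1_000_000 / time_per_byte_us


one_string = retrieval_rate(1)           # one memorized string
many_strings = retrieval_rate(256)       # 256 memorized strings
long_strings = retrieval_rate(256, 128)  # 256 strings, maximum length 128
```

Under these assumptions the three values are 1,000,000, about 3,906, and about 244 bytes/sec. The quoted 256 bytes/sec appears to follow from taking four kilobytes as 4096 bytes and dividing by 16, the ratio of 128 characters to 8.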
In order to reduce the disadvantage, a prior character string identification device is disclosed in a prior patent application, U.S. patent application Ser. No. 720,930 filed Apr. 8, 1985, by the present applicant, based on Japanese Patent Applications No. 68495 of 1984 and No. 267831 of 1984, which are published in Japanese Unexamined Patent Prepublications as Kokai No. Syo 60-211539 (JP-A-60-211,539) and Kokai No. Syo 61-145634 (JP-A-61-145634), respectively. The character string identification device disclosed by the prior patent application comprises an associative memory, a sequential processing circuit, and a priority encoder. The associative memory has memory locations assigned with addresses and preliminarily stores a plurality of characters as memorized characters. The associative memory decides a best match between each input character and one of the memorized characters to produce a character match signal representative of the address for the above-mentioned one of the memorized characters. In the associative memory, the memory locations are classified into several location sets, each being assigned with successive addresses at which only one of the character strings is memorized. The sequential processing circuit sequentially processes the character match signals produced for the respective input characters into a string match signal. The priority encoder encodes the string match signal into an encoded signal to produce the encoded signal as an identified code. The character string identification device is capable of raising the retrieval processing rate by matching the input character string with a number of memorized character strings in parallel.
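The cooperation of the three stages can be pictured with a small software analogy. The sketch below is not the circuit of the prior patent application; it is a hypothetical Python model in which a list of characters stands in for the associative memory, a list of flags stands in for the sequential processing circuit, and selection of the lowest string index stands in for the priority encoder. All names are assumptions introduced for illustration, and exact (rather than best) character matching is assumed.

```python
from typing import List, Optional


def identify(input_string: str, memorized: List[str]) -> Optional[int]:
    """Return the index of the memorized string equal to the input, else None."""
    # Stand-in for the associative memory: one character per address, each
    # string occupying a run of successive addresses.
    cells: List[str] = []
    first: List[bool] = []           # True where a string begins
    last: List[Optional[int]] = []   # string index where a string ends
    for string_index, string in enumerate(memorized):
        for position, character in enumerate(string):
            cells.append(character)
            first.append(position == 0)
            last.append(string_index if position == len(string) - 1 else None)

    # Stand-in for sequential processing: state[a] is True when the input
    # seen so far equals the prefix of a string that ends at address a.
    state = [False] * len(cells)
    for step, character in enumerate(input_string):
        # Character match signals for all addresses, conceptually in parallel.
        char_match = [stored == character for stored in cells]
        previous = state
        state = [
            char_match[a]
            and ((first[a] and step == 0) or (not first[a] and previous[a - 1]))
            for a in range(len(cells))
        ]

    # Stand-in for the priority encoder: the lowest matching string index.
    matches = [last[a] for a in range(len(cells))
               if state[a] and last[a] is not None]
    return min(matches) if matches else None
```

In this model every memorized string is examined at each input character, so the number of steps depends on the length of the input rather than on the number of memorized strings. In hardware, the per-address comparisons would occur simultaneously; the Python lists only imitate that behavior.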
Therefore, such a character string identification device is suited for large scale integration (LSI). The priority encoder, however, occupies an area in an LSI chip that grows in proportion to the number of the memorized character strings. Accordingly, the prior character string identification device is still disadvantageous in that it is difficult to increase the number of the memorized character strings.
On the other hand, the match signal will never be produced when an error, an erroneous addition, or an erroneous omission occurs even in one character of either the input character string or the memorized character strings. In order to relieve this problem, the above-mentioned conventional method preliminarily memorizes in the memory not only the memorized character strings but also modified character strings, each modified character string being formed by intentionally introducing such an error, addition, or omission into one of the memorized character strings. Therefore, the above-mentioned conventional method has been defective in that the retrieval processing rate becomes extremely slow, in inverse proportion to the number of the modified character strings and to the lengths of the modified character strings and of the memorized character strings. When each of the memorized character strings has a length of eight characters, each of which is a character code of eight bits, the number of the modified character strings per memorized character string is equal to 2 × 8 × 2^8, namely, 4096. Accordingly, the retrieval processing rate goes down to about one byte/sec from the above-referenced rate of four kilobytes/sec, which is attained without use of the modified character strings.
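The arithmetic behind these figures can be checked with a few lines of Python. The variable names are illustrative, and reading "four kilobytes/sec" as 4096 bytes/sec is an assumption made here so that the result comes out as a round number.

```python
string_length = 8   # characters per memorized string
code_bits = 8       # bits per character code

# Count quoted in the text: 2 x 8 x 2^8 modified strings per memorized string.
modified_per_string = 2 * string_length * 2 ** code_bits

base_rate = 4 * 1024  # "four kilobytes/sec", taken here as 4096 bytes/sec
rate_with_modified = base_rate / modified_per_string  # bytes/sec
```

With these values, `modified_per_string` is 4096 and `rate_with_modified` is 1.0, consistent with the stated drop to about one byte/sec.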
In order to remove the defect, another prior character string identification device is disclosed in Japanese Unexamined Patent Prepublications, Kokai No. Syo 61-253,536 (JP-A-61-253,536) and Kokai No. Syo 61-267,130 (JP-A-61-267,130), for Japanese Patent Applications No. 96213 of 1985 and No. 108667 of 1985 filed by the present assignee from the present applicant et al. The character string identification device has a modified sequential processing circuit in addition to the associative memory. This makes it possible to raise the retrieval processing rate by matching the input character string with a number of memorized character strings and a greater number of modified character strings in parallel.
However, the modified sequential processing circuit occupies a larger area in comparison with the associative memory when the character string identification device is realized by a very-large-scale integrated circuit (VLSI) of one chip. Accordingly, the character string identification devices of the Japanese Patent Prepublications are disadvantageous in that it is difficult to increase the capacity of the associative memory.