1. Field of the Invention
This invention relates to an information recognition apparatus which makes use of a character recognition technique or a speech recognition technique to recognize a series of pieces of information, such as an address or customer transaction data, composed of a plurality of information elements, for each of which a predetermined number of words that can constitute the element are determined.
2. Description of the Related Art
An address, customer transaction data or the like is composed of a plurality of information elements. For example, an address is composed of such elements as an urban or rural prefecture, a municipal district name, a street name, a square (block, sub-block and house numbers), a building name, a room number and so forth, while customer transaction data is composed of a customer number, an individual name and so forth. Further, for each element of an address, customer transaction data or the like, a predetermined number of element words which may constitute that element are determined. For example, there are 47 element words in total which may constitute an urban or rural prefecture name in Japan, such as Tokyo-to, Hokkai-do, Osaka-fu and Akita-ken.
When an attempt is made to recognize an address, customer transaction data or the like using a character recognition technique or a speech recognition technique, no recognition technique available at present can recognize all words correctly and uniquely. Further, some element words may be omitted when data are inputted. Accordingly, if the words obtained as a result of recognition are merely outputted, recognition errors or missing elements may occur.
Thus, it is a common practice to collate a result of recognition with data registered in advance in order to raise the accuracy of recognition. One such system is disclosed, for example, in Japanese Patent Laid-Open Application No. Heisei 1-113865. In that system, customer transaction data for all customers, including an account number and a customer name written in predetermined places of a cut form upon a transaction by a customer, are stored in advance in a customer information storage unit. In order to recognize customer transaction data written later on another cut form by the customer, the account number and the customer name written on the cut form are recognized using a handwritten character recognition technique, and a result of the recognition is then compared with the customer transaction data of all customers stored in the customer information storage unit to detect likelihoods of all of the customer transaction data. Thereafter, the customer transaction data to be outputted as a recognition result is determined based on those likelihoods.
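The collation step described above can be sketched as follows. This is only an illustration under stated assumptions: the registered records are invented, and the likelihood measure (a string similarity ratio from Python's difflib) is a stand-in, since the measure actually used by the cited system is not described here.

```python
from difflib import SequenceMatcher

# Hypothetical registered records: (account number, customer name).
REGISTERED = [
    ("1234567", "TANAKA ICHIRO"),
    ("1234568", "TANAKA JIRO"),
    ("7654321", "SUZUKI HANAKO"),
]


def likelihood(recognized: str, registered: str) -> float:
    """Stand-in likelihood: similarity ratio in [0, 1] (an assumption)."""
    return SequenceMatcher(None, recognized, registered).ratio()


def collate(recognized_account: str, recognized_name: str):
    """Score every registered record against the recognition result."""
    scored = []
    for account, name in REGISTERED:
        score = (likelihood(recognized_account, account)
                 * likelihood(recognized_name, name))
        scored.append((score, account, name))
    # Highest likelihood first.
    scored.sort(reverse=True)
    return scored


# A recognition result containing two misread characters.
ranking = collate("1234561", "TANAKA ICHIR0")
best = ranking[0]
print(best[1], best[2])
```

Note that every registered record is scored for every recognition result, so the cost grows in proportion to the number of stored records.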
Another system is disclosed in Japanese Patent Laid-Open Application No. Heisei 4-328692. In that system, elements which are paired with each other, such as an individual name and the "kana" letters attached to the Chinese letters of the individual name, are registered in a word dictionary unit. In order to recognize an individual name and attached "kana" letters written in a predetermined place or places (frame or frames), a plurality of candidate characters are collated with all of the pairs registered in the word dictionary unit to detect likelihoods of the registered pairs. Then, the candidate characters are registered into a candidate word table in descending order of likelihood.
In the systems described above, information on all actually existing recognition objects, each of which can be represented by a combination of element words, is stored in advance in a storage unit. When information of a recognition object is to be recognized, a result of recognition by a character recognition technique is collated with all of the recognition object information stored in the storage unit to calculate likelihoods of the recognition object information. Further, the two systems described above presuppose that the elements of recognition object information are written in a predetermined column or frame.
The systems described above have the following problems.
First, the systems described above cannot be applied to an application wherein the kinds of element words are not designated in advance by a column or frame. For example, when an attempt is made to recognize a freely written character train such as an address on a piece of mail, or to recognize an address or the like by a speech recognition technique, neither the kinds of element words nor the character punctuations or word punctuations of the address or the like are settled. Accordingly, in order to apply the systems to such an application, it is necessary to assume all possible character punctuations, word punctuations and kinds of elements, and to collate every possible combination of them with all of the information stored in the storage unit. This requires a very large amount of processing and is therefore not practical.
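The growth in the number of combinations can be illustrated with a short count. The sketch below reads "word punctuations" as boundaries between words and assumes, purely for illustration, that any of the element kinds may be assigned to any word; the function names are hypothetical. Under those assumptions a train of n characters with m element kinds yields m(1 + m)^(n-1) hypotheses, each of which would still have to be collated with the stored information.

```python
from itertools import combinations


def segmentations(text: str):
    """Yield every way to split text into contiguous, non-empty words."""
    n = len(text)
    for k in range(n):  # k = number of internal word boundaries
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [text[a:b] for a, b in zip(bounds, bounds[1:])]


def hypothesis_count(n_chars: int, n_kinds: int) -> int:
    """Segmentations combined with a kind assignment for every word."""
    return n_kinds * (1 + n_kinds) ** (n_chars - 1)


# Brute-force check of the closed form on a four-character train.
brute_force = sum(3 ** len(words) for words in segmentations("abcd"))
print(brute_force == hypothesis_count(4, 3))

# Count for a ten-character train with six element kinds.
print(hypothesis_count(10, 6))
```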
Second, with the systems described above, since a result of recognition of an element word is compared directly with element words in the storage unit, where the same word appears several times in the storage unit, the same likelihood calculation processing is performed several times accordingly. Thus, the systems are disadvantageous in that the efficiency is low.
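The repeated work can be made visible by counting likelihood calculations. The sketch below uses invented stored addresses and a stand-in similarity measure from difflib. The cached variant is shown only for contrast and is not part of either cited system.

```python
from difflib import SequenceMatcher
from functools import lru_cache

# Hypothetical stored addresses as (prefecture, municipality) element words.
STORED = [
    ("Tokyo-to", "Chiyoda-ku"),
    ("Tokyo-to", "Minato-ku"),
    ("Tokyo-to", "Shinjuku-ku"),
    ("Osaka-fu", "Osaka-shi"),
]

calls = 0  # number of likelihood calculations actually performed


def likelihood(recognized: str, word: str) -> float:
    """Stand-in likelihood (an assumption) that counts its invocations."""
    global calls
    calls += 1
    return SequenceMatcher(None, recognized, word).ratio()


# Direct comparison: one calculation per stored record.
direct = [likelihood("Tokyo-to", prefecture) for prefecture, _ in STORED]
direct_calls = calls

# For contrast: one calculation per distinct element word.
calls = 0
cached_likelihood = lru_cache(maxsize=None)(likelihood)
shared = [cached_likelihood("Tokyo-to", prefecture) for prefecture, _ in STORED]
cached_calls = calls

print(direct_calls, cached_calls)
```

With the four records above, the direct comparison evaluates "Tokyo-to" three times, whereas only two distinct prefecture words exist.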
Third, several elements have different representations. For example, in regard to an address, when the name of a place is represented by characters, such different representations as "" (Tsukuba-shi) and "" (Tsukuba-shi) are possible, and when a square such as a block number, a sub-block number and a house number is represented, both "kanji" (Chinese character) numerals and Arabic numerals are used. In order for the systems described above to allow use of such different representations, it is necessary to store all possible representations in the storage unit. This requires a large storage capacity for the storage unit and greatly deteriorates the efficiency of likelihood calculation processing.
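How quickly the stored representations multiply can be sketched for a square alone. The example below assumes, for illustration only, that each of the three numbers may independently be written in Arabic or kanji numerals, that kanji numbers are written digit by digit, and that the parts are joined by hyphens; the function names are hypothetical.

```python
from itertools import product

# Digit-by-digit mapping from Arabic numerals to kanji numerals.
KANJI_DIGITS = dict(zip("0123456789", "〇一二三四五六七八九"))


def to_kanji(number: str) -> str:
    """Render a number digit by digit in kanji (an assumed convention)."""
    return "".join(KANJI_DIGITS[digit] for digit in number)


def square_variants(block: str, sub_block: str, house: str):
    """List every Arabic/kanji combination for one square."""
    fields = [(field, to_kanji(field)) for field in (block, sub_block, house)]
    return ["-".join(choice) for choice in product(*fields)]


variants = square_variants("2", "3", "5")
print(len(variants))
print(variants)
```

A single square already yields eight entries under these assumptions, and each alternative spelling of the place name multiplies that number again.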