1. Field of the Invention
This invention relates to a character recognition apparatus, and more particularly to an apparatus which reads an address using a character recognition technique.
2. Description of the Related Art
An address is composed of element words representing an urban or rural prefecture, a municipal district name, a street name, and a square (block, sub-block and house numbers), and further, in the case of a multiple dwelling house, a building name and a unit number. If the element words are determined uniquely, then the address can be defined uniquely.
However, with present character recognition techniques, it is practically impossible to recognize all element words correctly and uniquely. Therefore, in reading an address based on a character recognition technique, it is difficult to read the address correctly if the results of character recognition are simply outputted as they are in a train.
To address the problem just described, a system has been proposed wherein the reading accuracy is augmented by extracting element words which compose an address from results of character recognition and comparing the words with general rules of address composition (this will be hereinafter referred to as "prior art 1"). The prior art 1 is disclosed, for example, in a document (1), "OCR Address Reading/Letter Sorting Machine for the Ministry of Posts and Telecommunications of Japan", NEC Technical Report, Vol. 44, No. 3, pp. 25-30, another document (2), "Japanese Address Reader-Sorter, Model TR-17", Toshiba Review, Vol. 45, No. 2, pp. 149-152, and so forth.
Meanwhile, another system has been proposed which takes notice of the characteristic that an address, when superscribed, is in most cases accompanied by an individual name, an organization name, a building name or the like. In this system, address data ranging from an urban or rural prefecture to a room number, together with a building name, an organization name and an individual name, are stored in advance and are compared with results of character recognition to raise the reading accuracy (this is hereinafter referred to as "prior art 2"). The prior art 2 is disclosed, for example, in a document (3), "A Knowledge Processing Based on Mutual Checking among Words for Hand-Written String Reading", Lecture Thesis Collection 2 of the 53rd National Meeting of the Information Processing Society of Japan, pp. 283-284.
Further, for example, Japanese Patent Laid-Open Application No. Heisei 8-243503 proposes a main reading apparatus which augments the reliability in recognition of a block number, a sub-block number, a building number and so forth and reduces manual work for correction of characters which cannot be read. In this construction, detailed addresses of addressees, including block numbers, sub-block numbers, building numbers, room numbers and so forth, are stored as a detailed dictionary; a detailed zip code corresponding to a district area of an address and an individual name of an addressee are read to produce sorting information including the detailed address of the addressee; and the detailed address is then compared with the detailed dictionary to read a block number, a sub-block number, a building number and a room number. Furthermore, Japanese Patent Laid-Open Application No. Heisei 8-243505 proposes an address reading apparatus and method wherein representation patterns of habitation representation numbers are stored as dictionary words in which wild cards representing arbitrary numbers are used, and a candidate character group of results of recognition is compared with the words by calculating costs of the words, so that recognition of a habitation representation number can be performed rapidly and with a high degree of accuracy. The apparatuses disclosed in the documents mentioned above can be regarded as techniques similar to the prior art 2 described above in that reading results of a district name and an individual name and habitation data stored in advance are compared with each other to presume a block number, a sub-block number and so forth.
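By way of illustration only, the general idea of comparing dictionary words containing wild cards with a candidate character group on a cost basis may be sketched as follows. The sketch is not taken from the cited application: the wild card symbol "#", the integer costs, the requirement that a pattern and the candidate group have the same length, and the names pattern_cost and best_pattern are all assumptions introduced for explanation.

```python
import math

WILDCARD = "#"  # assumed symbol standing for any single digit

def pattern_cost(pattern, candidates):
    """Cheapest reading of `candidates` that fits `pattern`.

    `candidates` holds one {character: cost} dict per character position,
    where a lower cost means a more plausible recognition result.
    Returns (total cost, matched string), or (inf, None) if nothing fits.
    """
    if len(pattern) != len(candidates):
        return math.inf, None
    total, chars = 0, []
    for symbol, options in zip(pattern, candidates):
        if symbol == WILDCARD:
            allowed = {c: k for c, k in options.items() if c.isdigit()}
        else:
            allowed = {c: k for c, k in options.items() if c == symbol}
        if not allowed:
            return math.inf, None
        best = min(allowed, key=allowed.get)
        total += allowed[best]
        chars.append(best)
    return total, "".join(chars)

def best_pattern(patterns, candidates):
    """Return (pattern, matched string, cost) for the lowest-cost pattern."""
    scored = [(pattern_cost(p, candidates), p) for p in patterns]
    (cost, text), pattern = min(scored, key=lambda item: item[0][0])
    return pattern, text, cost

# Invented candidate group for a three-character number field.
CANDIDATES = [
    {"2": 1, "Z": 3},
    {"-": 2, "1": 4},
    {"3": 1, "8": 5},
]

print(best_pattern(["###", "#-#"], CANDIDATES))
```

In this sketch a single pattern such as "#-#" stands for every number of that shape, so that individual numbers need not be enumerated in the dictionary.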
However, the prior arts described above have the following problems.
First, the prior art 1 has a problem in that, where it is used by itself, there is a limitation in correcting or complementing a character recognition error or incomplete character recognition (disabled recognition). The reason is as follows.
It is true that, by utilizing the fact that a place name has a hierarchical relationship of an urban or rural prefecture, a municipal district name, a street name, and a square, the place name can be presumed from a lower hierarchy word or words even if place name words of an upper hierarchy cannot be read.
However, it is impossible to uniquely presume a lower hierarchy from an upper hierarchy or hierarchies. Further, where a wide area which includes a plurality of municipal districts is determined as a reading object, words of the same notation sometimes have different upper hierarchy words, and in such a case an upper hierarchy word cannot be presumed uniquely from a lower hierarchy word or words alone. Furthermore, if incomplete character recognition occurs with a square or a room number, then it is very difficult to presume the square or room number.
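The limitation described above may be illustrated by the following minimal sketch, in which a small place name table is indexed in both directions. The place names, the table GAZETTEER and the function infer_upper are hypothetical and are given only for explanation; they do not appear in the documents (1) and (2).

```python
from collections import defaultdict

# Hypothetical place name table: (prefecture, municipal district, street name).
GAZETTEER = [
    ("Prefecture A", "City X", "Sakura"),
    ("Prefecture A", "City X", "Midori"),
    ("Prefecture A", "City Y", "Midori"),
    ("Prefecture B", "City Z", "Kita"),
]

def build_parent_index(rows):
    """Map each street name to the set of upper hierarchies it appears under."""
    index = defaultdict(set)
    for prefecture, district, street in rows:
        index[street].add((prefecture, district))
    return index

def build_child_index(rows):
    """Map each upper hierarchy to the set of street names below it."""
    index = defaultdict(set)
    for prefecture, district, street in rows:
        index[(prefecture, district)].add(street)
    return index

def infer_upper(street, parent_index):
    """Return the upper hierarchy if it is unique, otherwise None."""
    parents = parent_index.get(street, set())
    if len(parents) == 1:
        return next(iter(parents))
    return None  # unknown street, or the same notation under several districts

PARENTS = build_parent_index(GAZETTEER)
CHILDREN = build_child_index(GAZETTEER)

print(infer_upper("Sakura", PARENTS))            # unique upper hierarchy
print(infer_upper("Midori", PARENTS))            # ambiguous, so None
print(CHILDREN[("Prefecture A", "City X")])      # several possible streets
```

In this table, "Sakura" determines its upper hierarchy uniquely, whereas "Midori" exists under two municipal districts and cannot be resolved, and an upper hierarchy such as City X never determines a single street name.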
On the other hand, the prior art 2 alleviates the problems of the prior art 1 significantly by comparing an address with habitation data including redundant individual names, building names and organization names. For example, it is possible to presume, from an individual name, an upper hierarchy word or a square of a place name which has not been recognized fully.
However, the prior art 2 is based on a conception that a correct object is estimated by searching for habitation data most similar to results of character recognition or word candidates extracted from such character recognition results.
Therefore, if the prior art 2 is applied by itself, the following problem arises: where a correct answer is not included in the habitation data, even if the answers outputted as a result of character recognition are all correct, a similar but wrong address is outputted instead of the correct reading result.
Naturally, this problem can be eliminated if the habitation data are complete. In practice, however, it is very difficult to prepare correct habitation data covering the entire area to be read without exception.
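The problem of the prior art 2 described above may be illustrated by the following minimal sketch, which selects the habitation data entry most similar to a recognition result. The habitation data, the names, and the use of the Python standard library class difflib.SequenceMatcher as the similarity measure are assumptions made only for explanation; the document (3) does not necessarily use such a measure.

```python
from difflib import SequenceMatcher

# Hypothetical habitation data. The resident at 2-3-5 is deliberately missing.
HABITATION_DATA = [
    "City X Sakura 2-3-4 Taro Yamada",
    "City X Midori 1-1-1 Hanako Sato",
]

def similarity(a, b):
    """Similarity ratio between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()

def most_similar(recognized, records):
    """Return the stored record most similar to the recognition result."""
    return max(records, key=lambda record: similarity(recognized, record))

# Assume every character below was recognized correctly.
recognized = "City X Sakura 2-3-5 Jiro Yamada"
result = most_similar(recognized, HABITATION_DATA)

print(recognized in HABITATION_DATA)  # the correct answer is not stored
print(result)                         # a similar but different address
```

Because the search always returns the nearest stored entry, the correctly recognized address at 2-3-5 is replaced with the stored address at 2-3-4, which is the similar but wrong output described above.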