The present invention relates to an apparatus for recognizing an address appearing on a mail article.
In a conventional address recognition apparatus, combinations of words contained in an address description are picked up and successively compared with those in an address dictionary. When maximum coincidence is detected, the address is recognized as the corresponding address in the address dictionary. However, an address description generally contains a multiplicity of words for address information, such as country, state, prefecture, city, town, village, avenue, street, lot numbers, street numbers, company, building, etc. In the conventional address recognition apparatus, the various words of the address information are therefore extracted word-by-word. After this, the many combinations of words which may constitute an address are sequentially picked up and compared, one by one, with address dictionaries stored in a dictionary memory in order to recognize the address description. As a result, a disadvantageously long time is required to execute address recognition processing, and the conventional apparatus of this type has low processing capability.
The operation of such a conventional apparatus will be described below by way of an example involving the address shown in FIG. 1(a), which contains three lines, each consisting of a string of words. In FIG. 1(a), the city name and postal code are shown in the bottom line, the street or avenue name and numerals in the middle line, and the building name and floor designation in the top line. FIGS. 1(b) and 1(c) show extracted words corresponding to the address description shown in FIG. 1(a) and numbers indicating the extracted words. In address recognition processing, address constituent items such as city name, avenue and building name are recognized from the word train in each line shown in FIG. 1(c).
In the detection process, address constituent items are detected from bottom to top in the address description shown in FIG. 1(a), such that recognition proceeds from the larger address factor to the smaller address factor. That is, the city name, avenue, lot number and building name are sequentially detected.
Accordingly, the possible combinations of words, which are to be compared with address dictionaries, are as follows: three combinations in the first line, fifteen combinations in the second line, and twenty-eight combinations in the third line, as shown in FIG. 2. The following is a rough estimate of the number of comparing operations conventionally required to compare this address with address dictionaries which contain 500 registered country/city names, 2000 registered avenue names, and 500 registered building names.
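The combination counts quoted above (3, 15 and 28) equal n(n+1)/2 for n = 2, 5 and 7, which is the number of contiguous word runs in a line of n words. The text does not state how the combinations are formed or how many words each line of FIG. 2 contains, so the following is only a minimal sketch under that assumption; the function name and the placeholder words are hypothetical.

```python
def contiguous_word_runs(words):
    """Return every contiguous run of words in one line, joined by spaces."""
    return [
        " ".join(words[start:end])
        for start in range(len(words))
        for end in range(start + 1, len(words) + 1)
    ]


# Assumed line lengths (2, 5 and 7 words); they are chosen only because
# they reproduce the counts 3, 15 and 28 given in the text.
for n in (2, 5, 7):
    line = [f"w{i}" for i in range(n)]  # placeholder words, not real data
    print(n, len(contiguous_word_runs(line)))
```

Under this assumption, each run is one candidate to be looked up in the address dictionaries.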
In this example, the address dictionaries are classified (broken down) on the basis of the number of characters included in a given dictionary word (word length) as follows:
Classification of country/city names: 18 classes, from 3 characters to 20 characters
Classification of avenue names: 25 classes, from 6 characters to 30 characters
Classification of building names: 25 classes, from 6 characters to 30 characters
For the matching method, a dynamic programming (DP) matching method is adopted in which an input word including n characters (a word length of n) is compared with dictionary segments of three different classes, i.e., dictionary words of word lengths n-1, n and n+1, respectively. Accordingly, the number of comparing operations may be estimated, on average, as follows: ##EQU1##
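The length-restricted matching described above may be sketched as follows. The text does not specify the cost function of the DP matching, so a plain edit distance (unit cost for insertion, deletion and substitution) is assumed here; the function names and the dictionary layout (a mapping from word length to a list of words) are likewise hypothetical.

```python
def edit_distance(a, b):
    """Edit distance between two strings, computed row by row (DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def best_match(input_word, dictionary_by_length):
    """Compare the input only with dictionary words of length n-1, n and n+1."""
    n = len(input_word)
    best = None
    for length in (n - 1, n, n + 1):
        for entry in dictionary_by_length.get(length, []):
            distance = edit_distance(input_word, entry)
            if best is None or distance < best[1]:
                best = (entry, distance)
    return best  # (word, distance), or None if no class had any entry
```

Restricting the comparison to three length classes is what allows the number of comparing operations per input word to be estimated from the average class size.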
Thus, in order to recognize the address shown in FIG. 1(a), the comparing operation must be executed a total of 9,550 times. Accordingly, assuming that the conventional apparatus needs 20 μs to execute one comparing operation, about 190 ms is needed to execute the whole operation for recognizing one address, which corresponds to a processing rate of 19,000 letters per hour. In order to achieve a desired processing rate of 30,000 letters per hour, two of the conventional systems are disadvantageously necessary.
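The arithmetic behind these figures can be checked with a short calculation. All input values are taken from the text; the variable names are illustrative only, and the text rounds the results to about 190 ms and 19,000 letters per hour.

```python
import math

comparisons_per_address = 9_550   # total comparing operations, as stated
seconds_per_comparison = 20e-6    # 20 microseconds per operation, as stated
target_letters_per_hour = 30_000  # desired processing rate, as stated

seconds_per_address = comparisons_per_address * seconds_per_comparison
letters_per_hour = 3600 / seconds_per_address
systems_needed = math.ceil(target_letters_per_hour / letters_per_hour)

print(f"{seconds_per_address * 1000:.0f} ms per address")
print(f"{letters_per_hour:.0f} letters per hour")
print(f"{systems_needed} systems needed")
```

Because one system falls short of the target rate, the required number of systems rounds up to two.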
There is another type of conventional apparatus for recognizing an address description appearing on a mail article, in which, in order to reduce the number of dictionary words to be compared, the number of characters contained in a word (word length) and one or two characters at the beginning of the word are noted, and word dictionaries are grouped and linked in a memory on the basis of these factors. Thus, when a character string of an input word is compared with word dictionaries, the word dictionaries to be compared are restricted in accordance with the word length of the input word and the first character(s) of the input word, and the dictionary words to be compared are sequentially read out in a predetermined linked manner. This technique is described in U.S. patent application Ser. No. 799,831 "Word Recognition Apparatus" filed on Nov. 20, 1985.
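The grouping by word length and leading character(s) may be pictured as a keyed index. The following is a minimal sketch, not the structure used in the cited application: a hash map stands in for the linked memory arrangement, only exact word length is matched, and the function names and sample words are hypothetical.

```python
from collections import defaultdict


def build_index(words, prefix_len=1):
    """Group dictionary words by (word length, leading character(s))."""
    index = defaultdict(list)
    for word in words:
        index[(len(word), word[:prefix_len])].append(word)
    return index


def candidates(index, input_word, prefix_len=1):
    """Return only the dictionary words sharing length and prefix with the input."""
    return index.get((len(input_word), input_word[:prefix_len]), [])


# Hypothetical sample dictionary, for illustration only.
index = build_index(["BOSTON", "BERLIN", "DALLAS", "BONN"])
print(candidates(index, "BOSTEN"))  # → ['BOSTON', 'BERLIN']
```

Only the words returned by `candidates` would then be passed to the character-string comparison, which is how the number of comparing operations is reduced.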
This address recognition apparatus suffers, however, from the following problem. Since the word dictionaries are grouped using only the word length and the first character(s), the number of registered addresses to be compared is more or less the same irrespective of the regional characteristic of the mail quantity distribution. This means that the time required for the address recognizing operation cannot be sufficiently reduced even when the regional characteristic of the mail quantity distribution is relatively marked.