In a conventional method of recognizing an address from the surface of mail, the following steps are performed:
(1) An image of the mail is photoelectrically converted and input to the address recognizer as a digital image,
(2) Address area candidates are extracted from the digital image of the mail, wherein each address area candidate includes a plurality of character line and address line candidates, and
(3) The characters included in the address area candidates are read and interpreted as a character string.
A technique for accomplishing step (2) described above is disclosed in "A NEW METHOD OF DOCUMENT STRUCTURE EXTRACTION USING GENERIC LAYOUT KNOWLEDGE" by H. Yashiro, et al., Proc. of International Workshop on Industrial Applications of Machine Intelligence and Vision (MIV-89), IEEE, Apr. 10, 1989. According to this reference, if an image area includes a character line, then an area including that image area is extracted as an address area candidate.
Another technique for accomplishing step (2) is disclosed in "DOCUMENT IMAGE SEGMENTATION METHOD BASED ON PROJECTION PROFILES AND STATE DENSITIES", by T. Akiyama, et al. According to this reference, an area in which an address may be written is extracted first, and a character line in that area is extracted second. When the address area candidates are extracted, several address areas are picked up.
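As an illustration only, the general idea of extracting character lines from a projection profile can be sketched as follows. This is a generic sketch and not the method of the cited reference: the binary image representation (a list of rows of 0/1 pixels), the function names `row_profile` and `find_runs`, and the `threshold` parameter are all assumptions made for the example.

```python
def row_profile(image):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def find_runs(profile, threshold=0):
    """Return (start, end) row index pairs, end exclusive, where the
    profile stays above the threshold. Each run is treated here as one
    character line candidate (an assumption of this sketch)."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a run of inked rows begins
        elif value <= threshold and start is not None:
            runs.append((start, i))        # the run ended on the previous row
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run reaches the last row
    return runs


# Hypothetical 6x6 binary image containing two horizontal text lines.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
line_candidates = find_runs(row_profile(image))
print(line_candidates)
```

Applying the same functions to the columns of the image would give candidates for vertically written lines. The profile alone does not indicate which of the two readings is correct.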
Yet another technique is described in "ANALYSIS OF ADDRESS LAYOUT ON JAPANESE HANDWRITTEN MAIL" by N. Nakajima, Proc. of ICPR '96, IEEE, 1996. This reference describes a method that uses layout information such as the arrangement of an address area candidate in the image, the shape of an address area candidate, the arrangement of a character line in the address area candidate, and the shape of the image of the mail.
Generally, even if an address area is selected on the basis of layout information, the selection result depends on the character direction and on the processing result. It is difficult to correctly distinguish a character direction using only layout information. For example, in address area 300 illustrated in FIG. 3A, a character line "Flower, AZ 11111" is shown as part of an address.
In some situations, the relationship between the shape of the mail and the character line direction in the address area cannot be determined from the position or shape of the character line in the address area 300. A character line direction is the direction in which successive characters of a line of characters are disposed. Usually a character is written on the right side of the preceding character in a line of characters. In the case of FIG. 3A, the character line direction is from left to right.
Conventional equipment for address recognition from a rectangular shaped mail item cannot determine the correct character line direction in the following situations:
(1) When the mail is oblong and the character line in the address area candidate is written laterally from left to right. In FIG. 3A, the upward direction of the figure corresponds to the top of a character.
(2) When the mail is oblong and the character line in the address area candidate is written laterally, but with the top and bottom of each character reversed. The character line is then lateral writing from right to left.
(3) When the direction of the character line and the direction of a character are both rightward. In FIG. 3A, the character line is then vertical writing from left to right.
(4) When the mail is oblong and, as in case (3) above, the direction of the character line and the downward direction of a character coincide in the rightward direction. The character line is then vertical writing from right to left.
Therefore, a correct address area cannot be chosen from several address area candidates using layout information alone, since layout information does not resolve these situations.
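The ambiguity can be illustrated with a small sketch. It is an assumption of this example, not a statement from the cited references, that the only layout feature used is the width and height of the box enclosing a character line; the helper names `rotate90` and `ink_box_shape` and the sample pixels are hypothetical.

```python
def rotate90(image):
    """Rotate a binary image (list of rows) 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]


def ink_box_shape(image):
    """Return (width, height) of the tight box around all ink pixels."""
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    return (cols[-1] - cols[0] + 1, rows[-1] - rows[0] + 1)


# Hypothetical character line: wider than tall, as in lateral writing.
line = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

# Measure the box shape at 0, 90, 180 and 270 degrees of rotation.
shapes = []
image = line
for _ in range(4):
    shapes.append(ink_box_shape(image))
    image = rotate90(image)
print(shapes)
```

In this sketch the box shape is identical at 0 and 180 degrees, and again at 90 and 270 degrees, so the shape separates lateral from vertical placement but not the two opposite directions within each. Distinguishing those requires looking at the characters themselves.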
Japanese Laid-Open Patent Publication No. 8-224550 discloses an apparatus which processes addresses by obtaining address area information. In this reference, the first step analyzes the arrangement of character line candidates in an address area extracted as a candidate, and the second step selects the head line of the address area candidate. The third step recognizes a pattern in the head line and determines whether the recognized pattern is a valid address. The fourth and final step selects the address area candidate that includes the head line as the correct address area if the pattern was determined to be valid.
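The four steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the apparatus of the cited publication: the dictionary layout of a candidate, the choice of the topmost line as the head line, the function name `select_address_area`, and the stand-in recognizer and validity check are all hypothetical, as is the sample line "123 Main Street".

```python
def select_address_area(candidates, recognize, is_valid_address):
    """Return the first candidate whose head line reads as a valid
    address, or None if no candidate qualifies."""
    for area in candidates:
        # Step 1: analyze the line arrangement (assumed: sort top to bottom).
        lines = sorted(area["lines"], key=lambda line: line["top"])
        if not lines:
            continue
        # Step 2: select the head line (assumed: the topmost line).
        head_line = lines[0]
        # Step 3: recognize the pattern in the head line.
        text = recognize(head_line)
        # Step 4: accept the area if the recognized text is a valid address.
        if is_valid_address(text):
            return area
    return None


# Stand-ins for a real character recognizer and address dictionary.
def fake_recognize(line):
    return line["text"]


KNOWN_HEAD_LINES = {"123 Main Street"}


def fake_is_valid_address(text):
    return text in KNOWN_HEAD_LINES


candidates = [
    {"name": "advertisement",
     "lines": [{"top": 5, "text": "SALE 50% OFF"}]},
    {"name": "address",
     "lines": [{"top": 40, "text": "Flower, AZ 11111"},
               {"top": 20, "text": "123 Main Street"}]},
]
chosen = select_address_area(candidates, fake_recognize, fake_is_valid_address)
print(chosen["name"])
```

Note that in this sketch the recognizer runs once per candidate until a valid head line is found, so the cost grows with the number of candidates examined.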
When the above-described conventional technique is applied to equipment for address recognition such as a mail sorting machine, the following problems arise. When an address on mail is read, time is needed to accurately perform character recognition of the recipient's address in the address area candidates after a particular area is extracted. Thus, a mail sorting machine that handles a large quantity of mail and requires both speed and accuracy cannot use the above-described conventional technique, since it would be very time consuming. Moreover, noise near the address, such as an illustration or mark that can be mistaken for a character, and characters such as those in an advertisement that can be mistaken for a zip code or an address, can cause problems. Further, it is difficult to determine a character direction in an address area candidate from layout information, since the character direction is determined by the characters themselves and by the character string containing them.