The present invention relates to an address reader method and apparatus for recognizing addresses or names on the surfaces of mail (letters, packages, postcards, periodicals, etc.). More particularly, the present invention relates to an address reader method and apparatus that recognizes addresses and names on the surfaces of mail by determining which address area candidate is most likely to contain a particular address or name, such as that of a receiver of the mail.
In a conventional method of recognizing an address from the surface of mail, the following steps are performed:
(1) An image of the mail is photoelectrically converted and input as a digital image to the address recognizer,
(2) Address area candidates are extracted from the digital image of the mail, wherein each address area candidate includes a plurality of character line and address line candidates, and
(3) The characters included in the address area candidates are read and interpreted as a character string.
A technique for accomplishing step (2) as described above is disclosed in "A NEW METHOD OF DOCUMENT STRUCTURE EXTRACTION USING GENERIC LAYOUT KNOWLEDGE" by H. Yashiro, et al., Proc. of International Workshop on Industrial Applications of Machine Intelligence and Vision (MIV-89), IEEE, Apr. 10, 1989. This reference illustrates that if the image area includes a character line, then an area including the image area is extracted as an address area candidate.
Another technique for accomplishing step (2) is disclosed in "DOCUMENT IMAGE SEGMENTATION METHOD BASED ON PROJECTION PROFILES AND STATE DENSITIES", by T. Akiyama, et al. This reference describes that first, an area in which an address may be described is extracted and second, a character line in the area is extracted. When extracting the address area candidates, several address areas are picked up.
Yet another technique is described in "ANALYSIS OF ADDRESS LAYOUT ON JAPANESE HANDWRITTEN MAIL" by N. Nakajima, Proc. of ICPR '96, IEEE, 1996. This reference describes a method of using layout information such as the arrangement of an address area candidate in the image, a shape of an address area candidate, an arrangement of a character line in the address area candidate and a shape of the image of the mail.
Generally, even if an address area is selected on the basis of layout information, the selected result depends on the character direction and on the processing result. It is difficult to correctly distinguish a character direction using only layout information. For example, in address area 300 illustrated in FIG. 3A, a character line "Flower, AZ. 11111" is shown as part of an address.
In some situations, the relationship between the shape of the mail and the character line direction in the address area cannot be determined from the position or shape of the character line in the address area 300. A character line direction is the direction in which successive characters of a line of characters are disposed. Usually a character is written on the right side of the preceding character in a line of characters. In the case of FIG. 3A, the character line direction is from left to right.
Conventional equipment for address recognition from a rectangular shaped mail item cannot determine the correct character line direction in the following situations:
(1) When the shape of the mail is oblong and the character line direction in the address area candidate is lateral, written from left to right. The upward direction in FIG. 3A corresponds to the top of a character.
(2) When the shape of the mail is oblong and the character line direction in the address area candidate is lateral, written with the top and bottom of each character reversed. The character line is written laterally from right to left.
(3) When the direction of the character line and the direction of a character are both rightward. In FIG. 3A, the character line is vertical, written from left to right.
(4) When the shape of the mail is oblong and, as in case (3) above, the direction of the character line and the downward direction of a character coincide in the rightward direction. The character line is vertical, written from right to left.
Therefore, a correct address area cannot be chosen from several address area candidates using layout information alone, since layout information does not resolve these situations.
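The ambiguity described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each character is represented by a bounding box `(x_min, y_min, x_max, y_max)` and that layout analysis sees only those boxes. The function names and the sample coordinates are hypothetical and do not come from the cited references.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in image pixels.
Box = Tuple[int, int, int, int]


def rotate_180(boxes: List[Box], width: int, height: int) -> List[Box]:
    """Rotate boxes by 180 degrees inside a width x height image."""
    return [(width - x1, height - y1, width - x0, height - y0)
            for (x0, y0, x1, y1) in boxes]


def rotate_90_cw(boxes: List[Box], height: int) -> List[Box]:
    """Rotate boxes 90 degrees clockwise; `height` is the source image height."""
    return [(height - y1, x0, height - y0, x1) for (x0, y0, x1, y1) in boxes]


def layout_axis(boxes: List[Box]) -> str:
    """Classify a character line as lateral or vertical from box extents only."""
    xs = [x for b in boxes for x in (b[0], b[2])]
    ys = [y for b in boxes for y in (b[1], b[3])]
    return "lateral" if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else "vertical"


# Five made-up character boxes forming one line in a 200 x 100 image.
WIDTH, HEIGHT = 200, 100
line = [(20 + 12 * i, 40, 30 + 12 * i, 54) for i in range(5)]

upright = line
upside_down = rotate_180(line, WIDTH, HEIGHT)
turned_cw = rotate_90_cw(line, HEIGHT)
# After the 90 degree turn the image is HEIGHT wide and WIDTH tall.
turned_ccw = rotate_180(turned_cw, HEIGHT, WIDTH)

axes = [layout_axis(b) for b in (upright, upside_down, turned_cw, turned_ccw)]
print(axes)
```

In this sketch the upright and upside-down lines are both classified as lateral, and the two quarter-turn lines are both classified as vertical. The box geometry narrows the four orientations to two pairs but cannot say which member of a pair is correct, which is why the content of the characters is needed as well.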
Japanese patent Laid-open print No. 8-224550 discloses an apparatus which performs processing of addresses by obtaining address area information. In this reference, the first step analyzes the arrangement of character line candidates in an address area extracted as a candidate, and the second step selects the head line of the address area candidate. The third step recognizes a pattern in the head line and determines whether the recognized pattern is a valid address. The fourth and final step selects the address area including the address head line as the correct address area if the pattern was determined to be valid.
When the above-described conventional technique is applied to equipment for address recognition such as a mail sorting machine, the following problems arise. At the time of address reading, time is needed to accurately recognize the characters of a receiver's address in the address area candidates by extracting a particular area. Thus, a mail sorting machine that handles a large quantity of mail and requires both speed and accuracy cannot use the above-described conventional technique, since it would be very time consuming. Moreover, noise such as an illustration or mark near the address that can be mistaken for a character, and text such as an advertisement that can be mistaken for a zip code or address, can cause problems. Further, it is difficult to determine a character direction in an address area candidate based on layout information, since the direction of the characters is determined by the characters themselves and by the character string containing them.
The present invention provides an address reader method and apparatus for selecting a particular address area as most likely being an address area containing, for example, a receiver address from several address area candidates obtained from the surface of mail and recognizing character strings contained in the selected address area. Mail includes letters, postcards, packages, periodicals, etc. An address could, for example, include an addressee and a destination address. The addressee could, for example, be the name of a person, corporation, division, department, etc. The destination address could, for example, include a suite, apartment, or floor number, a city, a state, a zip code, and a country.
The present invention provides an address reader method and apparatus for recognizing characters contained in a particular address area selected from a plurality of address area candidates as an address area most likely to contain a receiver address. The present invention accomplishes the above by inputting an image of a surface of the mail and segmenting the image into at least one character string candidate. Thereafter, at least one address area candidate is extracted based on the character string candidate and one of the at least one address area candidate is selected as a receiver address area of the mail. The selection is performed by analyzing each of the at least one address area candidate based on predetermined positional information, information of a character direction appropriate for the predetermined positional information, and key character string information. The receiver address contained in the selected address area candidate is then recognized by analyzing character strings included therein.
The address reader method and apparatus of the present invention can, for example, form part of a mail sorting machine which includes a scanner and a sorter. The scanner scans the surface of the mail and inputs an image of the surface to the address reader method and apparatus of the present invention. The sorter receives the recognized receiver address output by the address reader method and apparatus of the present invention and sorts the mail based on the recognized receiver address.
The selection of an address area candidate as most likely containing, for example, a receiver address is performed by comparing character strings in the address area candidate to key character strings. The key character strings are strings of characters that would ordinarily be present in an address area candidate for it to be considered, for example, part of a receiver address. If a key character string exists in the character strings of the address area candidate and is in the appropriate position and character direction as specified by the positional information and the character direction information, then the address area candidate is selected as the address area candidate most likely containing a receiver address.
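The selection just described might be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the class names, the direction labels, the scoring by number of matched rules, and the state-plus-zip-code pattern are all hypothetical, and the sample strings other than "Flower, AZ. 11111" are made up.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AddressAreaCandidate:
    name: str
    # Character strings of the area as read under each direction hypothesis.
    # The direction labels are hypothetical.
    readings: Dict[str, List[str]]


@dataclass
class KeyRule:
    pattern: "re.Pattern"  # key character string, expressed as a regex
    line_index: int        # expected position in the area (-1 = last line)
    direction: str         # character direction appropriate for that position


def rule_matches(candidate: AddressAreaCandidate, rule: KeyRule) -> bool:
    """True if the key string appears at the expected line in the expected direction."""
    lines = candidate.readings.get(rule.direction)
    if not lines:
        return False
    try:
        line = lines[rule.line_index]
    except IndexError:
        return False
    return rule.pattern.search(line) is not None


def select_receiver_area(candidates: List[AddressAreaCandidate],
                         rules: List[KeyRule]) -> Optional[AddressAreaCandidate]:
    """Return the candidate satisfying the most rules, or None if none match."""
    best, best_score = None, 0
    for candidate in candidates:
        score = sum(rule_matches(candidate, rule) for rule in rules)
        if score > best_score:
            best, best_score = candidate, score
    return best


# Assumed key string: two-letter state code followed by a five-digit zip code,
# expected on the last line when read from left to right.
rules = [KeyRule(re.compile(r"\b[A-Z]{2}\.?\s+\d{5}\b"), -1, "left_to_right")]

advert = AddressAreaCandidate(
    "advert", {"left_to_right": ["SALE ENDS SOON", "CALL 11111 TODAY"]})
receiver = AddressAreaCandidate(
    "receiver",
    {"left_to_right": ["Mr. John Doe", "123 Rose St.", "Flower, AZ. 11111"]})

selected = select_receiver_area([advert, receiver], rules)
print(selected.name if selected else None)
```

Here the advertisement contains a five-digit number that resembles a zip code, but it is not preceded by a state code, so only the receiver area satisfies the rule. Returning `None` when no candidate matches leaves the fallback policy to the caller, which is a design choice of this sketch.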