1. Field of the Invention
The present invention relates to a method of reading characters, and more specifically to a method of reading character strings, particularly hand-written character strings, including Kanji characters, of postal addresses written on the surface of mail pieces.
2. Description of the Prior Art
For the automatic reading of a character string of a postal address written on the surface of a mail piece or the like, the image of the mail surface is first converted into an electrical signal, and then the region where the character string is written is detected. Subsequently, based on the video signal of the detected region, the characters of the character string are classified. Each character of the character string is classified by the following procedure.
(1) Image patterns which are deemed to be characters of a character string are extracted by segmentation (character segmentation).
(2) Character species (character codes) of the segmented character patterns are classified (character classification).
(3) A character string formed by connecting the classified character species is compared with character strings of postal addresses or the like registered in a table (character string dictionary), thereby recognizing the character string as a certain address or the like (character string matching).
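By way of illustration only, step (3) can be sketched as a lookup of the most similar dictionary entry. The function name `match_address`, the sample strings, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are assumptions made for this sketch; the prior art does not prescribe a particular measure.

```python
from difflib import SequenceMatcher


def match_address(classified, dictionary):
    """Return (entry, similarity) for the dictionary entry closest to `classified`.

    `classified` is the string formed by connecting the classified character
    species; `dictionary` is a list of registered strings (hypothetical data).
    """
    def similarity(entry):
        # ratio() is in [0, 1]; 1.0 means the two strings are identical.
        return SequenceMatcher(None, classified, entry).ratio()

    best = max(dictionary, key=similarity)
    return best, similarity(best)


# A misclassified character ("0" for "O") still selects the intended entry.
best_entry, score = match_address("TOKY0", ["TOKYO", "KYOTO", "OSAKA"])
```

A practical system would typically also reject the result when the best similarity falls below a threshold, which this sketch leaves out.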
Among the above-mentioned processes, the character segmentation of item (1) is the most difficult, owing to the variety of written surfaces: hand-written characters, Kanji characters in which one character can be made up of multiple other characters, and character strings written either vertically or horizontally, as will be explained later in connection with FIG. 1 and FIG. 34A.
In regard to the conventional scheme of character segmentation for a character string read out of a written surface, the over segmentation approach is known to be effective. In the over segmentation approach, the image signal of a character string is separated into multiple character patterns that may each be a character, and each separated character pattern is classified in terms of character (character species). The correct character patterns are then determined based on the similarity of the classified character species to each character pattern and on the comparison of the string of character species with character strings in a reference dictionary.
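The generation of candidate character patterns in the over segmentation approach can be sketched as follows. This is a minimal illustration only: it assumes that the cut positions (boundaries) have already been found and that a candidate pattern may span at most `max_span` consecutive primitive segments. The function name and the `max_span` limit are hypothetical.

```python
def candidate_patterns(num_boundaries, max_span=3):
    """Enumerate candidate character patterns as (start, end) boundary pairs.

    Boundaries are numbered 0 .. num_boundaries - 1 along the writing
    direction. Each candidate merges between 1 and `max_span` consecutive
    primitive segments (an assumed limit, not taken from the prior art).
    """
    last = num_boundaries - 1
    return [
        (start, end)
        for start in range(last)
        for end in range(start + 1, min(start + max_span, last) + 1)
    ]


# Four boundaries (three primitive segments), merging at most two segments:
print(candidate_patterns(4, max_span=2))
# [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
```

Every candidate is then passed to character classification, so the number of candidates directly determines the classification workload.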
As a specific example of the prior art regarding the over segmentation approach, a scheme for testing recognition-candidate characters based on character classification has been proposed by Fujisawa, et al. (described in The Proceeding of The 1984 Institute IEIC Fall Conference "An Augmented Segmentation Algorithm for Connected Handwritten Numerals").
Another scheme, for testing recognition-candidate character patterns based on the shape of characters, has been proposed by Ishidera, et al. (described in The Proceeding of The 1995 Institute IEIC Spring Conference D-576 "A Segmentation Method of Address Recognition").
Schemes for testing the segmentation assumption based on character classification and character string comparison have been proposed by Murase, et al. (described in The Transaction of the Institute of Electronics, Information and Communication Engineers, (D) Vol.J69-D, No.9 "Segmentation and Recognition of Hand-written Character String Using Linguistic Information"), and by Ooi (described in the TECHNICAL REPORT OF IECE PRU 92-40 "A Method to Recognize the Street Number Portion of an Address").
A scheme for assessing the correctness of character segmentation based on the character width, character pitch and character spacing is described in The Transaction of the Institute of Electronics, Information and Communication Engineers, REPORT OF IECE (D) J68-D, No.12, pp.2123-2131. Also known is a scheme for assessing the correctness of character segmentation based on the character pattern and information on the similarity of character species, as described in The Transaction of the Institute of Electronics, Information and Communication Engineers, REPORT OF IECE (D) J68-D, No.4, pp.765-772.
However, the above-mentioned prior art schemes of the over segmentation approach have difficulty segmenting characters correctly, as the following examples show.
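The three layout quantities named above can be illustrated with a short sketch. It assumes each character pattern is given as a `(left, right)` extent along the writing direction, takes the pitch as the distance between the centers of adjacent patterns, and takes the spacing as the gap between them. These definitions and the function name are assumptions for illustration and are not taken from the cited reports.

```python
def layout_features(boxes):
    """Compute widths, spacings and pitches for consecutive character patterns.

    `boxes` is a list of (left, right) extents along the writing direction,
    ordered from the first character to the last (hypothetical input format).
    """
    widths = [right - left for left, right in boxes]
    # Spacing: gap between the end of one pattern and the start of the next.
    spacings = [boxes[i + 1][0] - boxes[i][1] for i in range(len(boxes) - 1)]
    # Pitch: distance between the centers of adjacent patterns.
    centers = [(left + right) / 2 for left, right in boxes]
    pitches = [centers[i + 1] - centers[i] for i in range(len(centers) - 1)]
    return widths, spacings, pitches


widths, spacings, pitches = layout_features([(0, 10), (14, 24), (30, 40)])
# widths == [10, 10, 10], spacings == [4, 6], pitches == [14.0, 16.0]
```

A segmentation whose widths, spacings and pitches are nearly uniform would generally be rated as more plausible than one in which they fluctuate strongly.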
In FIG. 1, which shows a postal address 101 hand-written on a mail piece, a street number portion 102 is visually recognized to be the Kanji-numerals "{character pullout}-{character pullout}-{character pullout}". In this case, a character reading apparatus based on the above-mentioned over segmentation approach implements the character pattern segmentation for the region 102 at the boundaries shown by the dashed lines. Because the vertical and horizontal lengths and the vertical/horizontal length ratio of character patterns vary significantly depending on the individual character species, it is difficult to select the correct character string out of the six possible cases 103.
FIG. 33A shows a hand-written character string with large character spacings. This character string is segmented at boundaries shown by the dashed lines, resulting in recognition-candidate character patterns as shown in FIG. 34A. In the figure, the relationship of the candidate patterns is expressed graphically in terms of nodes that represent boundaries of character patterns and arcs that represent character patterns, and it is called a "segmentation hypothesis network".
Correct segmentation of character patterns based on the above-mentioned over segmentation approach is carried out by finding the optimal path from the starting node 0 to the ending node 9 on the segmentation hypothesis network. The character patterns represented by the arcs in FIG. 34A are classified in terms of their character species. In this case, each of "{character pullout}", "{character pullout}", and "{character pullout}" yields a high similarity, and therefore it is difficult for the prior art schemes to segment the character string.
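The search for the optimal path on a segmentation hypothesis network can be sketched with dynamic programming over the nodes. This is an illustration under stated assumptions, not the method of any cited scheme: each arc carries a hypothetical label and similarity, and the path score is taken to be the sum of similarities weighted by the number of primitive segments an arc spans, so that paths with different numbers of arcs remain comparable.

```python
def best_path(num_nodes, arcs):
    """Find the highest-scoring path from node 0 to node num_nodes - 1.

    `arcs` maps (start, end) with start < end to (label, similarity).
    The score of an arc is similarity * (end - start), an assumed weighting.
    Returns (score, labels), or None if the last node is unreachable.
    """
    best = {0: (0.0, [])}  # node -> (best score so far, labels on that path)
    for end in range(1, num_nodes):
        options = []
        for (start, stop), (label, similarity) in arcs.items():
            if stop == end and start in best:
                score, labels = best[start]
                options.append((score + similarity * (stop - start), labels + [label]))
        if options:
            best[end] = max(options, key=lambda option: option[0])
    return best.get(num_nodes - 1)


# Toy network with four nodes; labels and similarities are made up.
arcs = {
    (0, 1): ("A", 0.9),
    (1, 2): ("B", 0.5),
    (2, 3): ("C", 0.9),
    (0, 2): ("E", 0.6),
    (1, 3): ("D", 0.8),
}
score, labels = best_path(4, arcs)  # labels == ["A", "D"]
```

When several arcs have nearly equal similarity, as in the example of FIG. 34A, competing paths receive almost the same score, which is why similarity alone does not settle the segmentation.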
Among the above-mentioned prior art schemes, the one proposed by Fujisawa, et al. and the one proposed by Ishidera, et al. are designed to judge the legitimacy of each character pattern, but they do not use the relation with neighboring character patterns. The ones proposed by Ooi and by Murase, et al. use the relation with neighboring character patterns for the matching of character strings, but these schemes do not use information on the relative feature values of neighboring characters, such as the spacings.