The present invention relates to a character reading method, in particular, to a character reading method effective for reading a string of hand-written characters in which a plurality of characters are close to one another. (The term "string of characters" will also be described as "series of characters" or "character series", hereinafter and in the drawings.)
A string of free-format hand-written characters may comprise characters that are close to one another. For example, adjacent characters may come in contact with one another. Such a condition may cause problems in reading the characters. The problem is to enable appropriate splitting of the string of characters into individual characters.
The following methods are known for handling such a condition.
(A) A projection histogram is used. This projection histogram is obtained by projecting, in a direction perpendicular to the string, the string of characters on a line parallel to the string. Then, a minimum point in the histogram is determined to be a candidate for a position where the characters are adjacent to one another. (This method (A) is disclosed in a second literature described below.)
(B) A discriminant analysis method is used for determining a straight line which splits the string of characters where the characters are adjacent to each other so as to split the string into the individual characters appropriately. The discriminant analysis method is performed on the two-dimensional distribution of pixels constituting the string of characters. (This method (B) is disclosed in a first literature described below.)
(C) As a result of tracing the contour of the string of characters, concavities are detected on the contour along both sides of the string, the contour extending along the direction of extension of the string. Then, a line is made by connecting a bottom of the detected concavity along one side of the string and another bottom of the detected concavity along the other side of the string. The connected bottoms are aligned with each other. The formed line is determined to be a line for appropriately splitting the string into the individual characters. (This method (C) is disclosed in the second literature described below.)
(D) In order to determine a line for appropriately splitting the string of characters into the individual characters, the following method can be executed. Starting from a bottom point of a detected concavity such as mentioned in the above method (C), pixels are traced successively so as to form the line for splitting the string, as follows. If the current pixel is a black pixel, or if the pixels on both sides of the current pixel are either both black pixels or both white pixels, then the current pixel is moved, by one pixel, to a bottom side pixel. Otherwise, that is, if the pixel on one side is a black pixel and the pixel on the other side is a white pixel, then the current pixel is moved laterally by one pixel toward the black pixel. (This method (D) is disclosed in the second literature described below.)
(E) Two contours of and along the direction of extension of the string of the characters, extending on both sides of the string, are respectively traced. In this tracing, the tracing points on both contours are aligned with one another in the direction perpendicular to the direction of extension of the string of characters. Then, in the tracing, if the distance between the tracing points respectively corresponding to both contours varies sharply, then the current points are connected to one another so as to form a candidate for a line for splitting the string appropriately. (This method (E) is disclosed in a third literature described below.)
(F) If the string comprises two characters respectively comprising loops, for example, the characters "0" and/or "9", the following method is used to split the characters. In contour tracing such as mentioned above, the contour extending in one side of the extension of the string may have two adjacent protrusions respectively corresponding to the two loops of the adjacent characters. Simultaneously, the contour extending in the other side of the extension of the string may have two adjacent protrusions respectively corresponding to the two loops of the above adjacent characters. Further, both the contours respectively may have concavities between the respective adjacent protrusions. The two concavities are then connected with one another so as to form a line for splitting the adjacent characters. (This method (F) is disclosed in the third literature described below.)
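For illustration only, the pixel-tracing rule of method (D) may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure of the second literature itself: the image is assumed to be a list of rows of 0/1 values (1 being a black pixel), the string is assumed to extend horizontally so that the "bottom side" pixel is the one in the next row down, pixels outside the image are assumed to be white, and the function name trace_split_path is hypothetical.

```python
def trace_split_path(img, start_row, start_col):
    """Trace a splitting path downward from a concavity bottom point.

    img is a list of rows of 0/1 values (1 = black pixel); this layout is
    an assumption made for the sketch. Returns the visited (row, col) points.
    """
    rows, cols = len(img), len(img[0])

    def black(r, c):
        # Assumption: anything outside the image is treated as white.
        return 0 <= c < cols and img[r][c] == 1

    r, c = start_row, start_col
    path = [(r, c)]
    while r < rows - 1:
        left, right = black(r, c - 1), black(r, c + 1)
        if black(r, c) or left == right:
            r += 1      # current pixel black, or both sides alike: move down
        elif left:
            c -= 1      # only the left neighbour is black: move toward it
        else:
            c += 1      # only the right neighbour is black: move toward it
        path.append((r, c))
    return path


# Toy image (invented for illustration): a single black pixel at row 1, column 1.
toy = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
path = trace_split_path(toy, 0, 2)
# path is [(0, 2), (1, 2), (1, 1), (2, 1)]
```

In this sketch a lateral move always lands on a black pixel, so the next move is downward and the trace reaches the last row after a bounded number of steps.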
The three literatures mentioned above are as follows.
The first literature: By F. Kimura and M. Shridhar, "Recognition of connected numeral strings", in Proceedings of First International Conference on Document Analysis and Recognition, Saint Malo, France, Sep. 30-Oct. 1, 1991, pp. 731-739.
The second literature: By R. Fenrich, "Segmentation of automatically located handwritten words", in Proceedings of Second International Workshop on Frontiers in Handwriting Recognition, Bonas, France, Sep. 23-27, 1991, pp. 33-44.
The third literature: By Fujisawa and Michino, "An Augmented Segmentation Algorithm for Connected Handwritten Numerals", in Electrical Communication Society General National Meeting, 1984, No.1588.
A particular problem in reading a string of handwritten characters is as follows. In the example of FIG. 16, where four numerals are written, a portion of the first numeral "5" overhangs or overlaps a portion of the second numeral "7", because the two characters are close to one another. In such a condition, it is difficult to form a line appropriate for splitting the string so as to separate the numerals into the respective characters.
The above-mentioned methods of the prior art merely utilize statistical properties or information associated with pixels located at an edge portion of the corresponding image, such as a projection (A), a pixel distribution (B), and/or contours (C), (E) and (F). These methods do not utilize the information required for detecting overhanging or overlapping areas (of adjacent figures) such as mentioned above. With these methods, various other complicated measures would have to be taken to detect such an overhanging or overlapping construction. In the method (D), not only information associated with pixels located at an edge portion of the corresponding image but also information of internal pixels can be utilized. However, this method depends merely on heuristics. Thus, its reliability and applicability are not evident.
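The limitation of a purely projection-based approach such as method (A) can be illustrated with a toy example. The Python sketch below is illustrative only and rests on assumptions: the two small bitmaps are invented for this purpose (they are not the image of FIG. 16), 1 denotes a black pixel, the string is assumed to extend horizontally, and the function names column_projection and minimum_columns are hypothetical.

```python
def column_projection(img):
    """Count black pixels (1s) in each column of a 0/1 image given as rows."""
    return [sum(col) for col in zip(*img)]


def minimum_columns(hist):
    """Return the column indices at which the histogram takes its minimum."""
    lowest = min(hist)
    return [i for i, v in enumerate(hist) if v == lowest]


# Two shapes separated by an empty column: the minimum is a clean gap.
separated = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
]

# The top stroke of the left shape overhangs the right shape. The two shapes
# do not touch, yet every column contains at least one black pixel.
overhang = [
    [1, 1, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 1],
]

sep_hist = column_projection(separated)   # [3, 2, 0, 2, 3]
ovr_hist = column_projection(overhang)    # [4, 1, 1, 3, 1, 2]
sep_min = minimum_columns(sep_hist)       # [2]
ovr_min = minimum_columns(ovr_hist)       # [1, 2, 4]
```

In the second bitmap no column is empty, so a straight cut at any of the minimum columns passes through a stroke of one of the shapes, even though the two shapes are not in contact.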
Further, as suggested in the method disclosed in the above second literature, the above methods (A)-(F) may be utilized in appropriate combination according to the manner in which two adjacent characters are connected with one another. However, in such a combined method, several methods have to be executed in parallel. Thus, the amount of processing and the size of the programs for achieving the method increase. Further, another problem may occur in that several kinds of solutions appear for appropriately splitting the string of characters. In this case, it may be difficult to select an optimum solution from among the several solutions obtained.
Furthermore, these methods of the prior art basically assume connection between only two characters. Thus, it is necessary to apply a similar method recursively for dealing with connection among more than two characters.