The present invention relates to optical character reader systems. More particularly, it is concerned with a sectioning apparatus for sectioning a character string image on a sheet of paper into individual characters.
In order to recognize a series of printed characters, an optical character reader system must separate the characters one by one. Further, it is desirable for such a system to handle the many font types and the poor print quality found in characters printed on general mail and documents. In a character string printed on a general document, the characters to be separated may touch each other, or one character may be split into more than one image because of poor print quality. Further, in the case of alphabetic characters, the character width varies with the font and the character category. Therefore, a sectioning apparatus is needed that can properly section the character string into individual characters under these conditions.
A sectioning apparatus of this type has been proposed in U.S. Pat. No. 3,629,826. The proposed apparatus scans a character image vertically and sets a character segmentation position by detecting the position where the number of character bits in a vertical scan is at a minimum. In order to segment touching characters, the apparatus stores in advance many touching images for every character and determines a segmentation position by comparing the touching character image with the stored touching images. That is, the apparatus performs character segmentation by referring to local images of the character string.
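The vertical-scan step described above can be illustrated with a minimal sketch. It assumes a binary image stored as a list of rows in which 1 marks a character (black) bit, and it treats a run of columns whose bit count does not exceed a threshold as a candidate segmentation position. The function names `column_counts` and `cut_positions` and the `threshold` parameter are hypothetical and are used only for illustration; this is not the apparatus of the cited patent, and it omits the comparison with stored touching images.

```python
from typing import List


def column_counts(image: List[List[int]]) -> List[int]:
    """Count the character bits (1s) in each column of a binary image."""
    if not image:
        return []
    return [sum(column) for column in zip(*image)]


def cut_positions(image: List[List[int]], threshold: int = 0) -> List[int]:
    """Return one candidate cut column per interior low-count run.

    A run is a maximal sequence of adjacent columns whose bit count is
    <= threshold. Runs touching the left or right edge of the image are
    ignored, since they are margins rather than gaps between characters.
    """
    counts = column_counts(image)
    cuts = []
    run_start = None
    for x, count in enumerate(counts):
        if count <= threshold:
            if run_start is None:
                run_start = x          # a low-count run begins here
        else:
            if run_start is not None and run_start > 0:
                # the run covers columns run_start .. x-1; cut at its centre
                cuts.append((run_start + x - 1) // 2)
            run_start = None
    return cuts


if __name__ == "__main__":
    # Three blobs separated by blank columns (hypothetical data).
    image = [
        [1, 1, 0, 0, 1, 1, 1, 0, 1],
        [1, 0, 0, 0, 1, 0, 1, 0, 1],
    ]
    print(column_counts(image))
    print(cut_positions(image))
```

With a threshold of zero, this sketch finds no cut between two characters that touch, and it places a cut inside a character that poor print quality has split into two images, which is the kind of failure a purely local method has to work around.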
However, such a sectioning apparatus cannot properly segment, as one character, a character that has been split into two or more images. Additionally, the apparatus requires a vast memory capacity to store the many images of every character touching other characters. Furthermore, providing a special segmentation function for each individual case may considerably degrade the precision and speed of character segmentation.
A sectioning apparatus of this type can use the character pitch as information for segmenting a character string. The character pitch can be supplied to the sectioning apparatus as known information if the printed matter to be read by the optical character reader is restricted. However, since characters printed or written on a general document have an unspecified character pitch, as described above, the character pitch cannot be known beforehand. Accordingly, the character pitch must be estimated from the character string image on the paper.
Heretofore, the mean value of the character widths of various characters has been used as an estimate of the character pitch. However, where the individual character width varies widely with the font or the character category, or where the number of touching characters increases, the error between the mean character width and the actual character pitch is no longer negligible. Because of this error, the sectioning apparatus misjudges the number of touching characters or cuts the character string at an incorrect segmentation position. Further, as described above, it is desirable for the optical character reader to handle both characters printed at a constant pitch and characters printed at a variable pitch, including hand-written characters. In such a case, the sectioning apparatus must change its character segmentation algorithm according to whether the characters are printed at a constant pitch or at a variable pitch. Therefore, it is also important to identify whether the pitch of the obtained character data is constant or variable before segmenting the character string.
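The pitch-estimation error described above can be illustrated with a small sketch. It assumes that each connected image has already been measured and that its width in pixels is known; the function names `mean_width_pitch` and `estimated_char_count` and the sample widths are hypothetical and chosen only for illustration.

```python
from typing import List


def mean_width_pitch(widths: List[float]) -> float:
    """Estimate the character pitch as the mean width of the measured images."""
    if not widths:
        raise ValueError("at least one width is required")
    return sum(widths) / len(widths)


def estimated_char_count(image_width: float, pitch: float) -> int:
    """Estimate how many characters an image contains from its width and the pitch."""
    return max(1, int(image_width / pitch + 0.5))  # round half up, at least 1


if __name__ == "__main__":
    # Assumed data: the true pitch is 10 pixels; the 18- and 28-pixel images
    # are two and three touching characters respectively.
    widths = [8, 18, 8, 28]
    pitch = mean_width_pitch(widths)
    print(pitch)
    print([estimated_char_count(w, pitch) for w in widths])
    print([estimated_char_count(w, 10) for w in widths])
```

In this assumed data the touching images inflate the mean width to 15.5 pixels, so the three-character image is counted as two characters and the two-character image as one, whereas the true pitch of 10 pixels gives the correct counts.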