This invention relates to an optical character recognition system and, more particularly, to a character segmenting apparatus for segmenting characters, one by one, from an input pattern delivered from an optical scanner.
The character segmenting apparatus assumes a circumscribing square for each character in a two-dimensional pattern that is binary-coded and delivered from the optical scanner. The character is segmented on the assumption that the entire perimeter of its circumscribing square is white or blank by at least one mesh (one bit). The position and size information of the character is thus extracted for the entire scanning field. In the case of an article of mail, for example, the address and postal code number are the primary items written on the article. However, the heights of the characters range from about one millimeter to about ten millimeters, and the pitches between them are non-uniform. Character segmenting is extremely effective in such a case.
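The white-perimeter assumption can be illustrated with a short sketch. It is not the apparatus itself, only a software illustration under stated assumptions: the pattern is a list of rows of bits, 1 means black and 0 means white, and cells outside the scanning field are treated as white. The names `bounding_box` and `has_white_frame` are hypothetical.

```python
def bounding_box(image):
    """Return (top, left, bottom, right), inclusive, of all black cells, or None."""
    rows = [r for r, row in enumerate(image) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    return rows[0], cols[0], rows[-1], cols[-1]


def has_white_frame(image, top, left, bottom, right):
    """Check that the one-mesh-wide perimeter just outside the box is all white."""
    height, width = len(image), len(image[0])

    def cell(r, c):
        if 0 <= r < height and 0 <= c < width:
            return image[r][c]
        return 0  # assumption: anything outside the scanning field is white

    # Rows directly above and below the box, including the four corners.
    for c in range(left - 1, right + 2):
        if cell(top - 1, c) or cell(bottom + 1, c):
            return False
    # Columns directly left and right of the box.
    for r in range(top, bottom + 1):
        if cell(r, left - 1) or cell(r, right + 1):
            return False
    return True


# Illustrative 4 x 5 pattern: one small character and a stray black mesh at the right edge.
PATTERN = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
```

With this pattern, the box covering rows 1 to 2 and columns 1 to 2 has a white perimeter, while a box widened to column 3 does not, because its right-hand perimeter touches the stray black mesh.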
In a character reading operation, generally only several specific lines are to be read among a large number of character lines. Character segmentation is especially effective when the positions of characters and lines are non-uniform or when the number of characters to be read varies among the objects to be processed.
The prior art character segmenting methods can be broadly classified into a "compressing method" and a "masking method". In accordance with the compressing method, full character patterns are compressed in the transverse direction while being written into a memory in order to segment each line from the full patterns containing a plurality of lines. Then, compression is performed in the longitudinal direction while the patterns of one stored line are read out of the memory. By using such compression in the two directions, each character can be segmented individually. When the pitch of all characters is constant and the character arrangement in each line is uniform throughout all lines, the sequence of compression may be reversed.
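The two-pass compression described above can be sketched in software as a pair of projection profiles. This is a minimal illustration, not the prior art circuit. It assumes that "transverse" means along the text line, so that each row of the pattern collapses to a single bit, and that "longitudinal" means across the line, so that each column of a stored line collapses to a single bit. The names `runs` and `segment_by_compression` are hypothetical.

```python
def runs(bits):
    """Return inclusive (start, end) index pairs for each run of consecutive 1s."""
    out, start = [], None
    for i, b in enumerate(bits):
        if b and start is None:
            start = i
        elif not b and start is not None:
            out.append((start, i - 1))
            start = None
    if start is not None:
        out.append((start, len(bits) - 1))
    return out


def segment_by_compression(image):
    """Segment lines first, then characters within each line.

    Returns (top, left, bottom, right) boxes. Each box spans the full height
    of its line, not the tight height of the individual character.
    """
    # First pass: compress every row to one bit to find the character lines.
    row_profile = [int(any(row)) for row in image]
    boxes = []
    for top, bottom in runs(row_profile):
        band = image[top:bottom + 1]
        # Second pass: compress every column of this line to one bit.
        col_profile = [int(any(row[c] for row in band)) for c in range(len(image[0]))]
        for left, right in runs(col_profile):
            boxes.append((top, left, bottom, right))
    return boxes


# Illustrative pattern with two lines separated by one blank row.
PAGE = [
    [1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
]
```

The second pass runs once per detected line, which is consistent with a segmenting time that grows with the number of lines in the scanning field.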
However, this prior art compressing method has the following drawbacks. First, a processing time of at least twice the scanning time is necessary for character segmenting. If the number of lines to be read in one scan is great, the time for segmenting increases in proportion to the number of lines. Second, segmentation of the character lines from data compressed in the transverse direction becomes impossible if the arrangement of the character lines is complicated, i.e., the spaces between adjacent lines are relatively small and the character lines are skewed as a whole. In such a case, the compression processing must be carried out separately in the transverse direction, and hence the processing time becomes correspondingly longer.
Another prior art method, the "masking method", will be described next. The prior art masking method can be further classified into two methods. The first masking method directly detects that the entire circumference outside the aforementioned circumscribing square is white. A two-dimensional shift register is prepared which has a capacity greater than the maximal width of the characters to be processed. Then, the pattern signal delivered from an optical scanner is applied to this shift register. A white frame detection mask is provided in the shift register in order to detect the presence or absence of a white frame outside the character, i.e., a condition wherein the entire circumference outside the circumscribing square of the character is white. The detection is made in response to the character data stored in the shift register. If the entire pattern corresponding to the white frame indicates blank space, a character segment is detected. The second masking method uses a part of the upper left and a part of the lower right of the white frame to partially segment characters and thereafter synthesizes them to segment each character completely.
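The first masking method can be sketched as a fixed-size window slid over the stored pattern. This is a software illustration only, under several assumptions that are not in the text: the shift register contents are modeled as a list of rows of bits (1 is black), the mask is a frame one mesh wide around an interior of `char_h` by `char_w` meshes, and a hit additionally requires black on all four edges of the interior so that each character is reported once. The name `white_frame_hits` is hypothetical.

```python
def white_frame_hits(image, char_h, char_w):
    """Return boxes (top, left, bottom, right) where a fixed-size mask fires.

    The mask fires when its one-mesh frame is entirely white and the
    char_h x char_w interior has black on every edge (an added assumption).
    """
    height, width = len(image), len(image[0])
    hits = []
    # The whole mask, frame included, measures (char_h + 2) x (char_w + 2).
    for top in range(height - char_h - 1):
        for left in range(width - char_w - 1):
            bottom, right = top + char_h + 1, left + char_w + 1
            frame = (
                image[top][left:right + 1]
                + image[bottom][left:right + 1]
                + [image[r][left] for r in range(top + 1, bottom)]
                + [image[r][right] for r in range(top + 1, bottom)]
            )
            if any(frame):
                continue  # some black mesh lies on the frame: no white frame here
            interior = [row[left + 1:right] for row in image[top + 1:bottom]]
            touches_all_sides = (
                any(interior[0])
                and any(interior[-1])
                and any(row[0] for row in interior)
                and any(row[-1] for row in interior)
            )
            if touches_all_sides:
                hits.append((top + 1, left + 1, bottom - 1, right - 1))
    return hits


# Illustrative field holding a 2 x 2 character and a 2 x 1 character.
FIELD = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
```

In this sketch a 2 x 2 mask finds only the first character and a 2 x 1 mask finds only the second, so each character size needs its own mask.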
However, the first and second masking methods have the following problems. In the first method, the size of the character that can be detected is limited by a single mask of a given size. Hence, a plurality of masks having varying sizes must be prepared in order to segment characters having varying sizes, thus making the mask circuit construction complicated. Further, the address or postal code numbers on the mail article frequently have a low print quality. For example, there may be partial blurring of the characters and consecutive printing or overprinting of two or more characters. It is practically impossible to prepare masks to cover all of these cases. To solve these problems, this method must be employed jointly with other auxiliary methods such as the partial compressing method, resulting in an increase in the processing time.
The second method involves the following problems. Since this method was developed in order to solve the problems with the first masking method, it can cope with variation in character size and with blurring, consecutive printing, or overprinting of the characters. However, because the detected data from each mask are produced out of order with respect to time, a complicated algorithm is required in order to later segment each character from these data by calculation. Hence, a prolonged processing time is necessary.