This invention relates to a character recognition system, and more particularly to a character recognition system by which a character area can be extracted efficiently.
A problem peculiar to Japanese text, which includes "kanji" and "kana" characters, is that discrete characters must be discriminated exactly before individual characters can be recognized in horizontally or vertically written character lines.
An exemplary approach to this problem is disclosed in U.S. Pat. No. 4,850,025. In that approach, virtual rectangular areas are formed from projection data of a horizontal or vertical character line so that each rectangle circumscribes either an entire character or a single component of a character. Adjacent rectangular areas are then integrated until the height-to-width ratio of each integrated area becomes substantially equal to 1, and the individual discrete characters are extracted in this way.
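The projection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of U.S. Pat. No. 4,850,025: it assumes a horizontal character line given as a binary image (a list of rows of 0/1 pixels, 1 = ink), and the names `Rect`, `column_runs`, `circumscribe`, and `extract_rects` are hypothetical. Columns with a non-zero vertical projection are grouped into runs, and each run is circumscribed by the smallest rectangle enclosing its ink. A vertical line would be handled the same way with rows and columns swapped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle (hypothetical type); right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def column_runs(image):
    """Return [start, end) ranges of columns whose vertical projection is non-zero."""
    n_cols = len(image[0])
    # Vertical projection: number of ink pixels in each column.
    profile = [sum(row[c] for row in image) for c in range(n_cols)]
    runs, start = [], None
    for c, count in enumerate(profile):
        if count and start is None:
            start = c                      # a run of ink columns begins
        elif not count and start is not None:
            runs.append((start, c))        # a blank column ends the run
            start = None
    if start is not None:
        runs.append((start, n_cols))       # run touches the right edge
    return runs


def circumscribe(image, left, right):
    """Smallest rectangle enclosing all ink pixels in columns [left, right)."""
    rows = [r for r, row in enumerate(image) if any(row[left:right])]
    return Rect(left, rows[0], right, rows[-1] + 1)


def extract_rects(image):
    """One circumscribing rectangle per run of non-blank columns."""
    return [circumscribe(image, a, b) for a, b in column_runs(image)]


line = [
    [0, 1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
]
print(extract_rects(line))
# -> [Rect(left=1, top=0, right=2, bottom=2), Rect(left=4, top=0, right=6, bottom=3)]
```

Each rectangle here may hold a whole character or only one component of a character; deciding which rectangles belong together is the separate integration step.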
In this connection, most Japanese characters are em (full-width) characters, so a rectangular frame circumscribing any one of them has a height-to-width ratio substantially equal to 1. For discrete characters such as " " and " ", whose constituent components are horizontally separated from one another, it is therefore expected that two or more adjacent components can be extracted as a single discrete character by integrating them so that the rectangular area circumscribing them has a height-to-width ratio substantially equal to 1.
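The integration criterion can be sketched in the same spirit. The greedy left-to-right rule and the default tolerance `tol = 0.2` below are illustrative assumptions, not taken from the cited patent, and `Rect`, `union`, `aspect`, and `merge_until_square` are hypothetical names. A box absorbs its right-hand neighbour only while it is still clearly taller than wide, and only if the merged box does not become clearly wider than tall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle (hypothetical type); right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def union(a, b):
    """Smallest rectangle circumscribing both a and b."""
    return Rect(min(a.left, b.left), min(a.top, b.top),
                max(a.right, b.right), max(a.bottom, b.bottom))


def aspect(r):
    """Height-to-width ratio of a rectangle."""
    return (r.bottom - r.top) / (r.right - r.left)


def merge_until_square(rects, tol=0.2):
    """Greedily integrate left-to-right neighbours (hypothetical rule).

    The current box absorbs its right neighbour only while the box is still
    clearly taller than wide (aspect > 1 + tol) and the merged box stays
    near-square or taller (aspect >= 1 - tol).
    """
    out = []
    for r in rects:
        if out:
            merged = union(out[-1], r)
            if aspect(out[-1]) > 1 + tol and aspect(merged) >= 1 - tol:
                out[-1] = merged
                continue
        out.append(r)
    return out


# Two narrow components of one split character, then a full-width character.
boxes = [Rect(0, 0, 10, 24), Rect(14, 0, 26, 24), Rect(30, 0, 54, 24)]
print(merge_until_square(boxes))
# -> [Rect(left=0, top=0, right=26, bottom=24), Rect(left=30, top=0, right=54, bottom=24)]
```

With a stricter tolerance such as `tol=0.05`, the merged box (24/26, about 0.92) falls below the 0.95 bound, so the two components stay separate; a single fixed threshold is sensitive to how close to square the integrated characters actually are.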
In practice, however, when the height-to-width ratios of the rectangular areas obtained by such integration are examined strictly, they are not uniform across all discrete characters. The method described above is therefore not accurate enough in extracting discrete characters for practical use.