The present invention generally relates to image extraction systems, and more particularly to an image extraction system for extracting characters, graphics and the like which touch a character frame, a rule and the like in a hand-written character recognition apparatus such as an optical character reader (OCR).
The demand for hand-written character recognition apparatuses, as input/output devices designed for hand-written characters, is increasing. In order to realize a high recognition rate of the individual characters in such a hand-written character recognition apparatus, it is important that the process of extracting the character is carried out accurately prior to the recognition stage.
Examples of documents which are the subject of the recognition include documents such as form sheets which specify the positions where the characters should be written. In such documents, a frame or the like, which specifies the position where the character should be written, is printed, not in a dropout color, but in the same color and density as the character. That is, a black frame, a black rule or the like, is printed on such a document. Accordingly, if the characters are clearly written within the specified ranges, it is possible to automatically recognize the characters at a relatively high recognition rate. However, if the hand-written character exceeds the specified range and touches or penetrates the frame or rule which indicates the specified range, the recognition rate greatly deteriorates.
Various methods have been proposed to extract only the character which touches the character frame. For example, Japanese Laid-Open Patent Application No. 63-251874 proposes a method of extracting a touching character, and Japanese Laid-Open Patent Application No. 3-233787 proposes a method of extracting a character image.
FIG. 1 shows an example of a conventional character extraction system. In FIG. 1, the character extraction system includes a contact detection means 181 for detecting contact between the character frame and the character, a contact range determination means 183 for determining a contact range between the character and the character frame, and an interpolation means 184 for interpolating an overlapping portion of the character and the character frame by rectangles. Frame position data 182 related to the position and size of the character frame are supplied to the contact detection means 181.
When extracting the character, the frame position data 182 are stored in advance as form sheet data. The contact detection means 181 checks whether or not the black pixels of the input image make contact at the position of the character frame, based on the frame position data 182. In addition, the contact range determination means 183 determines that a region obtained by connecting points where the input image makes contact with the character frame is the character portion within the character frame. The interpolation means 184 regards the region which is determined by the contact range determination means 183 as the character region, and fills this character region with rectangles. The character is extracted in this manner.
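The three stages described above can be illustrated with a minimal sketch. It is not taken from the cited applications. It assumes a binary image stored as a list of rows (1 = black), a single horizontal frame line whose top row and line width are known in advance, and a contact range defined as a run of adjacent columns in which a black pixel touches the frame from above or below. The function name and its parameters are hypothetical.

```python
def extract_with_known_frame(image, frame_top, frame_width):
    """Erase a horizontal frame line at a known position, then refill the
    parts of it that touching strokes are assumed to pass through."""
    height, width = len(image), len(image[0])
    frame_rows = range(frame_top, frame_top + frame_width)
    above, below = frame_top - 1, frame_top + frame_width

    # Contact detection: columns where a black pixel touches the frame
    # from the row just above or the row just below it.
    contact = [
        (above >= 0 and image[above][x] == 1)
        or (below < height and image[below][x] == 1)
        for x in range(width)
    ]

    # Contact range determination (assumed rule): group adjacent
    # contact columns into runs (left, right).
    runs, start = [], None
    for x, touching in enumerate(contact + [False]):
        if touching and start is None:
            start = x
        elif not touching and start is not None:
            runs.append((start, x - 1))
            start = None

    # Erase the frame using the stored position and width.
    result = [row[:] for row in image]
    for y in frame_rows:
        for x in range(width):
            result[y][x] = 0

    # Interpolation: fill each contact range with a rectangle.
    for left, right in runs:
        for y in frame_rows:
            for x in range(left, right + 1):
                result[y][x] = 1
    return result, runs


# A vertical stroke in column 3 crossing a 2-pixel frame line at rows 2-3.
stroke = [0, 0, 0, 1, 0, 0, 0]
frame = [1] * 7
page = [stroke[:], stroke[:], frame[:], frame[:], stroke[:], stroke[:]]
extracted, ranges = extract_with_known_frame(page, frame_top=2, frame_width=2)
```

Because the fill is purely local and rectangular, the sketch takes no account of the direction or continuity of the stroke, and it depends entirely on the stored frame position and width being exact.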
According to the conventional system described above, it is a precondition that the position and line width of the character frame are known in advance. For this reason, the accuracy of the character extraction is easily affected by a slight skew or unevenness of the character frame. In other words, if a portion of the character frame projects from a predetermined position, for example, this projecting portion will be recognized as the character and this projecting portion will remain as noise. In addition, unwanted joining of the character and the character frame portion may occur and make the quality of the extracted character extremely poor. Further, the original character portion may drop out (that is, become chipped) due to a deviation in the position or line width of the character frame.
On the other hand, the method employed in the conventional system to judge the character region within the character frame does not consider the continuity or connection of the character line segment. As a result, the method simply fills the gap locally by the rectangular region, and there is considerable deterioration in the quality of the extracted character.
FIG. 2 shows an example of a character pattern extracted by the conventional system described above. In FIG. 2, the left half shows a contact portion between a character line segment 191 and a character frame 192, and the right half shows a portion of the character which is extracted from the pattern shown on the left half.
FIG. 2 shows the character line segment 191 which is extracted from the character frame 192 on the precondition that the character frame 192 has a width amounting to 2 pixels. In the character frame portion shown on the left half of FIG. 2, a portion 192a having a width amounting to 2 pixels is eliminated as the character frame portion when the character is extracted. However, a thinned portion 192b having a width narrower than the width amounting to 2 pixels due to grazing or the like will not be eliminated as the character frame portion. For this reason, in the character line segment which is extracted from the pattern shown on the left half of FIG. 2, the character frame portion where the width of the character frame is narrower than the width amounting to 2 pixels due to the grazing or the like remains as a portion of the character, and the character line segment having a poor quality is extracted as shown on the right half of FIG. 2.
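The failure mode described for FIG. 2 can be reproduced with a small sketch. It is an illustration under stated assumptions, not the conventional system itself: the frame is taken to be a horizontal line at a known top row, a column is treated as frame only when all pixels across the expected width are black, and the function name and parameters are hypothetical.

```python
def remove_fixed_width_frame(image, frame_top, expected_width):
    """Erase frame pixels column by column, but only where the frame has
    exactly the expected line width (assumed matching rule)."""
    result = [row[:] for row in image]
    band = range(frame_top, frame_top + expected_width)
    residue_columns = []
    for x in range(len(image[0])):
        black = sum(image[y][x] for y in band)
        if black == expected_width:
            # Full-width frame portion: eliminated.
            for y in band:
                result[y][x] = 0
        elif black > 0:
            # Thinned frame portion: not recognized as frame, so it
            # stays in the output and is later mistaken for character.
            residue_columns.append(x)
    return result, residue_columns


# A 2-pixel frame line at rows 1-2 that has thinned to 1 pixel in column 2.
page = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
cleaned, residue = remove_fixed_width_frame(page, frame_top=1, expected_width=2)
```

In this example the thinned column is the only black pixel left after removal, which corresponds to the frame residue that remains attached to the extracted character line segment.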
FIGS. 3A, 3B and 3C respectively show examples of the character which is extracted by the conventional system described above. In FIGS. 3A through 3C, the left half shows the character which makes contact with the frame, and the right half shows the extracted character. As described above, the conventional system does not take into consideration the continuity and connection of the character line segment, the line width of the character, the size of the character and the like. For this reason, the quality of the extracted character is extremely poor. FIG. 3A shows a case where the frame is extracted as a portion of the character. FIG. 3B shows a case where the frame makes contact with 2 characters and the 2 characters which are connected via the frame are extracted as 1 character. FIG. 3C shows a case where a stain or spot on the character is extracted as a portion of the character.