The present invention relates generally to a technology for recognizing hand-written or printed ruled lines in a document such as an application form.
It is a common practice to process hand-written or printed forms (for example, application forms or mark-sheets) using a character recognition apparatus. Such a form may contain tables drawn with ruled lines, which may themselves be hand-written or printed. However, if the lines are faint, it becomes difficult for the character recognition apparatus to decide whether a given mark is a line or a character.
A technology for overcoming this drawback is disclosed in Japanese Patent Application Laid-Open No. 10-49676. According to this disclosure, a histogram of black runs is calculated from a document image, and a black-run threshold value, which serves as a parameter for extracting ruled lines, is determined from the calculated histogram. Rectangles that are connected components of black runs not less than this threshold value are then extracted, and neighboring rectangles are combined so that continuous ruled lines are extracted. Next, a histogram of the lengths of the ruled lines in the document image is calculated, and a length threshold value for ruled-line extraction is determined from that histogram. Finally, a continuous ruled line whose length is not less than the length threshold value is recognized as a ruled line.
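The first step of this prior-art approach can be illustrated with a minimal sketch. It assumes a binary image given as a list of rows in which 1 denotes a black pixel, and it assumes the histogram is taken over horizontal run lengths. The cited publication is not quoted here on how the threshold is derived from the histogram, so the rule in `threshold_from_histogram` (a multiple of the most frequent run length) and all function names are purely hypothetical.

```python
from collections import Counter


def black_run_lengths(image):
    """Return the lengths of all horizontal runs of black pixels (1 = black)."""
    lengths = []
    for row in image:
        run = 0
        for pixel in row:
            if pixel:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
        if run:  # a run that reaches the right edge of the row
            lengths.append(run)
    return lengths


def run_histogram(image):
    """Histogram mapping run length -> number of runs of that length."""
    return Counter(black_run_lengths(image))


def threshold_from_histogram(hist, factor=3):
    """ASSUMPTION: threshold = factor x most frequent run length.

    The most frequent run length roughly tracks the stroke width of the
    characters; ties are broken in favor of the shorter length.
    """
    if not hist:
        return 0
    mode_length = max(hist, key=lambda length: (hist[length], -length))
    return factor * mode_length


image = [
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 1, 0],
]
hist = run_histogram(image)
threshold = threshold_from_histogram(hist)
long_runs = [n for n in black_run_lengths(image) if n >= threshold]
```

Runs that are not less than the threshold are the ones that would be passed on to the rectangle-combining step described above.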
However, when the runs of continuous black pixels do not exceed the threshold values, a special process for connecting ruled lines is required. In this special process, for example, candidate areas in which ruled lines are considered to exist are sorted out, and a judgment is made as to whether each candidate area really is a ruled line. This process is prone to errors: a large-character portion, such as a title, may be extracted as a ruled line, or a ruled line may be judged to exist where many character rectangles that happen to have long runs lie at the positions of the ruled line candidates. Moreover, when a plurality of frames exist inside a frame, the number of positions that become ruled line candidates becomes enormous. As a result, this approach has the disadvantages that the extraction fails and an overflow occurs.
It is an object of the present invention to provide a method and apparatus for table recognition, an apparatus for character recognition, and a computer readable recording medium that stores a computer program which, when executed, realizes the method according to the present invention.
According to one aspect of this invention, there are provided a method and apparatus for table recognition having a configuration as follows. First, circumscribing rectangles of connected components of black pixels of the document image are extracted. Then, the extracted circumscribing rectangles are separated into character candidates and frame candidates according to information such as an aspect ratio, a number of black pixels and a number of black runs of the extracted circumscribing rectangles. Then, images within a range of the rectangles which have been separated as the character candidates are filled with white. Finally, frames are recognized from the rectangles which have been separated as the frame candidates.
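The sequence of steps in this aspect can be sketched as follows. This is only an illustration under stated assumptions: the image is a list of rows with 1 for black and 0 for white, components are taken to be 8-connected, and the classification rule and its thresholds (`min_aspect`, `max_density`) are hypothetical values chosen for the example. The black-run count mentioned in the text is omitted for brevity, and the final frame-recognition step is not shown.

```python
from collections import deque


def component_boxes(image):
    """Return (top, left, bottom, right, black_pixels) for each
    8-connected component of black pixels; coordinates are inclusive."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right, count = y, x, y, x, 0
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                count += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right, count))
    return boxes


def is_frame_candidate(box, min_aspect=5.0, max_density=0.5):
    """ASSUMPTION: a rectangle is a frame candidate if it is strongly
    elongated or only sparsely filled with black pixels."""
    top, left, bottom, right, count = box
    h, w = bottom - top + 1, right - left + 1
    aspect = max(h, w) / min(h, w)
    density = count / (h * w)
    return aspect >= min_aspect or density <= max_density


def separate(boxes):
    """Split rectangles into (character candidates, frame candidates)."""
    characters = [b for b in boxes if not is_frame_candidate(b)]
    frames = [b for b in boxes if is_frame_candidate(b)]
    return characters, frames


def fill_white(image, box):
    """Overwrite every pixel inside the rectangle with white (0), in place."""
    top, left, bottom, right = box[:4]
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            image[y][x] = 0


# An 8x8 hollow frame with a 2x2 "character" blob inside it.
image = [[1 if y in (0, 7) or x in (0, 7) else 0 for x in range(8)]
         for y in range(8)]
for y in (3, 4):
    for x in (3, 4):
        image[y][x] = 1

boxes = component_boxes(image)
characters, frames = separate(boxes)
for box in characters:
    fill_white(image, box)
```

After the character candidates have been filled with white, only the frame pixels remain in the image, which is the state from which frame recognition would proceed.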
According to another aspect of this invention, there are provided a method and apparatus for table recognition having a configuration as follows. First, circumscribing rectangles of connected components of black pixels of the document image are extracted. Then, the extracted circumscribing rectangles are separated into character candidates and frame candidates according to information such as an aspect ratio, a number of black pixels and a number of black runs of the extracted circumscribing rectangles. Then, images within a range of the rectangles which have been separated as character candidates are filled with white. The rectangles which have been separated as the frame candidates are sorted in order of increasing area. Frames are successively extracted from the rectangles which have been separated as the frame candidates in the order of increasing area. Each time a frame is extracted, the image of the frame-candidate rectangle from which it was extracted is filled with white, and this is repeated for the remaining candidates.
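The ordering described in this aspect can be sketched as a short loop. The frame extraction itself is not specified in this passage, so it is passed in as a hypothetical callable `extract_frame(image, box)`; the rectangle format `(top, left, bottom, right)` with inclusive coordinates and the function names are likewise assumptions made for the example.

```python
def area(box):
    """Area of an inclusive (top, left, bottom, right) rectangle."""
    top, left, bottom, right = box
    return (bottom - top + 1) * (right - left + 1)


def extract_frames_smallest_first(image, frame_boxes, extract_frame):
    """Process frame candidates in order of increasing area.

    After each extraction the candidate's rectangle is filled with
    white (0), so a larger enclosing candidate processed later no
    longer sees the pixels of the frames nested inside it.
    """
    results = []
    for box in sorted(frame_boxes, key=area):
        results.append(extract_frame(image, box))
        top, left, bottom, right = box
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                image[y][x] = 0
    return results


def count_black(image, box):
    """Stand-in for a real frame extractor (hypothetical): it only
    reports how many black pixels the rectangle holds when visited."""
    top, left, bottom, right = box
    black = sum(image[y][x]
                for y in range(top, bottom + 1)
                for x in range(left, right + 1))
    return box, black


image = [[1] * 6 for _ in range(6)]
candidates = [(0, 0, 5, 5), (1, 1, 2, 2)]  # outer box listed first on purpose
results = extract_frames_smallest_first(image, candidates, count_black)
```

In this toy run the inner rectangle is visited first and erased, so the outer rectangle is examined with the inner pixels already removed.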
According to still another aspect of this invention, there are provided a method and apparatus for table recognition having a configuration as follows. First, circumscribing rectangles of connected components of black pixels of the document image are extracted. Then, the extracted circumscribing rectangles are separated into character candidates and frame candidates according to information such as an aspect ratio, a number of black pixels and a number of black runs of the extracted circumscribing rectangles. Then, images within a range of the rectangles which have been separated as character candidates are filled with white. The rectangles which have been separated as the frame candidates are sorted in order of increasing area. Frames are successively extracted from the rectangles which have been separated as the frame candidates in the order of increasing area. When the number of rectangles which have been separated as the frame candidates is not more than two and these rectangles do not partially overlap each other, the images of the frame-candidate rectangles from which the frames were extracted are filled with white. On the other hand, when the number of rectangles which have been separated as the frame candidates is not more than two but these rectangles partially overlap each other, the images of the frame-candidate rectangles from which the frames were extracted are not filled with white, and a frame is extracted from the rectangle of the next candidate.
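The fill decision in this aspect can be sketched as a small predicate. Several points are assumptions made only for illustration: rectangles are `(top, left, bottom, right)` with inclusive coordinates, "partially overlapped" is read as two rectangles that intersect while neither fully contains the other, and, because the text above describes only the case of at most two frame candidates, the sketch simply falls back to filling when there are more than two.

```python
def intersects(a, b):
    """True if the two inclusive rectangles share at least one pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer, inner):
    """True if `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def partially_overlap(a, b):
    """ASSUMPTION: partial overlap = intersecting, but neither rectangle
    fully contains the other."""
    return intersects(a, b) and not contains(a, b) and not contains(b, a)


def should_fill_white(candidates):
    """Decide whether an extracted candidate's rectangle may be erased.

    With at most two candidates, filling is skipped when they partially
    overlap, since erasing one rectangle would also erase part of the
    other. The case of more than two candidates is not described in the
    text; filling in that case is an assumption of this sketch.
    """
    if len(candidates) == 2 and partially_overlap(*candidates):
        return False
    return True


a = (0, 0, 4, 4)
b = (2, 2, 6, 6)      # overlaps a, neither contains the other
c = (1, 1, 2, 2)      # nested inside a
d = (10, 10, 12, 12)  # disjoint from a
```

Under this reading, `should_fill_white([a, b])` is false while `should_fill_white([a, c])` and `should_fill_white([a, d])` are true.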
The character recognition apparatus according to still another aspect of this invention incorporates the table recognition apparatus according to this invention.
The computer readable recording medium according to still another aspect of this invention stores a computer program which, when executed, realizes the method according to the present invention.
Other objects and features of this invention will become apparent from the following description with reference to the accompanying drawings.