1. Field of the Invention
The present invention relates to a table recognition method for extracting ruled lines in a table in a document image.
2. Description of the Related Art
Document-image recognition technology, such as OCR (Optical Character Reader or Optical Character Recognition) technology, is used to digitize tasks that have been carried out with paper documents and to convert documents distributed in paper form into electronic documents. Since a document may contain one or more tables, technology for table recognition is important. A table is generally expressed by a combination of vertical and horizontal ruled lines. Table recognition is performed by extracting layout information of the table ruled lines from a document image and analyzing the table structure based on the extracted ruled-line layout information. Thus, technology for extracting ruled lines is required for accurately recognizing a table.
One example of a method for extracting table ruled lines is a method for detecting ruled lines from continuous pixels in a document image. This method detects solid lines with high accuracy, but cannot detect line segments other than solid lines. Another method is to detect ruled lines by using a technique for extracting edges in an image. When the edge-extraction technique is used, two ruled-line candidates are generated from a single solid line, one for each side of the line, and thus need to be integrated together in subsequent processing. This method has lower accuracy than the method for detecting ruled lines from continuous pixels. When ruled lines are detected by the two methods and the results obtained thereby are then integrated together, subsequent processing is required as well. As described above, with only a combination of the method for detecting ruled lines from continuous pixels and the method for detecting ruled lines by using the edge-extraction technique, it is difficult to extract ruled lines from an image in which multiple types of ruled lines coexist.
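The behavior of the two methods can be illustrated with a minimal sketch. It is not taken from the related art; it assumes a binary image stored as nested lists (1 for a black pixel, 0 for a white pixel), and the names horizontal_runs, horizontal_edges, and min_length are hypothetical. The first function keeps only sufficiently long runs of continuous black pixels. The second applies the same run search to a map of vertical intensity changes, which stands in for a real edge-extraction technique.

```python
from typing import List, Tuple

Image = List[List[int]]  # 1 = black pixel, 0 = white pixel (assumed representation)


def horizontal_runs(image: Image, min_length: int) -> List[Tuple[int, int, int]]:
    """Return (row, start_col, end_col) for each run of black pixels
    that is at least min_length pixels long."""
    found = []
    for y, row in enumerate(image):
        start = None
        # A white sentinel pixel closes a run that reaches the right border.
        for x, pixel in enumerate(row + [0]):
            if pixel and start is None:
                start = x
            elif not pixel and start is not None:
                if x - start >= min_length:
                    found.append((y, start, x - 1))
                start = None
    return found


def horizontal_edges(image: Image, min_length: int) -> List[Tuple[int, int, int]]:
    """Return (boundary_row, start_col, end_col) for each horizontal edge.
    boundary_row y marks an intensity change between rows y - 1 and y."""
    edge_map = [
        [int(upper != lower) for upper, lower in zip(image[y - 1], image[y])]
        for y in range(1, len(image))
    ]
    return [(y + 1, s, e) for y, s, e in horizontal_runs(edge_map, min_length)]


blank = [0] * 12
solid = [0] + [1] * 10 + [0]                     # solid line, 10 pixels long
dashed = [0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0]    # dashed line, dashes of 1 or 2 pixels
image = [blank, solid, blank, blank, dashed, blank]

runs = horizontal_runs(image, min_length=8)
edges = horizontal_edges(image, min_length=8)
print(runs)   # one candidate for the solid line, none for the dashed line
print(edges)  # two candidates for the solid line: its upper and lower edges
```

In this sketch, the run search yields a single candidate for the solid line and misses the dashed line, while the edge search yields one candidate for each side of the solid line, so the two candidates would have to be integrated in later processing.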
Border ruled lines formed by a texture cannot be detected by the method for detecting ruled lines from continuous pixels. On the other hand, when border ruled lines formed by a texture are detected by the ruled-line detection method using the edge-extraction technique, the number of false extractions of non-ruled-line elements, such as characters in an image, increases.
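The following sketch illustrates this trade-off under simplifying assumptions. A checkerboard pattern stands in for a textured cell, a single row of short strokes stands in for characters, and a change in per-row black-pixel density stands in for an edge-extraction technique. The names longest_black_run, density_edge_rows, and threshold are hypothetical and do not come from the related art.

```python
from typing import List

Image = List[List[int]]  # 1 = black pixel, 0 = white pixel (assumed representation)


def longest_black_run(image: Image) -> int:
    """Return the length of the longest horizontal run of black pixels."""
    longest = 0
    for row in image:
        current = 0
        for pixel in row:
            current = current + 1 if pixel else 0
            longest = max(longest, current)
    return longest


def density_edge_rows(image: Image, threshold: float) -> List[int]:
    """Return each row index y at which the black-pixel density changes
    by at least threshold relative to row y - 1."""
    density = [sum(row) / len(row) for row in image]
    return [
        y for y in range(1, len(image))
        if abs(density[y] - density[y - 1]) >= threshold
    ]


WIDTH = 8
# Rows 0-3: textured cell (checkerboard); the texture border lies at row 4.
textured = [[(x + y) % 2 for x in range(WIDTH)] for y in range(4)]
white = [0] * WIDTH
character_row = [0, 1, 1, 0, 1, 0, 1, 1]  # stand-in for character strokes
image = textured + [white, white, character_row, white]

print(longest_black_run(image))       # far shorter than any ruled line
print(density_edge_rows(image, 0.4))  # texture border plus rows around the characters
```

Here no run of continuous pixels is long enough to be taken for a ruled line, so the texture border is missed by the first method. The density-change search finds the border at row 4, but it also reports the rows above and below the character strokes, which are false extractions.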
Related technologies are disclosed by Japanese Unexamined Patent Application Publication No. 10-40333 and Japanese Unexamined Patent Application Publication No. 01-217583.