1. Field of the Invention
The present invention relates to page segmentation systems for classifying regions of a document image. More particularly, the present invention relates to a block selection system for identifying and defining features within table images.
2. Incorporation by Reference
Commonly-assigned U.S. application Ser. No. 07/873,012, now U.S. Pat. No. 5,680,479, entitled "Method and Apparatus For Character Recognition", Ser. No. 08/171,720, now U.S. Pat. No. 5,583,072, entitled "Method and Apparatus For Selecting Text And/Or Non-Text Blocks In A Stored Document", Ser. No. 08/338,781, entitled "Page Analysis System", now U.S. Pat. No. 5,987,171, Ser. No. 08/514,252, entitled "Feature Extraction System", now U.S. Pat. No. 5,848,186, and Ser. No. 08/664,675, entitled "System For Extracting Attached Text", are herein incorporated by reference as if set forth in full.
3. Description of the Related Art
Conventional page segmentation systems are applied to document images in order to identify data types contained within specific regions of the document images. This information can be used to extract data within a specific region and to determine a type of processing to be applied to the extracted data.
For a document containing a table image, a region of text located within the table image, referred to herein as a table cell, can be converted to ASCII characters using optical character recognition (OCR) processing and stored in an ASCII file along with information corresponding to the location of the table cell. However, conventional systems cannot accurately determine a row and column address corresponding to the table cell. Accordingly, the recognized ASCII characters cannot be reliably input to a spreadsheet based on row and column address data.
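The role of row and column address data can be illustrated with a minimal sketch. It assumes that each recognized cell already carries a zero-based row and column address, which is exactly the information conventional systems fail to provide reliably. The names RecognizedCell and cells_to_csv are hypothetical and chosen only for illustration; CSV stands in for a spreadsheet format.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class RecognizedCell:
    """Hypothetical record for one table cell after OCR processing."""
    text: str  # recognized ASCII characters
    row: int   # zero-based row address (assumed to be known)
    col: int   # zero-based column address (assumed to be known)


def cells_to_csv(cells):
    """Place each cell's text at its row/column address and emit CSV text."""
    if not cells:
        return ""
    n_rows = max(c.row for c in cells) + 1
    n_cols = max(c.col for c in cells) + 1
    # Start from an empty grid so that missing cells become empty fields.
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c.row][c.col] = c.text
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(grid)
    return buf.getvalue()
```

Without the row and col fields, only the pixel location of each cell would be available, and the placement step in the loop could not be performed directly.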
In addition, the data produced by conventional systems is often insufficient to adequately recreate the internal features of a bit-mapped table image. For example, the data does not reflect vertical and horizontal grid lines within an analyzed table image. As defined herein, vertical and horizontal grid lines define each row and column within a table, and can be either visible or non-visible. Therefore, although a conventional system can be used to create an ASCII version of a bit-mapped table, the ASCII version does not include data representative of table grid lines. Accordingly, the stored data cannot be used to accurately recreate a bit-mapped version of grid lines within the table. Moreover, in a case that it is desired to edit text within a table cell, it is difficult to determine, based on information provided by conventional systems, whether the edited text will intersect with a grid line and thereby violate row/column boundaries.
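The boundary check described above becomes straightforward once grid line data is stored. The following is a minimal sketch under stated assumptions: grid lines are axis-aligned segments in pixel coordinates, edited text is represented by its bounding box, and the names Box, GridLine, crosses, and violated_lines are hypothetical. It is not the method of any of the incorporated applications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of a text region, in pixels (hypothetical layout)."""
    left: int
    top: int
    right: int
    bottom: int


@dataclass(frozen=True)
class GridLine:
    """Axis-aligned grid line; it may be visible or non-visible."""
    orientation: str  # "vertical" or "horizontal"
    position: int     # x for a vertical line, y for a horizontal line
    start: int        # first coordinate along the line's length
    end: int          # last coordinate along the line's length


def crosses(box, line):
    """Return True if the line passes through the interior of the box."""
    if line.orientation == "vertical":
        inside = box.left < line.position < box.right
        overlaps = line.start < box.bottom and line.end > box.top
    else:
        inside = box.top < line.position < box.bottom
        overlaps = line.start < box.right and line.end > box.left
    return inside and overlaps


def violated_lines(box, lines):
    """List the grid lines that the given text box would intersect."""
    return [line for line in lines if crosses(box, line)]
```

An editor could call violated_lines with the bounding box of the edited text and reject or reflow the edit when the returned list is non-empty.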
Consequently, what is needed is a system for accurately identifying and representing internal features of a bit-mapped table image, such as rows, columns, and table grid lines.