1. Field of the Invention
The present invention relates generally to document image processing using digital computers and, more particularly, to optical character recognition that also recognizes, captures and stores tabular data.
2. Description of the Prior Art
Automatic processing of digital data representing an image of a printed document, using a digital computer to recognize, capture and/or store information, has for many years been a subject of active research and of commercial products. Thus far, however, such image processing has focused on recognizing, capturing and/or storing the text, and even the formatting, present in printed documents. In addition to text, many printed documents, particularly financial, scientific and technical documents, contain tabular data. Truly recognizing, capturing and/or storing the entire informational content of such documents requires capturing more than the format of such tabular data; it requires automatically capturing the tabular data itself in a form suitable for easy computer-based analysis. At present, fully reconstructing the informational content of tables from printed documents in such a form requires manually re-entering the data from a printed table as input to a database or spreadsheet computer program.