1. Field of the Invention
This invention relates to an apparatus and a method for digitizing paper documents.
2. Description of the Related Art
In recent years, the spread of networks, as typified by the Internet, has increased the opportunities to distribute documents digitally, yet documents are still often distributed in printed paper form. Under these circumstances, techniques have been contemplated for obtaining the contents of a document as data that can be reused over a long period, even when only a paper copy is at hand.
For example, in one technique a paper document is read by a scanner or the like, the resulting image data is transmitted from a terminal to a server, and the server recognizes the data, converts it into a reusable form, and returns the resulting data to the terminal (see Japanese Patent Laid-Open No. H11-167532 (1999)).
In another technique, the image data of the document is divided into regions according to their type, so that each region can be output individually (see Japanese Patent Laid-Open No. 2005-346137).
When a document image (image data) generated by scanning a paper document is subjected to a document digitization process, the data format a user wants differs depending on the user's purpose or intended use. In any case, many users want the document to be digitized into a format that is convenient for them.
For example, when a document includes a table, a user who wants to edit the table by inserting or deleting rows and columns wants the resulting electronic document to contain the table as an editable table object. On the other hand, a user who wants to reprint the document unchanged for use on paper wants visual information, such as the layout of the table lines, to be reproduced as faithfully as possible.
However, depending on the format specification of the electronic document, converting a table in the document image into a table object may fail to reproduce the table structure or the table line layout exactly.
Conversely, when the table is rendered as a vector object in order to reproduce its visual information, editing operations that involve inserting or deleting rows and columns become impossible.
When calculations and the like are to be performed on the values in the cells of a table with reference to the table structure, the table is preferably reproduced in a spreadsheet application format (a format that expresses the table structure as a matrix of cells). However, in such a cell-matrix format, all tables on the same page (sheet) share a single grid of rows and columns, so when the sheet contains a plurality of tables, an editing operation performed on one table may unintentionally affect another.
FIG. 13A shows two tables, a table 1311 and a table 1312, on the same sheet. Because, in a spreadsheet application or the like, table lines must fall on cell boundaries, cells are merged and similar adjustments are made in order to reproduce the two tables. FIG. 13B shows the result of adding a new column to the right of a cell 1313 in a sheet edit window 1313 in this state. A new column 1321 is inserted to the right of the cell 1313 in the table 1311, but at the same time an unintended column 1322 is also inserted into the table 1312. Likewise, for other operations such as deleting a column or changing a cell width, an edit performed on one table unintentionally affects the other table.
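To make this interference concrete, the following is a minimal Python sketch that models a sheet as a single grid whose rows all share one set of column indices. The layout, the cell labels, and the names `make_sheet` and `insert_column` are hypothetical illustrations, not the API of any spreadsheet application, and merged cells are ignored for simplicity. It only shows that inserting a column into a cell matrix shifts cells in every row, including rows that belong to a different table.

```python
from typing import List, Optional

# A sheet is one grid: every row shares the same column indices.
Sheet = List[List[Optional[str]]]


def make_sheet() -> Sheet:
    """Hypothetical layout: table A in rows 0-1, a blank row, table B in rows 3-4."""
    return [
        ["A1", "A2", "A3"],  # table A (compare table 1311)
        ["A4", "A5", "A6"],
        [None, None, None],  # empty separator row
        ["B1", "B2", "B3"],  # table B (compare table 1312)
        ["B4", "B5", "B6"],
    ]


def insert_column(sheet: Sheet, after_col: int) -> Sheet:
    """Return a new sheet with an empty column inserted to the right of
    `after_col` in EVERY row, as a column insertion does in a cell matrix."""
    return [row[: after_col + 1] + [None] + row[after_col + 1 :] for row in sheet]


sheet = make_sheet()
edited = insert_column(sheet, after_col=0)  # the user intends to edit table A only
print(edited[0])  # ['A1', None, 'A2', 'A3']
print(edited[3])  # ['B1', None, 'B2', 'B3']  <- table B is changed as well
```

Because the column index is a property of the whole sheet rather than of an individual table, there is no way in this model to widen table A without also shifting table B.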
It has therefore been difficult to digitize paper documents in a way that simultaneously satisfies the various requirements of users without the problems described above.