In recent years, amid growing concern over environmental issues, the move to paperless offices has been promoted, and various techniques for handling digital documents have been proposed.
For example, as a method of converting a paper document into a digital document, Japanese Patent Application Laid-Open No. 2001-358863 describes a technique for scanning a paper document with a scanner, converting the scanned data into a digital document format (e.g., JPEG), and transmitting the converted data. However, since the technique described in Japanese Patent Application Laid-Open No. 2001-358863 aims only at converting an image scanned by a scanner into a digital document such as a JPEG file, it takes no account of searching for saved files using printed documents. Therefore, print and scan processes must be repeated, and the converted digital document image gradually deteriorates.
Japanese Patent Application Laid-Open No. 8-147445 discloses a technique for dividing document data into regions according to their respective properties, and saving all regions as raw image data (or compressed image data). However, since every region is handled as an image, a large file size is required, and the image deteriorates when an edit process such as enlargement is applied.
Techniques for searching for digital information corresponding to a paper document have also been proposed. For example, Japanese Patent Application Laid-Open No. 10-063820 discloses a technique for identifying corresponding digital information on the basis of a scanned input image, and further describes an information processing apparatus which extracts a difference between the input image and the digital information, and composites the extracted difference with the identified digital information. Japanese Patent Application Laid-Open No. 10-285378 discloses the following technique. In a digital multi-function peripheral (MFP) (comprising a copy function, scan function, print function, and the like), it is checked whether a scanned image includes a graphic code indicating a page ID, and if the graphic code is found, a database is searched for the corresponding page ID. If the page ID is found in the database, the currently scanned image is discarded, print data associated with that page ID is read out, and a print image is generated by a print operation and is printed on a paper sheet. If no corresponding page ID is found in the database, the scanned image is directly copied onto a paper sheet in a copy mode, or a PDL command is appended to the scanned image to convert the scanned image into a PDL format, and the converted data is transmitted in a facsimile or filing mode.
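The MFP flow described above for Japanese Patent Application Laid-Open No. 10-285378 can be illustrated with a minimal sketch. This is not the implementation disclosed in that publication. The names ScannedImage, handle_scan, and wrap_in_pdl are hypothetical, the database is modeled as a plain dictionary, and the PDL wrapper is a placeholder rather than a real page description language. The sketch also assumes that a scan with no graphic code is treated the same way as a scan whose page ID is not registered, which the description above does not state.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ScannedImage:
    """Hypothetical container for one scanned page."""
    pixels: bytes
    # Page ID decoded from the graphic code, or None if no code was detected.
    page_id: Optional[str]


def wrap_in_pdl(pixels: bytes) -> bytes:
    """Placeholder for 'appending a PDL command' to raw scan data.

    The markers below are invented for illustration; the image bytes are
    kept unchanged, so the result is at least as large as the scan itself.
    """
    return b"%HYPOTHETICAL-PDL-BEGIN\n" + pixels + b"\n%HYPOTHETICAL-PDL-END"


def handle_scan(scan: ScannedImage,
                database: Dict[str, bytes],
                mode: str = "copy") -> Tuple[str, bytes]:
    """Return an (action, payload) pair for one scanned page."""
    # Page ID found in the database: discard the scan and reprint
    # from the stored print data.
    if scan.page_id is not None and scan.page_id in database:
        return ("print_stored", database[scan.page_id])

    # No matching page ID: in copy mode the scanned image is copied directly.
    if mode == "copy":
        return ("copy_scan", scan.pixels)

    # Facsimile or filing mode: convert the scan to PDL form and transmit it.
    return ("transmit_pdl", wrap_in_pdl(scan.pixels))
```

In this sketch the PDL branch only wraps the scanned bytes, so its output can never be smaller than the scanned image.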
The technique in Japanese Patent Application Laid-Open No. 10-063820 searches for the original digital document corresponding to the output paper document and extracts difference information, so information additionally written on the paper document can be held as a difference image. However, since the difference information is handled directly as a scanned image, a large storage capacity is required. In addition, if no original digital document corresponding to the output paper document is found, the process ends. In the technique in Japanese Patent Application Laid-Open No. 10-285378, if no original digital document corresponding to a paper document is found, a PDL command is appended to the scanned image to convert that image into a PDL format. However, when a PDL command is merely appended to the scanned image in this way, a large file size is required, and such files may quickly fill the database.