In recent years, amid growing concern over environmental issues, offices have rapidly been going paperless. As a technique for promoting paperless operation, a document management system has been proposed which scans paper documents stored in binders and the like with a scanner or the like, converts them into image files such as portable document format (to be referred to as PDF hereinafter) files, and stores and manages them in an image storage.
In a multifunction apparatus with enhanced functions, in storing and managing an image file corresponding to a paper document, pointer information indicating the location of the image file in an image storage can be recorded on the cover sheet or in the information of the paper document as additional information. In reusing (e.g., copying) the paper document, the storage location of the original electronic file (image file) corresponding to the paper document is detected from the pointer information, and the original electronic file is directly reused. This makes it possible to reduce the total amount of paper documents held and obtain a print with high image quality (e.g., Japanese Patent Laid-Open No. 10-143414).
In the former document management system, a paper document can be saved as a PDF file with a small amount of information. However, a PDF file itself is image information, and thus objects (e.g., a text block, graphic block, table block, and the like) in the PDF file cannot be reused electronically. Some editing operation is required to reuse a certain object in the paper document. More specifically, the necessary object needs to be created again using application software or needs to be extracted from the PDF file.
As for the latter multifunction apparatus, generally, a paper document created in a certain organization can easily be reused in the organization because the original electronic file corresponding to the paper document can directly be accessed. On the other hand, a paper document obtained from outside the organization or a paper document in which the location of a corresponding original electronic file is unknown cannot be reused.
Under these circumstances, if the original electronic file of a paper document cannot be identified in a system, an image (image information) obtained by scanning the paper document is converted into vector data by vectorization and saved as the original electronic file. With this operation, image information obtained by scanning an arbitrary paper document can be handled as a reusable electronic file.
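The decision described above can be illustrated with a minimal sketch. It is not the implementation of any particular system: the names `ElectronicFile`, `resolve_electronic_file`, `storage`, and `new_location` are hypothetical, the image storage is modeled as a plain dictionary, and the vectorization step is passed in as a caller-supplied function because its details are not specified here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ElectronicFile:
    """Hypothetical record for a file held in the image storage."""
    location: str   # storage location of the file
    kind: str       # "original" or "vectorized"
    payload: bytes  # file contents


def resolve_electronic_file(
    scanned_image: bytes,
    pointer: Optional[str],
    storage: Dict[str, ElectronicFile],
    vectorize: Callable[[bytes], bytes],
    new_location: str,
) -> ElectronicFile:
    """Return a reusable electronic file for a scanned paper document."""
    # Case 1: pointer information was detected and identifies a stored file,
    # so the original electronic file is reused directly.
    if pointer is not None and pointer in storage:
        return storage[pointer]

    # Case 2: the original cannot be identified (for example, a document
    # obtained from outside the organization). Vectorize the scanned image
    # and save the result so that it serves as the original electronic file.
    vector_data = vectorize(scanned_image)
    created = ElectronicFile(new_location, "vectorized", vector_data)
    storage[new_location] = created
    return created
```

In this sketch the second branch always succeeds, which mirrors the point of the fallback: any scanned document ends up with some reusable electronic file, whatever its fidelity to the paper original.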
However, depending on the contents of the vectorization, an image generated from the vector data (original electronic file) obtained by the above-mentioned vectorization may differ greatly from the image (image information) obtained by scanning the original paper document, for example because of erroneous determination.
For this reason, in reusing the original electronic file corresponding to a paper document, the user may be unable to perform an intended process, depending on the application purpose.
For example, if image information of a paper document comprising a text block is vectorized to generate vector data, the text block undergoes character recognition and is encoded. However, if there is no font on the system side corresponding to that of the text block, the text block may be encoded (vectorized) in a different font. In this case, the text block is not vectorized in the font intended by the user.
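The font problem in this example can be made concrete with a short sketch. It assumes, purely for illustration, that character recognition reports a font name and that the system falls back to a single default font when that name is not installed; `choose_font` and its parameters are hypothetical names, not part of any described system.

```python
from typing import Iterable, Tuple


def choose_font(
    recognized_font: str,
    installed_fonts: Iterable[str],
    default_font: str,
) -> Tuple[str, bool]:
    """Return (font_used, substituted) for an encoded text block."""
    available = set(installed_fonts)
    if recognized_font in available:
        # The system has the font of the scanned text block.
        return recognized_font, False
    # No corresponding font on the system side: the text block is
    # encoded in a different font, which the user did not intend.
    return default_font, True
```

Returning the `substituted` flag alongside the font name is a design choice of this sketch: it makes the silent substitution visible to the caller, which is exactly the information the user lacks in the situation described above.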