1. Field of the Invention
The present invention relates to management of document data and, more particularly, to management of document data of a plurality of versions generated by updating document data.
2. Description of the Related Art
Multifunctional peripherals (to be referred to as MFPs hereinafter) including a scanner, printer, image processor, input/output interface (to be referred to as an input/output I/F hereinafter), and operation unit are becoming popular. An MFP connects to a network such as a LAN through the input/output I/F to form a print system that implements a variety of functions including copy operation. The MFP can carry out tasks such as sending scanned image data to a personal computer (to be referred to as a PC hereinafter) on the network or transmitting the image data to an external device through a telephone line as FAX data. The MFP can also receive image data from a PC connected to the network and print it out.
In the print system, a database and a management server that efficiently manages the database are connected to the network. The database, which uses a magnetic storage device such as an HDD, stores an enormous quantity of data. The print system having such a database can send image data in the database to the PC via the network and cause a printer connected to the PC to print out the image data. The print system can also send image data in the database to the MFP via the network and cause the MFP to print out the image data. It is also possible to store, in the database via the network, image data scanned by the MFP.
In the field of image processing technology, various techniques such as a technique (block segmentation or block selection technique) of segmenting image data into regions in accordance with attributes, an OCR technique, and a pattern matching technique have been developed. The block selection technique recognizes image data in one page and segments it into blocks such as a text region, line region, photo region, and table region. The OCR technique creates text data from image data of a text portion input from, e.g., a scanner. The pattern matching technique determines the similarity of images on the basis of color information, edge information, and shape feature amount and selects an image matching a target image from a plurality of image data.
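As an illustration of the pattern matching idea only, the following minimal sketch selects, from several candidate images, the one most similar to a target image. It assumes that similarity is judged on color information alone, using a coarse brightness histogram and histogram intersection; edge information and shape feature amounts are omitted. The function names, the bin count, and the sample pixel values are hypothetical and do not come from any particular system.

```python
def histogram(pixels, bins=4, max_value=256):
    """Return a normalized histogram of pixel values (assumes a non-empty list)."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    total = len(pixels)
    return [c / total for c in counts]


def similarity(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def best_match(target, candidates):
    """Return the name of the candidate whose histogram is closest to the target's."""
    target_hist = histogram(target)
    return max(
        candidates,
        key=lambda name: similarity(target_hist, histogram(candidates[name])),
    )


# Hypothetical 8-bit grayscale pixel values standing in for real image data.
target = [0, 10, 200, 250]
candidates = {
    "dark": [0, 5, 10, 20],
    "mixed": [5, 15, 210, 240],
    "bright": [200, 220, 240, 255],
}
print(best_match(target, candidates))
```

A practical matcher would combine several feature types and operate on full image data; the histogram here only stands in for the color-information part of the comparison.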
A system that combines the above-described print system and the image processing technology has also been developed. This system compares image data scanned by the scanner of an MFP with data in a server and searches for the original data of the printed document (e.g., Japanese Patent Application Laid-Open No. 05-037748).
On the other hand, campaigns to stop wasteful printing and copying have recently become common from the viewpoint of reducing office costs and saving paper resources.
A specification or manual under preparation is frequently updated. In reviewing the document or checking its contents, however, it is undesirable to print out all pages of the latest version. Preferably, only the pages that differ from an old version are printed and used to replace the corresponding pages of the existing paper document. To realize this by a conventional technique, it is necessary for the creator of a document to
   1) print out a minimum number of pages necessary for replacement in consideration of the difference from a former version, and
   2) replace corresponding pages of an existing paper document printed out in the past with the newly printed pages.
In practice, printing out only the replacement pages requires much labor from the user because changes in the contents are hard to recognize due to repetition of texts and graphics or mismatch of page numbers. For this reason, the user often ends up printing out all pages of the document, resulting in waste of paper resources.
Assume that a user has an existing printed paper document of Ver1.0, as shown in FIG. 35A. After printout, the document is revised to a version (Ver1.1) shown in FIG. 35B. The document is revised by, e.g., a word processor application on a PC so that Ver1.1 is formed as electronic data. In this revision, the version number indicated by “Ver” on page 1 is revised (from 3501 to 3501′), and “Text•2” on page 2 is revised (from 3502 to 3502′). The remaining portions are unaltered.
The revised parts of Ver1.1 that must be printed are contained in pages 1 to 3. Page 4 and subsequent pages have the same contents as those of the already printed paper document of Ver1.0 and need not be printed out again. The user discards pages 1 and 2 of the paper document of Ver1.0 (FIG. 35A), prints out pages 1 to 3 of Ver1.1 (FIG. 35B) formed as electronic data, and inserts them in place of the discarded pages. FIG. 35C shows the replacement result.
In the state shown in FIG. 35C, the contents of the paper document after replacement are difficult to grasp because the contents of “Text•3” and the page number of page 3 repeat. In this example, the document is revised only in one part. If the above-described state occurs in a plurality of parts of the document, it is very difficult to understand the paper document after replacement. Additionally, the replacing operation itself is very cumbersome. This eventually leads the creator to print out all pages of the electronic data of Ver1.1 and discard the whole paper document of Ver1.0 without attempting replacement.
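The page bookkeeping that the user must do by hand in the Ver1.0 to Ver1.1 example above can be sketched as follows. This is a minimal sketch, assuming that both versions are available as lists of per-page content strings and that a page can be reused only if its content is identical; the page contents below are hypothetical placeholders and are not taken from FIG. 35A to FIG. 35C. It uses `difflib.SequenceMatcher` from the Python standard library to align the two page sequences.

```python
from difflib import SequenceMatcher


def replacement_plan(old_pages, new_pages):
    """Return (old page numbers to discard, new page numbers to print), 1-based."""
    matcher = SequenceMatcher(a=old_pages, b=new_pages, autojunk=False)
    discard, reprint = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical pages can stay in the paper document
        discard.extend(range(i1 + 1, i2 + 1))
        reprint.extend(range(j1 + 1, j2 + 1))
    return discard, reprint


# Hypothetical page contents: the revised text grows and pushes content
# onto an additional page, so later pages shift by one.
ver_1_0 = ["Ver1.0 / Text 1", "Text 2 / Text 3", "Text 4", "Text 5"]
ver_1_1 = ["Ver1.1 / Text 1", "Text 2 (revised)", "Text 3", "Text 4", "Text 5"]

discard, reprint = replacement_plan(ver_1_0, ver_1_1)
print("discard old pages:", discard)
print("print new pages:", reprint)
```

Even with such a plan, the printed page numbers on the retained pages no longer match their positions, which is the mismatch that makes the replaced paper document hard to follow.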
A general system for searching for the original data of a printed document can of course find the original data, but it fails to match the scan data if the data in the server have been updated or their contents revised. It is therefore impossible to use this system effectively to replace a paper document efficiently.