1. Field of the Invention
The present invention relates to an image processing apparatus, a method, and a medium storing a program for processing a document.
2. Description of the Related Art
Conventionally, a document stored in an image processing apparatus is often subjected to image processing such as editing and printing. Consider, for example, a case in which a document “SPECIFICATION A” is stored in an image processing apparatus as shown in FIG. 12. In this example, a user A retrieves and edits “SPECIFICATION A” and stores the result in the image processing apparatus as “SPECIFICATION A′”. If the user A further retrieves and edits “SPECIFICATION A′” and stores the result as “SPECIFICATION A″”, then the similar documents “SPECIFICATION A”, “SPECIFICATION A′”, and “SPECIFICATION A″” all exist in the image processing apparatus. In general, a number of users (users B to D in FIG. 12) often access the same document. In such a case, if each user stores an edited document under a name of his or her own choosing, many documents with similar names come into existence. Because these documents are obtained by editing the same original document, they redundantly contain the same display lists and vector data used in the printing process. This redundant data places a heavy burden on the storage capacity of the storage device (such as an HDD) in the image processing apparatus.
Meanwhile, the differences between the documents with similar names are unclear, which significantly reduces convenience for the users. For example, even if a unified naming rule is adopted, the users cannot determine which of the many documents should be printed when the differences between them are unclear. CVS (Concurrent Versions System) is generally known as a version management system for files. However, even when such a system is used, it is not possible to display only the difference data between files in a form that users can recognize, and the above problem therefore remains unsolved. Furthermore, because the vector data of the documents is binary data, extracting the differences is difficult. As a result, the data must be stored redundantly.
Therefore, when a user processes a target document, it is desirable that the differences between that document and other documents be clear to other users. It is also desirable to suppress the increase in the amount of storage consumed in the image processing apparatus.
Various techniques have been developed for comparing documents and extracting differences. Japanese Patent Laid-Open No. 2004-246577 describes a technique for searching for original electronic data based on an input image and comparing the retrieved original electronic data with the input image to extract difference information. The same publication further discloses an image processing method for converting the extracted difference information into vector data and combining the vector data with the original electronic data. According to that publication, difference information not present in the original electronic data can thereby be reused, improving editability. The method is advantageous in that, when documents are compared as bitmap data, a difference in color value can be obtained pixel by pixel. However, although a difference in bitmap data can be extracted pixel by pixel, detailed differences in text cannot be extracted. For example, the user cannot obtain information such as how a character string (character code) has changed, what character string has been added, or how the thickness of a line (in points) has changed.