1. Field of the Invention
The present invention relates to an image processing device for processing document images.
2. Description of the Related Art
Recently, with the spread of networks such as the Internet, documents are commonly distributed in electronic form. However, it is still common for documents to be distributed in paper form. Various techniques have therefore been proposed for obtaining reusable electronic data even when only a paper document is available.
For example, a technique is known in which a document image obtained by scanning a paper document is sent from a terminal to a server, subjected to character recognition on the server, converted into a reusable format, and then returned to the terminal (see Japanese Patent Laid-Open No. H11-167532 (1999)).
Another known technique divides a document image into areas according to their type and outputs each area individually (see Japanese Patent Laid-Open No. H09-091450 (1997)).
Although the format in which users want to reuse data depends on the situation, it is desirable that the data be provided in a format from which users can easily extract what they need. In addition, because character recognition has inherent limitations, characters may be misrecognized. If reusable content is recognized with low accuracy, it will be difficult for users to work with. In the technique disclosed in Japanese Patent Laid-Open No. H11-167532 (1999), only data in character format is reusable, and the data is converted without regard to recognition accuracy. However, in paper documents, not only the content itself but also its layout and the positional relationships among its contents often carry important meaning for reuse.
Furthermore, the technique disclosed in Japanese Patent Laid-Open No. H09-091450 (1997) divides a document into its contents and outputs them individually, so the relationships between those contents may be lost.
Another challenge is to enable users to easily reuse character images contained in an image both as vector data and as character codes obtained by character recognition.