1. Field of the Invention
The present invention relates to an image processing device and an image processing method that generate electronic document data including two-way link information from a paper document or electronic document data.
2. Description of the Related Art
In general, a paper document and an electronic document include characters, graphics and the like. For example, there is a paper document, an electronic document or the like that includes an “object” (region 1614), an “anchor expression accompanying the object” (for example, an expression such as a “figure number”, “Figure 1” or “FIG. 1”) (region 1612) and a “text including the anchor expression” (region 1613), as shown in FIG. 16A. Specifically, examples of this type of document include an academic paper, a patent document, an instruction manual and a product catalogue. Here, the “object” in the present specification refers to a region of a “figure”, a “photograph”, an “illustration” or the like included in a document. The “text including the anchor expression” refers to a text including sentences that describe or explain the “object”. The “anchor expression” refers to, for example, characters (such as a figure number) that identify the object, like “Figure 1” included in the region 1611. In the following explanation, the “text including the anchor expression” is referred to as a “description text for the object”. As described above, when the document includes the “object”, a reader of the document needs to read the document with consideration given to a two-way correspondence relationship between the “object” and the “description text for the object”.
However, when the reader has difficulty in grasping the correspondence relationship between the “object” and the “description text for the object” in a document, the reader needs extra time to read the document and to understand its content correctly. Here, as an example of a paper document in which the correspondence relationship between the “object” and the “description text for the object” is difficult to grasp, an example of FIG. 16B will be explained. FIG. 16B shows an example where a paper document composed of N pages, that is, pages 1 to N (N: an integer), has the “object” and the “description text for the object” on separate pages. A region 1604 is an “object”, a region 1605 is a “caption accompanying the object”, a region 1606 is an “anchor expression in the caption” and a region 1602 is an “anchor expression in a text”. A region 1601 is a “text including the anchor expression”, that is, a “description text for the object”, and regions 1603 are the other texts. In general, when the reader of the document reads the text within the region 1601 on page 1, the reader searches for another page including the object indicated by the “anchor expression in the text” in the region 1602 (“FIG. 1” shown in FIG. 16B). Then, the reader searches for the region 1606 on page N, and after reading the regions 1604 and 1605, the reader returns to page 1 and reads the sentences in the text following the region 1602. By contrast, when the reader first sees page N, the reader searches for a portion of the text including the same anchor expression as the “anchor expression in the caption” in the region 1606 (here, “FIG. 1”). Then, the reader searches for the region 1602 on page 1, reads the text including “FIG. 1”, which is the anchor expression, and thereafter returns to page N.
As described above, when a paper document is used in which it is difficult to grasp the correspondence relationship between the “object” and the “description text for the object”, the reader manually turns pages to the corresponding page, and searches for a position (what page, what paragraph and what line) where the “object” or the “description text for the object” is described. It takes much time to do this. It is also time-consuming to read what is described in the searched position and thereafter return to the original position on the original page. On the other hand, when an electronic document is used, it is necessary to search for the position where the “object” or the “description text for the object” is described using the page scrolling function and the search function of an application in a personal computer (hereinafter, a PC), and this is also a time-consuming operation. It is also time-consuming to read its content and thereafter return to the original position on the original page. The example shown in FIG. 16B indicates a case where, in a document composed of N pages, that is, pages 1 to N, one “object” and one “description text for the object” are present. Needless to say, as the number of pages, the number of “objects” and the number of “description texts for the objects” increase, the operation becomes more time-consuming. Another example of the document in which the correspondence relationship between the “object” and the “description text for the object” is difficult to grasp is shown in FIG. 16C. In FIG. 16C, although the “object” and the “description text for the object” are on the same page, they are located apart from each other.
As described above, in a document in which the correspondence relationship between the “object” and the “description text for the object” is difficult to grasp, the reader disadvantageously takes much time to read the document, and also takes extra time to understand its content.
To overcome the problem, Japanese Patent Laid-Open No. H11-066196(1999) discloses an invention in which a paper document is optically read and a document that can be utilized in various computers corresponding to utilization purposes can be generated. Specifically, an electronic document is generated by producing hypertext on figures and figure numbers. Then, the “figure number” in the text is clicked with a mouse or the like, and thus it is possible to display a figure corresponding to the “figure number” on a screen.
However, in Japanese Patent Laid-Open No. H11-066196(1999), link information from an “anchor expression in a text” to an “object” is generated, whereas link information in the opposite direction, from the “object” to the “anchor expression in the text” or to a “description text for the object”, is not generated. Thus, it is time-consuming to search for the “description text for the object” from the “object”.
It is also time-consuming for the reader to first read the “description text for the object” and reference the “anchor expression in the text” to find the “object” and thereafter return to the “description text for the object” that has been immediately previously read. In other words, it is time-consuming to search for the position (what page, what paragraph and what line) of the “description text for the object”.
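By way of illustration only, the two-way correspondence relationship discussed above can be sketched as a pair of lookup tables, one from each “description text for the object” to the “object” and one in the opposite direction. The following Python sketch is not taken from the cited reference or from the present specification. The pattern `ANCHOR_RE`, the function `build_two_way_links` and the dictionary inputs are hypothetical names assumed for the example, and the region numbers 1601, 1603 and 1604 are borrowed from FIG. 16B only as labels.

```python
import re
from collections import defaultdict

# Hypothetical pattern for anchor expressions such as "Figure 1" or "FIG. 1".
ANCHOR_RE = re.compile(r"\b(?:FIG\.|Fig\.|Figure)\s*(\d+[A-Z]?)")


def normalize(match):
    """Map "Figure 1", "Fig. 1" and "FIG. 1" to one canonical key."""
    return "FIG. " + match.group(1)


def build_two_way_links(text_regions, captions):
    """Return (text_to_object, object_to_text) lookup tables.

    text_regions: dict mapping a text region id to its text (assumed input).
    captions: dict mapping an object id to its caption text (assumed input).
    """
    # Anchor expression found in each caption -> the captioned object.
    anchor_to_object = {}
    for object_id, caption in captions.items():
        m = ANCHOR_RE.search(caption)
        if m:
            anchor_to_object[normalize(m)] = object_id

    text_to_object = defaultdict(list)  # forward link: text -> object
    object_to_text = defaultdict(list)  # reverse link: object -> text
    for region_id, text in text_regions.items():
        for m in ANCHOR_RE.finditer(text):
            object_id = anchor_to_object.get(normalize(m))
            if object_id is not None and object_id not in text_to_object[region_id]:
                text_to_object[region_id].append(object_id)
                object_to_text[object_id].append(region_id)
    return dict(text_to_object), dict(object_to_text)


# Illustrative data only; ids reuse the region numbers of FIG. 16B as labels.
text_regions = {
    1601: "As shown in FIG. 1, the device reads the document.",
    1603: "Other text without any anchor expression.",
}
captions = {1604: "Figure 1: Overview of the device"}
forward, reverse = build_two_way_links(text_regions, captions)
```

In this sketch, `forward` corresponds to the one-way link from the “anchor expression in the text” to the “object”, and `reverse` holds the link in the opposite direction, which is the link whose absence is described above.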