1. Field of the Invention
The present invention relates to an image processing apparatus, image processing method, program, and storage medium for generating metadata for searching for an object in document images formed from a plurality of pages and transmitting the metadata to an external apparatus.
2. Description of the Related Art
Conventionally, when a character string adjacent to a non-text object (e.g., a photo, drawing, line art, or table) in a document image is a caption describing the object, the character string of the caption is associated as metadata with the object. In the following description, an object refers to a photo, drawing, line art, table, or the like and excludes text, unless otherwise specified. Metadata associated with an object can function as a search keyword to search for the object when an application uses a document image (see, for example, Japanese Patent Laid-Open No. 11-306197).
In a general document image, a figure number (e.g., “FIG. 1”) is often described in a caption region adjacent to an object such as a drawing, and the object is explained in the body using the figure number. In such a case, a hypertext is formed by automatically generating a link between the figure number in the caption and the same expression in the body. Assume that a caption adjacent to an object is “FIG. 1” and a description “FIG. 1 is AAA.” exists in the body. Since the caption “FIG. 1” and “FIG. 1” in the body are the same expression, a link is generated between them (see, for example, Japanese Patent Laid-Open No. 10-228473).
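The matching of a caption against the same expression in a body can be sketched as follows. This is only an illustrative sketch, not the method of the cited reference: the function name `link_caption_to_body` is hypothetical, and the rule that a following digit disqualifies a match (so that “FIG. 1” does not match inside “FIG. 10”) is an assumption added for the example.

```python
import re
from typing import List, Tuple


def link_caption_to_body(caption: str, body: str) -> List[Tuple[int, int]]:
    """Return (start, end) character spans in `body` where `caption` recurs.

    Assumption: a match followed directly by a digit is rejected, so that
    the caption "FIG. 1" is not linked to "FIG. 10".
    """
    pattern = re.compile(re.escape(caption) + r"(?!\d)")
    return [m.span() for m in pattern.finditer(body)]


# Each returned span is a candidate link target for the caption.
spans = link_caption_to_body("FIG. 1", "FIG. 1 is AAA. See also FIG. 10.")
```

Each span identifies a position in the body to which a link from the caption could be attached.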
Systems in which a scanner or MFP (Multi Function Peripheral) is connected to a host computer (hereinafter referred to as a PC) via a network or the like are becoming popular. A document image input by the scanner or MFP can be transmitted to the PC via the network. In such a system, a document image to be transmitted to the PC generally undergoes arbitrary image processing and format conversion processing (e.g., conversion to PDF, XPS, or JPEG).
When transmitting a document image to the PC in the system, multi-page data (e.g., multi-page PDF) can also be generated from input document images of a plurality of pages.
The following explains problems that arise when metadata is associated with an object, for the purpose of searching for the object, while input document images of a plurality of pages are transmitted from the MFP or the like to the PC. In particular, a case in which the page containing a caption adjacent to an object differs from the page whose body contains the same expression as a character string (e.g., figure number) in the caption will be described with reference to FIG. 8A.
FIG. 8A exemplifies document images formed from four pages. Reference numerals 801 to 804 denote the first to fourth pages in order. The page 801 includes a photo object and a caption “FIG. 1” adjacent to the object. The pages 802 and 803 include only bodies. The page 804 also includes only a body, but this body contains the same expression as the caption “FIG. 1” in the page 801.
According to the conventional technique, for example, a character string “AAA” is extracted as metadata for searching for the photo object in the page 801, from the body of the page 804 containing the same expression as the caption “FIG. 1” in the page 801. More specifically, the character string “AAA” in the body of the page 804 is associated as metadata with the photo object in the page 801. An application can search for the photo object in the page 801 by using “AAA” as a search keyword.
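The association described above can be sketched as follows. This is a minimal sketch under stated assumptions rather than the conventional technique itself: the `Page` class and `associate_metadata` function are hypothetical, and the extraction rule (taking the word that follows “<caption> is”) is assumed only because it fits the “FIG. 1 is AAA.” example.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    number: int
    caption: Optional[str] = None   # caption adjacent to an object, if any
    body: str = ""
    metadata: List[str] = field(default_factory=list)


def associate_metadata(pages: List[Page]) -> None:
    """Attach to each captioned page the keywords found in any page's body.

    Assumed rule: in a sentence "<caption> is <word>", <word> is the keyword.
    """
    for src in pages:
        if src.caption is None:
            continue
        pattern = re.compile(re.escape(src.caption) + r"(?!\d) is (\w+)")
        for other in pages:
            for match in pattern.finditer(other.body):
                src.metadata.append(match.group(1))


# The four pages of FIG. 8A; body text of pages 802 and 803 is placeholder.
pages = [
    Page(801, caption="FIG. 1"),
    Page(802, body="Body text only."),
    Page(803, body="Body text only."),
    Page(804, body="FIG. 1 is AAA."),
]
associate_metadata(pages)
```

After the call, the page 801 carries “AAA” as metadata, but only because every page up to and including the page 804 was available when the association was made.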
However, the following problem occurs when the MFP associates the character string “AAA” in the body of the page 804 with the photo object in the page 801 shown in FIG. 8A and transmits the document images to the PC. More specifically, the MFP cannot transmit the page 801 until it has detected the page 804, whose body contains the same expression as the caption, and has associated the metadata. The MFP therefore needs to hold the page 801. If the pages 802 and 803 were transmitted to the PC before the page 801, the page order would change. Hence, the MFP cannot transmit the pages 802 and 803 either and needs to hold them, too. As a result, the MFP requires a large work memory to hold pages. For example, even if one page is only 500 KB (kilobytes), document images made up of four pages as shown in FIG. 8A require a 2-MB (megabyte) work memory.
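The work memory figure in this example can be checked with a short calculation. The sketch below is illustrative only; the function name `pages_held` is hypothetical, and it assumes that every page from the captioned page through the page containing the matching body must be held, in page order, before transmission starts.

```python
PAGE_SIZE_KB = 500  # size of one page in the FIG. 8A example


def pages_held(caption_page: int, body_page: int) -> int:
    """Number of pages held before the first page can be transmitted.

    Pages are counted with 1-based indices; every page from the captioned
    page through the page with the matching body is assumed to be held.
    """
    return body_page - caption_page + 1


# FIG. 8A: caption on the first page, matching body on the fourth page.
held = pages_held(caption_page=1, body_page=4)
work_memory_kb = held * PAGE_SIZE_KB  # 4 pages x 500 KB, about 2 MB
```

The required memory grows linearly with the distance between the captioned page and the page containing the matching body.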
Another problem is poor transfer efficiency because transmission cannot start until the page 804 of the body containing the same expression as the caption is detected.