1. Field of the Invention
The present invention relates to an image processing apparatus, an image processing method, and a computer program that can generate, from a document image, electronic document data in which an object can be searched for.
2. Description of the Related Art
It has conventionally been desired to construct an image processing system that can search for objects other than characters, such as pictures, graphics, line drawings, and tables, in a document image so that these objects can be easily used. The objects referred to in the following description are objects other than characters unless otherwise specified.
For example, the image processing system extracts an object from the document image and determines whether a caption character string (i.e., a character string that explains the object) is present in the vicinity of the object. If it is determined that the caption character string is present, the image processing system designates the caption character string as metadata associated with the object, so that the object can be searched for based on the metadata.
Then, each object associated with the metadata is JPEG compressed and stored in a single electronic document. An application that uses this electronic document can find an object by performing a keyword search against the metadata.
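The caption association and keyword search described above can be illustrated with a minimal Python sketch. The `Region` type, the `attach_captions` and `search` helpers, and the rule that "vicinity" means a text region directly below the object within `max_gap` pixels are all assumptions made for illustration; the conventional system is not described at this level of detail.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical region extracted from a document image."""
    kind: str      # "picture", "graphics", "table", or "text"
    x: int
    y: int
    w: int
    h: int
    text: str = ""  # recognized characters (text regions only)


def vertical_gap(obj, cap):
    """Distance from the bottom edge of obj to the top edge of cap."""
    return cap.y - (obj.y + obj.h)


def horizontally_overlaps(a, b):
    return a.x < b.x + b.w and b.x < a.x + a.w


def attach_captions(regions, max_gap=40):
    """Pair each non-text object with the nearest text region just below it.

    Assumption: a caption lies directly below its object, within max_gap pixels.
    Objects without such a text region get empty metadata.
    """
    objects = [r for r in regions if r.kind != "text"]
    texts = [r for r in regions if r.kind == "text"]
    index = []
    for obj in objects:
        candidates = [
            t for t in texts
            if horizontally_overlaps(obj, t) and 0 <= vertical_gap(obj, t) <= max_gap
        ]
        caption = min(candidates, key=lambda t: vertical_gap(obj, t), default=None)
        index.append((obj, caption.text if caption else ""))
    return index


def search(index, keyword):
    """Return the objects whose metadata contains the keyword (case-insensitive)."""
    return [obj for obj, meta in index if keyword.lower() in meta.lower()]


regions = [
    Region("picture", 100, 100, 200, 150),
    Region("text", 100, 260, 200, 20, "FIG. 1 Camera overview"),
    Region("text", 100, 600, 400, 200, "Body text mentioning a camera."),
]
index = attach_captions(regions)
hits = search(index, "camera")
```

In this example only the text region 10 pixels below the picture is treated as its caption; the body text is too far away to qualify, so the picture is found through the caption alone.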
Further, in a case where a caption that is adjacent to an object is a drawing number (e.g., “FIG. 1”), the body of a typical document image contains a character string that represents the same drawing number to explain the object. More specifically, an expression that is identical to the drawing number described in the caption can be found in the body.
As discussed in Japanese Patent Application Laid-Open No. 10-228473, there is a conventional technique capable of forming a hypertext by automatically generating a link between a drawing number in the caption and a drawing number in the body. For example, in a case where the caption that is adjacent to an object includes a drawing number “FIG. 1” and the body includes a sentence “FIG. 1 is AAA.”, a hyperlink can be generated between the “FIG. 1” in the caption and the “FIG. 1” in the body. Further, the technique discussed in the above-described prior art can form a hypertext by automatically generating a link between an object and a related body.
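The matching of a drawing number in a caption against the same drawing number in the body can be sketched as follows. This is only an illustration under stated assumptions: the regular expression `FIG_RE`, the function name `link_captions_to_body`, and the representation of the body as a list of sentences are hypothetical and are not taken from the cited reference.

```python
import re

# Assumed drawing-number pattern: "FIG." followed by a number, e.g. "FIG. 1".
FIG_RE = re.compile(r"FIG\.\s*(\d+)")


def link_captions_to_body(captions, body_sentences):
    """Map each caption drawing number to the indices of body sentences citing it."""
    links = {}
    for caption in captions:
        match = FIG_RE.search(caption)
        if not match:
            continue  # caption carries no drawing number
        number = match.group(1)
        links[f"FIG. {number}"] = [
            i for i, sentence in enumerate(body_sentences)
            # Compare whole numbers so that "FIG. 1" does not match "FIG. 10".
            if any(n == number for n in FIG_RE.findall(sentence))
        ]
    return links


captions = ["FIG. 1", "FIG. 2"]
body = ["FIG. 1 is AAA.", "As shown in FIG. 10, BBB.", "FIG. 2 is CCC."]
links = link_captions_to_body(captions, body)
```

Each entry in `links` corresponds to one place where a hyperlink could be generated between the caption and the body.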
On the other hand, a multifunction peripheral (MFP) has the capability of generating an electronic document by performing image processing and format conversion processing on a scanned input document image and has a transmission function for transmitting the generated electronic document to a personal computer (PC) via a network.
The image processing includes processing for acquiring a character code by performing character recognition processing on a character image contained in a document image. The image processing further includes vectorization processing for converting graphics in the document image into vector data. In the format conversion processing, the data having been subjected to the above-described image processing is converted into a predetermined electronic document format (e.g., portable document format (PDF)) to generate an electronic document file.
As discussed in Japanese Patent Application Laid-Open No. 2009-009526, there is a conventional technique for embedding a character recognition result as a transparent text (i.e., a character code in an invisible state drawn by designating a transparent color as a drawing color) into an image file and converting the data into an electronic document format (e.g., PDF or XPS). When the electronic document file having been generated in this manner is displayed, a transparent text is drawn in a character portion of the document image.
In this case, if a user performs a keyword search, the system searches the transparent text. However, the user cannot visually recognize the transparent text itself, so it appears to the user as if the target character image portion in the document image has been searched. In this manner, the character image portion that corresponds to the search keyword can be displayed in a highlighted state, and the user can efficiently identify the target character image portion.
On the other hand, in a case where a caption character string is added as metadata to an object other than characters so that the object can be searched for in an electronic document, it is desired to highlight a search result (i.e., a target object) that has been hit in the keyword search.
However, the target object to be searched for in this case is a picture, graphics, or table object, and such objects differ greatly in color and shape. Therefore, the highlight display may not bring the expected effect, and users may be unable to identify the target object that has been hit in the search.
For example, in a case where the contour of a searched object is highlighted with a red color, the highlight display for a search result cannot be effective if the searched object is a picture object including a red color portion in the vicinity of the searched object or in most of the entire area. More specifically, it becomes very difficult for users to identify the object having been hit in the search.
Further, in a case where generated electronic document data is transmitted via a network, it is desired to reduce the data size of the electronic document data. However, if respective objects (e.g., pictures) extracted from a document image are independently compressed and the compressed image data are integrated with background image data and stored as a single electronic file, the size of the obtained file tends to become larger than the size of a file obtained by compressing the original document image as a single image.
More specifically, in a case where a document image containing a picture is transmitted, the total data size of the transmitted image can be reduced more efficiently by compressing the entire image, including the picture portion and the background, as integrated compressed image data than by storing the extracted picture object and the background as independent compressed image data.
In general, the elements that constitute the above-described data are image information and compression header information. As the number of object data items increases, the header information is stored repeatedly, once for each item. In particular, the header information required for compression tends to become larger in more advanced image compression methods.
Accordingly, in a case where an electronic document file is generated based on a document image that includes numerous objects (e.g., pictures), it is desired to compress all objects as a single image rather than separately compressing respective objects to efficiently reduce the total size of data.
For example, in a case where image data is stored as a JPEG compressed stream, each stream is accompanied by header information of 700 bytes or more, which includes a quantization table and Huffman codes to be used for rasterization. In a case where an image of one page includes 100 pictures, a size reduction of 70 Kbytes or more per page can be obtained by compressing all of the pictures and the background as integrated image data, compared to a case where the background and the pictures are respectively compressed as a total of 101 independent image data.
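The arithmetic behind this estimate can be checked with a short Python sketch. The constant and function names are illustrative; the 700-byte header size is the lower bound cited above, so the computed saving is likewise a lower bound.

```python
HEADER_BYTES = 700       # per-stream JPEG header size cited above (lower bound)
PICTURES_PER_PAGE = 100  # number of picture objects on the example page


def header_overhead(num_streams, header_bytes=HEADER_BYTES):
    """Total header bytes when each stream carries its own header."""
    return num_streams * header_bytes


# Background plus 100 pictures, each compressed independently: 101 streams.
separate = header_overhead(PICTURES_PER_PAGE + 1)
# Pictures and background compressed together: 1 stream.
integrated = header_overhead(1)

saving = separate - integrated  # 100 headers * 700 bytes = 70,000 bytes
```

The saving comes from the 100 headers that are no longer needed once the 101 independent streams are replaced by a single stream.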
However, if the objects and the background are compressed as integrated image data, the electronic document data is stored in a state where a search target object is merged with the background. Therefore, it is difficult to identify and highlight the target object in a search.