1. Field of the Invention
The present invention relates to an image search technique.
2. Description of the Related Art
It has become common practice to digitize paper documents by reading them with a scanner and storing the resulting digital data on the hard disk of a computer or other information processing device. With advances in compression coding schemes such as JPEG, and with hard disks growing in capacity and falling in cost, the quantity of stored and managed document image data is increasing. Accordingly, techniques for searching for a desired image among an enormous number of image contents stored on a hard disk have been proposed.
A method generally used to search for a desired image among a number of stored image contents is to assign keywords to the individual image contents in advance and to search based on a keyword. Images corresponding to the keyword are displayed as a search result on, for example, a monitor, and the operator visually selects the desired image from the displayed images.
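The keyword-based search described above can be sketched as follows. This is a minimal illustration only: the function names, the image file names, and the keywords are all hypothetical and do not come from any cited system.

```python
from collections import defaultdict


def build_keyword_index(image_keywords):
    """Map each keyword to the set of image IDs tagged with it in advance."""
    index = defaultdict(set)
    for image_id, keywords in image_keywords.items():
        for keyword in keywords:
            index[keyword.lower()].add(image_id)
    return index


def search_by_keyword(index, keyword):
    """Return the IDs of images tagged with the keyword, in sorted order."""
    return sorted(index.get(keyword.lower(), set()))


# Hypothetical stored image contents and the keywords given to them.
catalog = {
    "img_001.jpg": ["invoice", "2005"],
    "img_002.jpg": ["contract"],
    "img_003.jpg": ["invoice", "draft"],
}
index = build_keyword_index(catalog)
print(search_by_keyword(index, "invoice"))  # ['img_001.jpg', 'img_003.jpg']
```

The returned list corresponds to the images that would be displayed on the monitor for the operator to choose from.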
As the Internet has become popular in recent years, such keyword-based search is commonly practiced in image search systems that content providers holding an enormous quantity of image contents prepare in order to distribute images to consumers over the Internet.
When a search is performed on an enormous number of images on the Internet, the search result list may also become large. In this case, to allow the operator to visually select a desired image as described above, images similar to the target image, or images that seem to be important, must be displayed in descending order of priority. Such a technique is disclosed in Japanese Patent Laid-Open No. 2004-220267, in which the significance of each image is determined from the structure of an HTML document, and the search result display order is decided based on the ranking of the significances.
However, in searching a database storing only document image data, the above-described search based on a keyword and search result display order determination based on a document structure are impossible.
A technique of executing search based on image similarity has also been proposed. In Japanese Patent Laid-Open No. 2004-348706, a document image is segmented into a plurality of regions based on attributes, and the similarity of each segmented region is calculated by a search process suited to its attribute, thereby searching for a similar image. Japanese Patent Application No. 2005-244684 discloses a method of calculating an encoding (image key) of an image and using it to determine the similarity.
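The idea of comparing each segmented region with a process suited to its attribute can be sketched as below. This is not the method of the cited publications: the two attributes ("text" and "picture"), the word-overlap and histogram-intersection measures, the in-order pairing of regions, and the averaging are all assumptions chosen only for illustration.

```python
def text_similarity(a, b):
    """Jaccard overlap of word sets (assumed measure for text regions)."""
    words_a, words_b = set(a.split()), set(b.split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def picture_similarity(a, b):
    """Intersection of normalized histograms (assumed measure for picture regions)."""
    return sum(min(x, y) for x, y in zip(a, b))


# One comparison process per region attribute (hypothetical attribute names).
COMPARATORS = {"text": text_similarity, "picture": picture_similarity}


def document_similarity(query_regions, stored_regions):
    """Average the per-region similarities of regions paired in order.

    Regions whose attributes differ contribute a similarity of 0.
    """
    scores = []
    for (q_attr, q_data), (s_attr, s_data) in zip(query_regions, stored_regions):
        if q_attr == s_attr:
            scores.append(COMPARATORS[q_attr](q_data, s_data))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores) if scores else 0.0


query = [("text", "annual sales report"), ("picture", [0.5, 0.5, 0.0])]
stored = [("text", "annual sales summary"), ("picture", [0.5, 0.25, 0.25])]
print(document_similarity(query, stored))
```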
Similarity-based image search makes it possible to find stored image data by using part of an original document, or a similar image, as the query.
In both keyword-based search and similarity-based search, however, when the number of search target documents is large, that is, when a document is to be searched for on the Internet or on a large document management server, it is difficult to obtain an appropriate number of search results. In keyword search, an enormous number of search results is obtained if a common word is used as a keyword. If the operator adds keywords or uses an unusual keyword to narrow the search, the number of search results becomes small, but the target document may be excluded from them.
In similarity search, obtaining only data with high similarities as search results requires increasing the number of parameters used for similarity calculation, which lowers the calculation speed. Conversely, if the number of similarity calculation parameters is reduced to speed up the calculation, even images with low similarities are returned as similar images, resulting in an enormous number of search results. The operator must then visually examine a large number of images, and the time and labor required for the search increase.
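The trade-off between the number of similarity calculation parameters and the number of search results can be illustrated with a small sketch. Cosine similarity over a truncated feature vector is used here purely as a stand-in, since the text does not specify a similarity measure; the feature values, the image names, and the 0.95 threshold are likewise hypothetical.

```python
import math


def similarity(query, candidate, num_params):
    """Cosine similarity computed from only the first num_params feature values."""
    q = query[:num_params]
    c = candidate[:num_params]
    dot = sum(a * b for a, b in zip(q, c))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in c))
    return dot / norm if norm else 0.0


def similar_images(query, stored, num_params, threshold=0.95):
    """Return the names of stored images whose similarity reaches the threshold."""
    return sorted(
        name
        for name, features in stored.items()
        if similarity(query, features, num_params) >= threshold
    )


# Hypothetical four-parameter feature vectors.
query = [1.0, 0.0, 1.0, 0.0]
stored = {
    "a": [1.0, 0.0, 1.0, 0.0],  # matches the query in every parameter
    "b": [1.0, 0.0, 0.0, 1.0],  # matches only in the first two parameters
    "c": [0.0, 1.0, 1.0, 0.0],  # differs in the first two parameters
}

print(similar_images(query, stored, num_params=2))  # ['a', 'b']
print(similar_images(query, stored, num_params=4))  # ['a']
```

With two parameters the calculation touches fewer values but image "b" is also reported as similar; with four parameters only the true match remains, at the cost of more computation per comparison.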