1. Field of the Invention
The present invention relates to image processing technologies and, in particular, to methods and devices for image learning, automatic image annotation, and image retrieval.
2. Description of the Related Art
With the rapid development of networks and the widespread use of mobile phones and home-use digital cameras equipped with digital image sensors, large amounts of digital image resources are being generated.
In order to handle large amounts of image data, effective and practical image retrieval systems are in demand. Accordingly, in the field of content-based image retrieval (CBIR), studies have been made on extracting well-defined semantic content from images so that user images can be accessed and retrieved.
In its initial stage, CBIR was based on retrieval by image similarity. In other words, when the user inputs images, colors, or schematic views, the retrieval results are images similar to the input images, colors, or schematic views. However, such CBIR is hardly practical, mainly for the following two reasons.
First, the user must already have an image, or must be able to select colors appropriately or describe schematic views. These requirements pose a barrier to the user, thus limiting widespread use of such a system.
Second, image retrieval based on image similarity depends on comparing the similarity of low-level image features. However, there is a semantic gap between the low-level features and the high-level information conveyed by an image. Therefore, the retrieved images are likely to differ greatly in meaning even though they are visually similar, which greatly reduces the accuracy of the retrieval results.
In order to solve the above problems in CBIR, researchers have proposed retrieval systems based on image annotation. In such a retrieval system, image data is annotated with text information so that image retrieval can be performed based on text. Since this method allows the user to perform retrieval merely by inputting a keyword, the above requirements on the user's ability are reduced.
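As a rough illustration of keyword retrieval over annotated images, the following minimal sketch builds an inverted index from annotation words to image identifiers. The function names `build_index` and `search`, the file names, and the annotation words are hypothetical and are not taken from any particular system.

```python
from collections import defaultdict

def build_index(annotations):
    """Map each annotation word (lower-cased) to the set of images carrying it."""
    index = defaultdict(set)
    for image_id, words in annotations.items():
        for word in words:
            index[word.lower()].add(image_id)
    return index

def search(index, keyword):
    """Return the sorted identifiers of all images annotated with the keyword."""
    return sorted(index.get(keyword.lower(), set()))

# Hypothetical annotated collection.
annotations = {
    "img_001.jpg": ["beach", "sunset"],
    "img_002.jpg": ["beach", "dog"],
    "img_003.jpg": ["city"],
}
index = build_index(annotations)
print(search(index, "beach"))  # prints ['img_001.jpg', 'img_002.jpg']
```

The user supplies only a keyword; no query image, color selection, or schematic view is needed.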
The following methods of automatic image annotation are currently known.
Method 1:
Automatic annotation is performed based on the original data of an image (such as the time, GPS information, and image name associated with an image taken by a digital camera, or text associated with an image in a digital format).
Method 2:
In a learning stage, the complicated association between text and images is estimated using computer vision technology and machine learning technology. Based on this association, automatic annotation is then performed on images that did not appear in the learning stage.
However, Method 1 above, which is based on original data, has the following problems.
First, the original data of an image may not be associated with the content of the image. Therefore, the quality of the image annotation is poor.
Second, since Method 1 can be applied only to images that have associated text, its application range is greatly limited.
Because Method 1 has these unavoidable defects, Method 2 has been proposed as an improvement. The details of Method 2 are as follows.
Method 2 includes the following steps.
Step A:
An image is segmented into regions with a region segmentation method, and the feature vectors of the respective regions are calculated.
Step B:
In the learning stage, each region is linked to its k nearest regions, and each image is linked to the real annotations related to that image.
Step C:
In an automatic annotation stage, the graph thus built is searched using a random walk with restart (RWR) to obtain the corresponding annotations.
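Steps B and C above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the cited method: the function names (`build_graph`, `rwr`, `annotate`), the node ordering (images, then regions, then annotation terms), the link from each region to its own image, the restart probability of 0.65, and the two-dimensional toy feature vectors are all hypothetical. Region segmentation and feature extraction (Step A) are assumed to have been performed already.

```python
import numpy as np

def build_graph(region_feats, region_owner, image_terms, n_images, n_terms, k=1):
    """Build an undirected graph over image, region, and annotation-term nodes."""
    feats = np.asarray(region_feats, dtype=float)
    n_regions = len(feats)
    r0, t0 = n_images, n_images + n_regions   # index offsets of region and term nodes
    n = n_images + n_regions + n_terms
    adj = np.zeros((n, n))

    def link(a, b):
        adj[a, b] = adj[b, a] = 1.0

    for r, img in enumerate(region_owner):    # each region belongs to one image
        link(img, r0 + r)
    for img, terms in image_terms.items():    # image <-> its real annotations
        for t in terms:
            link(img, t0 + t)
    # Link each region to its k nearest regions (Euclidean distance).
    dist = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)            # a region is not its own neighbour
    for r in range(n_regions):
        for nb in np.argsort(dist[r])[:k]:
            link(r0 + r, r0 + int(nb))
    return adj

def rwr(adj, start, restart=0.65, iters=200):
    """Random walk with restart: iterate u = (1 - c) * A @ u + c * v."""
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0             # guard against isolated nodes
    trans = adj / col_sums                    # column-normalised transition matrix
    v = np.zeros(adj.shape[0])
    v[start] = 1.0                            # the walk restarts at the query node
    u = v.copy()
    for _ in range(iters):
        u = (1 - restart) * (trans @ u) + restart * v
    return u

def annotate(adj, query_image, n_images, n_regions, n_terms, top=1):
    """Return the indices of the term nodes the walk visits most often."""
    u = rwr(adj, query_image)
    t0 = n_images + n_regions
    return [int(t) for t in np.argsort(-u[t0:t0 + n_terms])[:top]]

# Toy data: images 0 and 1 are annotated; image 2 is to be annotated.
region_feats = [[0.0, 0.0], [1.0, 0.0],       # regions of image 0
                [10.0, 10.0], [11.0, 10.0],   # regions of image 1
                [0.2, 0.1], [0.9, 0.2]]       # regions of image 2
region_owner = [0, 0, 1, 1, 2, 2]
image_terms = {0: [0], 1: [1]}                # image 0 -> term 0, image 1 -> term 1
adj = build_graph(region_feats, region_owner, image_terms, n_images=3, n_terms=2, k=1)
print(annotate(adj, query_image=2, n_images=3, n_regions=6, n_terms=2))  # prints [0]
```

In this toy example the regions of image 2 lie close to those of image 0, so the walk started at image 2 reaches term 0 and never reaches term 1.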
The above method is described in detail in J. Y. Pan, H. J. Yang, C. Faloutsos, and P. Duygulu, “GCap: Graph-based Automatic Image Captioning,” Proc. of the 4th International Workshop on Multimedia Data and Document Engineering (MDDE), in conjunction with the Computer Vision and Pattern Recognition Conference (CVPR '04), 2004.
The GCap algorithm is theoretically based on the premise that, during the random walk, the nodes of annotations related to the image to be annotated (the measured image) are accessed more often than other nodes. Thus, by checking how often each annotation node is accessed, it is possible to find the annotation having the strongest correlation with the image.
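The quantity compared in the preceding paragraph can be read as the steady-state probability that the random walk is at a given node. One common way to write this for a random walk with restart is given below as an illustration; the symbols are introduced here and are not taken from the text above. Here A is assumed to be the column-normalized adjacency matrix of the graph, v the indicator vector of the node of the image to be annotated, c the restart probability (0 < c < 1), and u the vector of steady-state probabilities.

```latex
\mathbf{u} = (1 - c)\,\mathbf{A}\,\mathbf{u} + c\,\mathbf{v}
\quad\Longrightarrow\quad
\mathbf{u} = c\,\bigl(\mathbf{I} - (1 - c)\,\mathbf{A}\bigr)^{-1}\mathbf{v}
```

Under this reading, the annotation nodes with the largest entries in u are selected as the annotations of the image.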
However, the graph obtained in the learning stage by the above method is likely to contain image regions that are erroneously linked to each other, which results in poor annotation accuracy.