With the rapid development of computer technology and the World Wide Web, the number and complexity of digital images have grown explosively in many technical fields. As a result, managing this vast number of images effectively and efficiently, including accessing, organizing, and retrieving them, has become correspondingly more challenging. To meet this need, researchers in various fields have paid considerable attention to content based image retrieval technology since the 1990s, and many effective techniques and systems have been developed.
Image retrieval refers to the technology of querying a digital image database so that the retrieved images match the user's intention. Traditional image retrieval systems allow users to search image databases in either of two ways: keyword based retrieval and content based retrieval.
In keyword based retrieval, the images in the database are labeled in advance, that is to say, the images are described with keywords. The retrieval is then carried out in accordance with the keywords of the images. However, there are two major problems associated with this mode of retrieval: (1) manually labeling the images is an enormous burden, which becomes more apparent as the database grows; and (2) more seriously, the content of an image is often understood differently by different individuals. In other words, different individuals have different points of interest with regard to the same image, and their comprehensions and intentions also differ.
The concept of content based image retrieval was proposed in the 1990s to address the problems of keyword based retrieval systems. In contrast to keyword based retrieval, a content based retrieval system retrieves an image directly on the basis of the image content. In such a system, a user is required to provide a query image to express his intention, and the retrieval system subsequently searches the image database to find images similar to the query image and returns them to the user. To achieve this goal, the retrieval system usually first extracts low-level features such as colors, textures and shapes from the query image and the database images. The distance between the query image and each database image is then calculated from these features to determine their similarity. Finally, the database images most similar to the query image are returned. If the features adequately describe the image content, such a mode of retrieval is very effective. For instance, if a user intends to retrieve images having specific colors and complicated textures, a content based retrieval system can carry out the task effectively by using color and texture features. By contrast, this goal can hardly be achieved with keyword descriptions.
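The extract-features, compute-distance, return-nearest sequence described above can be illustrated with a minimal sketch. It assumes a coarse joint RGB histogram as the only low-level feature and a Euclidean distance; the function names, the bin count, and the toy database are hypothetical choices made for illustration and are not taken from any of the cited systems.

```python
from math import sqrt


def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram; pixels is a list of (r, g, b) in 0..255."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        # Clamp so that channel values always fall into a valid bin.
        ri = min(r // step, bins - 1)
        gi = min(g // step, bins - 1)
        bi = min(b // step, bins - 1)
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    total = len(pixels) or 1
    return [h / total for h in hist]


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query_pixels, database, top_k=2):
    """Return the names of the top_k database images closest to the query."""
    q = color_histogram(query_pixels)
    scored = [(euclidean(q, color_histogram(p)), name)
              for name, p in database.items()]
    scored.sort()
    return [name for _, name in scored[:top_k]]


# Toy database: each "image" is just a short list of pixels.
database = {
    "red_a": [(250, 10, 10)] * 8 + [(10, 10, 250)] * 2,
    "blue": [(10, 10, 250)] * 10,
    "red_b": [(240, 20, 5)] * 10,
}
query = [(255, 0, 0)] * 10
ranked = retrieve(query, database, top_k=2)
```

In this toy setting the two mostly red images rank ahead of the blue one, because their histograms lie closest to that of the all-red query.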
Nevertheless, the application of content based image retrieval is largely limited by the gap between the low-level features of an image and the high-level concepts of human perception. First, the effectiveness of a content based image retrieval system usually depends on the features adopted. For example, shape features are relatively effective for retrieving images of “cars”, whereas color features are more suitable for describing a scenic image of a “sunset”. Therefore, different strategies should be employed in retrieving images of different types, but it is difficult for a nonprofessional user to determine which features are more effective. In addition, different users have different points of interest at different times even with regard to the same image. In other words, perceptual similarity depends on such factors as the application environment, the individuals involved and the context.
In order to reduce the gap between the low-level features and the high-level perception, researchers have proposed a strategy of relevance feedback and have achieved considerable success in this regard. An image retrieval system equipped with relevance feedback enhances the precision of retrieval through interactions between a retrieval engine and a user. Such a system should contain at least two modules: a learner module and a selector module. In each round of feedback, the user is required to provide some feedback information, that is, to judge the images returned by the selector module and mark them as either relevant or irrelevant (the relevant images and the irrelevant images are respectively referred to as positive samples and negative samples); the learner module relearns the user's intention from the feedback information and returns a new retrieval result. At the same time, the selector module selects some images from the image database on the basis of the current learning result and returns them to the user via a user interface. During the next round of feedback, the user is required to provide feedback information on these images.
Many relevance feedback methods have been developed in the past decade along the path from heuristic strategies to optimized learning. Most early relevance feedback methods belong to the category of “Query Point Movement and Re-weighting”, in which the task of the search engine at each round of the feedback procedure is to generate better query features and to adjust the weights of the various features so as to better adapt to the user's intention.
[Patent document-1] describes one of the earlier image retrieval apparatuses based on the strategy of “Query Point Movement and Re-weighting”. In this apparatus, a weighted average of the features of the relevant images (positive samples) obtained via feedback is taken as a new query point. At the same time, this apparatus makes use of a re-weighting strategy based on standard variance.
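As an illustration of the “Query Point Movement and Re-weighting” idea, the following sketch moves the query to a weighted average of the positive samples and assigns each feature dimension a weight inversely proportional to its spread over the positive samples. The inverse standard deviation rule, the function names, and the sample values are assumptions chosen for illustration; the source does not specify the exact re-weighting formula used in [Patent document-1].

```python
from statistics import pstdev


def move_query_point(positives, scores=None):
    """New query point: weighted average of the positive feature vectors."""
    if scores is None:
        scores = [1.0] * len(positives)  # assumption: equal relevance scores
    total = sum(scores)
    dim = len(positives[0])
    return [sum(s * v[i] for s, v in zip(scores, positives)) / total
            for i in range(dim)]


def reweight(positives, eps=1e-6):
    """Per-dimension weights, assumed inversely proportional to the
    standard deviation of that dimension over the positive samples."""
    dim = len(positives[0])
    raw = [1.0 / (pstdev([v[i] for v in positives]) + eps)
           for i in range(dim)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance used to rank database images."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)) ** 0.5


# Hypothetical two-dimensional features of three positive samples.
positives = [[0.9, 0.1], [1.1, 0.9], [1.0, 0.5]]
new_query = move_query_point(positives)
weights = reweight(positives)
```

Here the first dimension varies little across the positive samples and therefore receives the larger weight, reflecting the intuition that a feature on which the relevant images agree is more informative about the user's intention.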
Some existing retrieval systems use a Bayesian method to carry out “Query Point Movement and Re-weighting”. [Patent document-2] makes use of a Bayesian classifier to differentiate the relevant images and the irrelevant images obtained via feedback. The relevant images (positive samples) are regarded in this method as belonging to the same semantic class, and their distributions are estimated by means of the Bayesian classifier. By contrast, the irrelevant images (negative samples) are usually irrelevant in semantics. Consequently, images surrounding the negative samples are penalized through a “dibbling” process.
[Non-patent document-1] employs the Bayesian theory to estimate the local decision boundary of the positive samples and negative samples surrounding the query image, and calculates a proper location in the region of the positive samples as a new query point.
Given the feedback information of a user, [non-patent document-2] employs the Bayesian theory to estimate the intention of the user. Specifically, a posterior probability distribution of all images in the database is estimated, and the probability distribution is updated in accordance with the result of each retrieval feedback.
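The posterior update described above can be sketched as follows. This is a toy illustration only: the scalar features, the exponential user model, and the parameter `sigma` are hypothetical assumptions and do not reproduce the model of [non-patent document-2].

```python
from math import exp


def update_posterior(prior, features, feedback, sigma=1.0):
    """One round of Bayesian updating over candidate target images.

    prior:    dict image_id -> probability that the image is the target
    features: dict image_id -> scalar feature (toy assumption)
    feedback: list of (image_id, is_relevant) pairs marked by the user
    """
    posterior = {}
    for target, p in prior.items():
        likelihood = 1.0
        for shown, relevant in feedback:
            d = abs(features[shown] - features[target])
            # Assumed user model: the closer a shown image is to the
            # target, the more likely the user marks it as relevant.
            p_rel = exp(-d / sigma)
            likelihood *= p_rel if relevant else (1.0 - p_rel)
        posterior[target] = p * likelihood
    z = sum(posterior.values())
    if z == 0.0:
        return dict(prior)  # uninformative feedback: keep the prior
    return {k: v / z for k, v in posterior.items()}


features = {"a": 0.0, "b": 0.2, "c": 3.0, "d": 3.1}
prior = {name: 0.25 for name in features}
feedback = [("a", True), ("c", False)]
posterior = update_posterior(prior, features, feedback)
best = max(posterior, key=posterior.get)
```

After this single round, probability mass shifts toward the images near the positively marked sample and away from those near the negatively marked one; repeating the update with the posterior as the next prior corresponds to further feedback rounds.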
Later on, researchers began to look at the relevance feedback problem more systematically by formulating it as a problem of learning, classification, or probability density estimation. As described in [non-patent document-3], the Discriminant EM method casts image retrieval as a transductive learning problem, using unlabelled images in supervised learning to achieve a better classification result. However, the computational complexity of this method is high, which is especially troublesome when the database is large.
Based on the observations that all positive samples are alike and each negative sample is negative in its own way, Zhou and Huang proposed in [non-patent document-4] a biased discriminant analysis and its kernel form, to find a better transformed space, where the positive samples cluster while the negative samples scatter away from the positive samples.
Recently, many relevance feedback technologies rely on support vector machines (abbreviated as SVM), such as the methods described in [non-patent document-5], [non-patent document-6] and [non-patent document-7]. Compared with other learning methods, SVM has many advantages: good generalization ability, no restrictive assumptions regarding the object to be processed, fast learning and prediction, and flexibility.
However, these learning methods are challenged by the problem of small sample size, namely the problem of insufficient training samples. This is because few users will be so patient as to label a large number of images in the relevance feedback process. Therefore, given the number of the images to be labeled, how to choose images for the user to label is a crucial issue in minimizing the amount of interaction between the user and the learner required for reaching good results. Generally speaking, two strategies are used to address the problem of insufficient training samples: (1) active learning, or active selecting; (2) exploiting unlabelled images.
An active learning strategy usually employs a selector module to actively select images from the image database for the user to label and feed back, in order to achieve the maximal information gain in decision making and feedback. Such a method is presented in [non-patent document-5], whose authors proposed that the selected images should maximally reduce the size of the version space, which can be achieved by selecting the points nearest to the decision boundary. Another conventional method is the angle-diversity strategy, as shown in [non-patent document-8]. This method selects a plurality of samples simultaneously by balancing the distance between the image samples and the decision boundary against the angles between these samples.
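The two selection strategies above can be sketched as follows, assuming that a linear classifier with weights `w` and bias `b` has already been trained. The weights, the trade-off parameter `lam`, the function names, and the sample points are hypothetical, and the greedy cost used for the angle-diversity variant is a simplified reading of that strategy, not the exact formulation of [non-patent document-8].

```python
from math import sqrt


def decision_value(x, w, b):
    """Signed decision value of a linear classifier (assumed pre-trained)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def cosine(a, b):
    """Cosine of the angle between two feature vectors."""
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (na * nb) if na and nb else 0.0


def select_nearest_boundary(unlabeled, w, b, k):
    """Pick the k samples with the smallest |f(x)|."""
    ranked = sorted(unlabeled,
                    key=lambda n: abs(decision_value(unlabeled[n], w, b)))
    return ranked[:k]


def select_angle_diverse(unlabeled, w, b, k, lam=0.5):
    """Greedy selection trading closeness to the boundary against the
    largest |cosine| to any sample already selected."""
    selected = []
    candidates = sorted(unlabeled)  # sorted for deterministic tie-breaking
    while candidates and len(selected) < k:
        def cost(name):
            margin = abs(decision_value(unlabeled[name], w, b))
            redundancy = max(
                (abs(cosine(unlabeled[name], unlabeled[s])) for s in selected),
                default=0.0)
            return lam * margin + (1.0 - lam) * redundancy
        best = min(candidates, key=cost)
        selected.append(best)
        candidates.remove(best)
    return selected


w, b = [1.0, -1.0], 0.0  # hypothetical trained parameters
unlabeled = {"p": [1.0, 0.9], "q": [2.0, 1.8], "r": [0.0, 0.4]}
nearest = select_nearest_boundary(unlabeled, w, b, k=2)
diverse = select_angle_diverse(unlabeled, w, b, k=2)
```

In this toy example the nearest-boundary rule selects `p` and `q`, which point in the same direction and therefore carry largely redundant information, whereas the angle-diversity variant replaces `q` with `r`, which is farther from the boundary but at a different angle.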
In order to address the problem of insufficient training samples, acquiring information from unlabelled images has become a hot research topic in the past few years. The basic principle of this strategy is to enhance the accuracy of classification by means of the unlabelled images. Some methods use a generative model for the classifier and employ an EM scheme to model the label or parameter estimation process, while others yield an optimal labeling of the unlabelled examples by using the minimum cut on a graph. Another prominent achievement in acquiring information from unlabelled images is the co-training strategy, which trains two different classifiers from two different perspectives and uses the prediction result of one classifier on the unlabelled images to augment the training set of the other classifier, as shown in [non-patent document-9].
[Patent document-1]: U.S. Pat. No. 6,859,802 B1
[Patent document-2]: U.S. Pat. No. 7,113,944 B2
[Non-patent document-1]: Giorgio Giacinto, Fabio Roli, Bayesian Relevance Feedback for Content-Based Image Retrieval, Pattern Recognition, vol. 37, no. 7, pp. 1499-1508, 2004.
[Non-patent document-2]: Ingemar J. Cox, Matt L. Miller, Thomas P. Minka, Thomas V. Papathomas, Peter N. Yianilos, The Bayesian Image Retrieval System, PicHunter: Theory, Implementation, and Psychophysical Experiments, IEEE Transactions on Image Processing, vol. 9, no. 1, pp. 20-37, 2000.
[Non-patent document-3]: Ying Wu, Qi Tian, Thomas S. Huang, Discriminant-EM Algorithm with Application to Image Retrieval, in Proc. IEEE Int'l Conf. on Computer Vision and Pattern Recognition, pp. 222-227, 2000.
[Non-patent document-4]: Xiang Sean Zhou, Thomas S. Huang, Comparing Discriminating Transformations and SVM for Learning during Multimedia Retrieval, in Proc. ACM Multimedia, pp. 137-146, 2001.
[Non-patent document-5]: Simon Tong, Edward Chang, Support Vector Machine Active Learning for Image Retrieval, in Proc. ACM Multimedia, pp. 107-118, 2001.
[Non-patent document-6]: Jingrui He, Mingjing Li, Hong-Jiang Zhang, Hanghang Tong, Changshui Zhang, Mean Version Space: a New Active Learning Method for Content-Based Image Retrieval, in Proc. the 6th ACM SIGMM Int. Workshop on Multimedia Information Retrieval (MIR), pp. 15-22, 2004.
[Non-patent document-7]: Lei Wang, Kap Luk Chan, Zhihua Zhang, Bootstrapping SVM Active Learning by Incorporating Unlabelled Images for Image Retrieval, in Proc. IEEE Int'l Conf. on Computer Vision and Pattern Recognition, pp. 629-634, 2003.
[Non-patent document-8]: Klaus Brinker, Incorporating Diversity in Active Learning with Support Vector Machines, in Proc. of the 20th Int'l Conf. on Machine Learning (ICML), pp. 59-66, 2003.
[Non-patent document-9]: Zhi Hua Zhou, Enhancing Relevance Feedback in Image Retrieval Using Unlabeled Data, ACM Transactions on Information Systems, vol. 24, no. 2, pp. 219-244, 2006.
[Non-patent document-10]: Jianbo Shi, Jitendra Malik, Normalized Cuts and Image Segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, 2000.
[Non-patent document-11]: translated by Hongdong Li, and Tianxiang YAO et al., Mode Classification, Publishing House of Machinery Industry, Zhongxin Publishing House, pages 415-477.