This invention generally relates to a technology facilitating accurate and efficient image retrieval.
Digital images are increasingly common as scanners and digital cameras drop in price and increase in availability and function. As digital photographers (amateurs and professionals alike) amass large collections of digital photographs on their computers, the challenges involved with organizing, querying, and accessing digital images grow.
Therefore, digital photographers need to utilize “image retrieval” technology to accomplish their tasks. “Image retrieval” refers to a technology focused on the organization of a library of digital images, the inquiry into such a library, and the retrieval of selected images that meet the terms of such inquiry.
Images in a library may be organized and, thus, retrieved in an organized fashion based upon their content. Content-based categorization and image retrieval approaches are beneficial to all those with access to a library of digital images.
Image Retrieval Systems
Automatic image retrieval systems provide an efficient way for users to navigate through the growing numbers of available images. Traditional image retrieval systems allow users to retrieve images in one of two ways: (1) keyword-based image retrieval or (2) content-based image retrieval.
Keyword-Based. Keyword-based image retrieval finds images by matching keywords from a user query to keywords that have been manually added to the images. Thus, these images have been manually annotated with keywords related to their semantic content. One of the more popular collections of annotated images is “Corel™ Gallery”, an image database from Corel Corporation that includes upwards of one million annotated images.
Unfortunately, with keyword-based image retrieval systems, it can be difficult or impossible for a user to precisely describe the inherent complexity of certain images. As a result, retrieval accuracy can be severely limited because some images (those that cannot be described or can only be described ambiguously) will not be retrieved successfully. In addition, due to the enormous burden of manual annotation, there is a limited number of databases with annotated images.
Although image retrieval techniques based on keywords can be easily automated, they suffer from the same problems as the information retrieval systems in text databases and web-based search engines. Because of widespread synonymy and polysemy in natural language, the precision of such systems is very low and their recall is inadequate. (Synonymy is the quality of being synonymous; equivalence of meaning. Polysemy means having or characterized by many meanings.) In addition, linguistic barriers and the lack of uniform textual descriptions for common image attributes severely limit the applicability of the keyword-based systems.
Content-Based. Content-based image retrieval (CBIR) systems have been built to address many issues, such as those of keyword-based systems. These systems extract visual image features such as color, texture, and shape from the image collections and utilize them for retrieval purposes. These visual image features are also called “low-level” features. Examples of low-level features of an image include color histogram, wavelet-based texture descriptors, directional histograms of edges, and so forth.
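The keyword matching described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `keyword_search`, the dictionary layout of `annotations`, and the ranking by number of overlapping keywords are hypothetical and are not taken from any particular product.

```python
def keyword_search(annotations, query):
    """Return ids of images whose manual keywords overlap the query terms."""
    terms = {t.lower() for t in query.split()}
    hits = []
    for image_id, keywords in annotations.items():
        overlap = terms & {k.lower() for k in keywords}
        if overlap:
            hits.append((len(overlap), image_id))
    # Assumed ranking: most overlapping keywords first, ties broken by image id.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [image_id for _, image_id in hits]


# Hypothetical manually annotated library.
annotations = {
    "img1": ["river", "boat"],
    "img2": ["car", "street"],
    "img3": ["boat", "harbor", "sunset"],
}

print(keyword_search(annotations, "boat river"))
print(keyword_search(annotations, "ship"))
```

Because the match is purely lexical, the query "ship" finds nothing in this toy library even though two images are annotated with "boat".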
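As an illustration of one such low-level feature, the sketch below computes a normalized color histogram. The quantization into 4 bins per channel, the function name `color_histogram`, and the representation of an image as a list of (r, g, b) tuples are assumptions made for this example only.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of (r, g, b) pixels with values 0-255."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin index in [0, bins_per_channel - 1].
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = len(pixels)
    # Normalize so that images of different sizes are comparable.
    return [h / total for h in hist] if total else hist


# Hypothetical 4-pixel image: three pure-red pixels and one pure-blue pixel.
pixels = [(255, 0, 0)] * 3 + [(0, 0, 255)]
feature_vector = color_histogram(pixels)
print(len(feature_vector), max(feature_vector))
```

The resulting list is a feature vector of the kind a CBIR system could compare across images.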
CBIR systems work well when the extracted feature vectors accurately capture the essence of the image content. For example, if a user is searching for an image with complex textures having a particular combination of colors, this type of query is extremely difficult to describe using keywords, but it can be reasonably represented by a combination of color and texture features. On the other hand, if a user is searching for an object that has clear semantic meanings but cannot be sufficiently represented by combinations of available feature vectors, the content-based systems will not return many relevant results. Furthermore, the inherent complexity of the images makes it almost impossible for users to present the system with a query that fully describes their intentions.
Although CBIR solves many of the problems of keyword-based image retrieval, it has its own shortcomings. One such shortcoming is that searches may return entirely irrelevant images that just happen to possess similar features. Additionally, individual objects in images contain a wide variety of low-level features. Therefore, using only the low-level features will not satisfactorily describe what is to be retrieved.
Semantic Concepts. The user is typically looking for specific semantic concepts rather than specific low-level features. However, there is a disparity between “semantic concepts” and “low-level image features.” This disparity limits the performance of CBIR systems. Semantic concepts are the meaningful content of an image: for example, a river, a person, a car, a boat, etc. Although objectively measurable, low-level image features lack specific meaning.
The mapping between semantic concepts and low-level features is still impractical with present computer vision and AI techniques. To improve this situation, research efforts have recently shifted toward “relevance feedback” techniques.
Relevance-Feedback CBIR
A common type of CBIR system is one that finds images that are similar to low-level features of an example image or example images. To weed out the irrelevant images returned in CBIR, some CBIR systems utilize user feedback to gain an understanding as to the relevancy of certain images. The user feedback is in the form of selected exemplary images (either positive or negative). These exemplary images may be called “feedback” images.
The user feedback selects the exemplary images used to narrow successive searches. A common approach to relevance feedback is estimating ideal query parameters using the low-level image features of the exemplary images. Thus, relevance feedback maps low-level features to human recognition of semantic concepts.
In a relevance-feedback CBIR system, a user submits a query and the system provides a set of query results. More specifically, after a query, the system presents a set of images to the human querier. The human designates specific images as positive or negative. Positive indicates that the image contains the semantic concepts queried and negative indicates that the image does not contain such concepts.
Based upon this feedback, the system performs a new query and displays a new set of resulting images. The human again provides feedback regarding the relevance of the displayed images. Another round of query and feedback is performed. Each round may be called an iteration. The process continues for a given number of iterations or until the user (or system) is satisfied with the overall relevance of the present set of images.
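The query-and-feedback rounds described above can be sketched as a simple loop. Everything named here is hypothetical: `search`, `judge`, and `refine` stand in for the retrieval engine, the human querier, and the query-update rule, and stopping when no displayed image is rejected is only one assumed way to model the user being satisfied.

```python
def run_feedback_session(query, search, judge, refine, max_iterations=5):
    """Alternate query and feedback rounds; return the results of each iteration."""
    history = []
    for _ in range(max_iterations):
        results = search(query)
        history.append(results)
        positives, negatives = judge(results)
        if not negatives:  # assumed stopping rule: nothing displayed was rejected
            break
        query = refine(query, positives, negatives)
    return history


# Toy 1-D "library": each image is reduced to a single feature value.
library = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]


def search(query, k=3):
    """Return the k library values closest to the query."""
    return sorted(library, key=lambda value: abs(value - query))[:k]


def judge(results):
    """Simulated user: images with a value of at least 8 are relevant."""
    positives = [v for v in results if v >= 8.0]
    negatives = [v for v in results if v < 8.0]
    return positives, negatives


def refine(query, positives, negatives):
    """Assumed update: move the query to the mean of the positive examples."""
    return sum(positives) / len(positives) if positives else query


history = run_feedback_session(6.0, search, judge, refine)
for iteration, results in enumerate(history, start=1):
    print(iteration, results)
```

In this toy run the first iteration mixes relevant and irrelevant values, and the second iteration returns only relevant ones, so the session stops.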
One of the most popular models used in information retrieval is the vector model. The vector model is described in such writings as Buckley and Salton, “Optimization of Relevance Feedback Weights,” in Proc. of SIGIR'95; Salton and McGill, “Introduction to Modern Information Retrieval,” McGraw-Hill Book Company, 1983; and W. M. Shaw, “Term-Relevance Computation and Perfect Retrieval Performance,” Information Processing and Management. Various effective retrieval techniques have been developed for this model and among them is the method of relevance feedback.
Most of the existing relevance feedback research can be classified into two approaches: query point movement and re-weighting.
Query-Point-Movement
The query-point-movement method essentially tries to improve the estimate of an “ideal query point” by moving it towards good example points and away from bad example points. The frequently used technique to iteratively improve this estimation is Rocchio's equation, given below for sets of relevant documents $D'_R$ and non-relevant documents $D'_N$ noted by the user:

$$Q' = \alpha Q + \beta \left( \frac{1}{N_{R'}} \sum_{i \in D'_R} D_i \right) - \gamma \left( \frac{1}{N_{N'}} \sum_{i \in D'_N} D_i \right) \tag{1}$$

where $\alpha$, $\beta$, and $\gamma$ are suitable constants and $N_{R'}$ and $N_{N'}$ are the number of documents in $D'_R$ and $D'_N$, respectively. In this equation, $D'_R$ are those images (i.e., documents) that the user found relevant and $D'_N$ are those images that the user did not find relevant.
The first portion (before the subtraction sign) of Equation 1 is a “reward function” that rewards query results that include the desired semantic content. The reward is based upon the positive feedback from the querier. The last portion (after the subtraction sign) of Equation 1 is a “penalty function” that penalizes query results that do not include the desired semantic content. The penalty is based upon the negative feedback from the querier.
This technique is employed, for example, by the MARS system, as described in Rui, Y., Huang, T. S., and Mehrotra, S., “Content-Based Image Retrieval with Relevance Feedback in MARS,” in Proc. IEEE Int. Conf. on Image Proc., 1997.
Some existing implementations of the point-movement strategy use a Bayesian method. Specifically, Cox et al. (Cox, I. J., Miller, M. L., Minka, T. P., Papathomas, T. V., Yianilos, P. N., “The Bayesian Image Retrieval System, PicHunter: Theory, Implementation, and Psychophysical Experiments,” IEEE Trans. on Image Processing, Volume 9, Issue 1, pp. 20-37, January 2000) and Vasconcelos and Lippman (Vasconcelos, N., and Lippman, A., “A Bayesian Framework for Content-Based Indexing and Retrieval,” in Proc. of DCC'98, Snowbird, Utah, 1998) used Bayesian learning to incorporate the user's feedback and update the probability distribution of all the images in the database.
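A minimal sketch of the update in Equation 1 follows, assuming feature vectors are plain Python lists of equal length. The function name `rocchio_update` and the default values of `alpha`, `beta`, and `gamma` are illustrative assumptions; the text only says the constants are "suitable". An empty feedback set is treated here as contributing a zero vector, which is also an assumption.

```python
def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query toward the relevant centroid and away from the non-relevant one."""

    def centroid(vectors):
        # Assumed convention: an empty feedback set contributes nothing.
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    reward = centroid(relevant)        # (1 / N_R') * sum of relevant D_i
    penalty = centroid(non_relevant)   # (1 / N_N') * sum of non-relevant D_i
    return [alpha * q + beta * r - gamma * p
            for q, r, p in zip(query, reward, penalty)]


new_query = rocchio_update(
    query=[1.0, 0.0],
    relevant=[[1.0, 1.0], [3.0, 1.0]],
    non_relevant=[[0.0, 4.0]],
    alpha=1.0, beta=0.5, gamma=0.25,
)
print(new_query)
```

In this example the first coordinate grows because the relevant images have large values there, while the second coordinate is pushed negative by the non-relevant image.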
These conventional works treat the feedback examples for the same query as independent of one another. They do this so that they can use Naive Bayesian Inference to optimize the retrieval results by using feedback examples.
These conventional works do not treat all positive examples as closely related to one another. They do not use all of the positive examples of the same query to construct a Bayesian classifier, use that classifier to represent the original query, and thereby try to get more accurate retrieval results. These works are not incremental.
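The independence assumption can be written in a generic naive Bayes form. The notation is assumed for illustration and is not the exact formulation of the cited works: $T$ denotes a candidate target image and $e_1, \dots, e_n$ denote the feedback examples given for the query.

```latex
P(T \mid e_1, \dots, e_n) \;\propto\; P(T) \prod_{k=1}^{n} P(e_k \mid T)
```

Each feedback example contributes its own factor, so no term captures how the examples relate to one another.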
Re-Weighting
With the re-weighting method, each image is represented by an N-dimensional feature vector; thus, the image may be viewed as a point in an N-dimensional space. If the variance of the good examples is high along a principal axis j, the values on this axis are most likely not very relevant to the input query and a low weight w_j can be assigned to the axis. Accordingly, the inverse of the standard deviation of the jth feature values in the feature matrix is used as the basis for updating the weight w_j. The MARS system mentioned above implements a slight refinement to the re-weighting method called the standard deviation method.
To optimize the query for further image similarity assessment, conventional relevance-feedback systems use only a weighted feature sum (WFS) of the feedback images. WFS is a conventional query-refinement technique. WFS requires many iterations (well more than three) to produce adequate results. WFS does not work very well in many cases, particularly when the user wants to express an “OR” relationship among the queries.
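The basic re-weighting idea can be sketched as follows. The function names, the small constant `eps` that guards against division by zero, and the normalization of the weights to sum to one are assumptions of this example; they are not details of the MARS refinement.

```python
import math


def inverse_std_weights(positive_examples, eps=1e-6):
    """Weight each feature axis by the inverse of its standard deviation."""
    n = len(positive_examples)
    weights = []
    for column in zip(*positive_examples):  # one column per feature axis j
        mean = sum(column) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
        weights.append(1.0 / (std + eps))   # high variance gives a low weight
    total = sum(weights)
    return [w / total for w in weights]     # assumed normalization


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


# Hypothetical positive examples: axis 0 is consistent, axis 1 varies widely.
good_examples = [[0.5, 0.1], [0.6, 0.9], [0.4, 0.5]]
weights = inverse_std_weights(good_examples)
print(weights)
```

Axis 0, on which the good examples agree, receives the larger weight and therefore dominates the distance used to rank images.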
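The difficulty with an "OR" relationship can be illustrated with a toy example. It assumes that WFS forms a single refined query as a weighted sum of the feedback feature vectors (equal weights here); the function name and the two-dimensional features are hypothetical.

```python
import math


def weighted_feature_sum(feedback, weights=None):
    """Combine feedback feature vectors into one query vector."""
    if weights is None:
        weights = [1.0 / len(feedback)] * len(feedback)  # assumed equal weights
    return [sum(w * v for w, v in zip(weights, column))
            for column in zip(*feedback)]


# The user wants images from either of two distinct clusters.
positives = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
unrelated = [0.5, 0.5]  # an image belonging to neither cluster

query = weighted_feature_sum(positives)
nearest_positive = min(math.dist(query, p) for p in positives)
print(query, math.dist(query, unrelated), nearest_positive)
```

The single combined query lands between the two clusters, so the unrelated image is closer to it than any of the images the user actually marked as positive.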
Multiple Iterations
Conventional relevance feedback techniques may require many iterations before the majority of the results include images with the desired semantic content. They require at least three iterations, but typically many more than three iterations, before generating results with the desired semantic content.
These conventional relevance feedback methods either have no strategy to progressively adjust their results or perform poorly on large datasets. With conventional relevance feedback methods, positive and negative feedback are always processed in the same way.
Described herein is a relevance-feedback, content-based technology facilitating accurate and efficient image retrieval. More specifically, the technology minimizes the number of iterations for user feedback regarding the semantic relevance of exemplary images while maximizing the resulting relevance of each iteration.
One technique for accomplishing this is to use a Bayesian classifier to treat positive and negative feedback examples with different strategies. A Bayesian classifier determines the distribution of the query space for positive examples. Images near the negative examples are penalized using a ‘dibbling’ process. This technique utilizes past feedback information for each iteration to progressively improve results.
In addition, query refinement techniques are applied to pinpoint the users' intended queries with respect to their feedback. These techniques further enhance the accuracy and usability of relevance feedback.
This summary itself is not intended to limit the scope of this patent. Moreover, the title of this patent is not intended to limit the scope of this patent. For a better understanding of the present invention, please see the following detailed description and appended claims, taken in conjunction with the accompanying drawings. The scope of the present invention is pointed out in the appended claims.
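One way to picture treating positive and negative feedback differently is sketched below. This is not the procedure of the detailed description; it rests on assumptions made only for illustration: the positive examples are modeled with an independent Gaussian per feature, and an image within a fixed `radius` of any negative example has a fixed `penalty` subtracted from its score. All names and constants are hypothetical.

```python
import math


def fit_gaussian(positives, min_var=1e-3):
    """Estimate a per-feature mean and variance from the positive examples."""
    n = len(positives)
    columns = list(zip(*positives))
    means = [sum(c) / n for c in columns]
    variances = [max(sum((v - m) ** 2 for v in c) / n, min_var)
                 for c, m in zip(columns, means)]
    return means, variances


def log_likelihood(x, means, variances):
    """Log density of x under the assumed diagonal Gaussian."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
               for v, m, var in zip(x, means, variances))


def score(x, means, variances, negatives, radius=0.2, penalty=5.0):
    """Positive-model score, lowered for images near any negative example."""
    s = log_likelihood(x, means, variances)
    for neg in negatives:
        if math.dist(x, neg) < radius:  # assumed neighborhood test
            s -= penalty
    return s


positives = [[0.4, 0.5], [0.6, 0.5], [0.5, 0.4], [0.5, 0.6]]
negatives = [[0.55, 0.5]]
means, variances = fit_gaussian(positives)

print(score([0.5, 0.5], means, variances, []))
print(score([0.5, 0.5], means, variances, negatives))
print(score([0.9, 0.9], means, variances, negatives))
```

Because the positive model is re-estimated from all positive examples gathered so far, a sketch of this kind can reuse feedback from earlier iterations rather than starting over each round.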