Digital images are increasingly common as scanners and digital cameras drop in price and increase in availability and function. As digital photographers (amateurs and professionals alike) amass large collections of digital photographs on their computers, the challenges involved with organizing, querying, and accessing digital images grow.
Therefore, digital photographers need to utilize “image retrieval” technology to accomplish their tasks. “Image retrieval” refers to a technology focused on the organization of a library of digital images, the inquiry into such a library, and the retrieval of selected images that meet the terms of such inquiry.
Images in a library may be organized and, thus, retrieved in an organized fashion based upon their content. Content-based categorization and image retrieval approaches are beneficial to all those with access to a library of digital images.
Image Retrieval Systems
Automatic image retrieval systems provide an efficient way for users to navigate through the growing numbers of available images. Traditional image retrieval systems allow users to retrieve images in one of two ways: (1) keyword-based image retrieval or (2) content-based image retrieval.
Keyword-Based. Keyword-based image retrieval finds images by matching keywords from a user query to keywords that have been manually added to the images. Thus, these images have been manually annotated with keywords related to their semantic content. One of the more popular collections of annotated images is “Corel™ Gallery”, an image database from Corel Corporation that includes upwards of one million annotated images.
Unfortunately, with keyword-based image retrieval systems, it can be difficult or impossible for a user to precisely describe the inherent complexity of certain images. As a result, retrieval accuracy can be severely limited because some images—those that cannot be described or can only be described ambiguously—will not be retrieved successfully. In addition, due to the enormous burden of manual annotation, there are a limited number of databases with annotated images.
Although image retrieval techniques based on keywords can be easily automated, they suffer from the same problems as the information retrieval systems in text databases and web-based search engines. Because of widespread synonymy and polysemy in natural language, the precision of such systems is very low and their recall is inadequate. (Synonymy is the quality of being synonymous; equivalence of meaning. Polysemy means having or characterized by many meanings.) In addition, linguistic barriers and the lack of uniform textual descriptions for common image attributes severely limit the applicability of keyword-based systems.
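The keyword matching described above can be sketched in a few lines. This is only an illustrative sketch: the `ANNOTATIONS` table, the file names, and the `keyword_search` function are hypothetical and are not taken from any particular system. It also shows the synonymy problem, since a query for "stream" finds nothing even though two images were annotated with "river".

```python
# Hypothetical annotated library: image name -> manually assigned keywords.
ANNOTATIONS = {
    "img_001.jpg": {"river", "boat", "sunset"},
    "img_002.jpg": {"car", "street"},
    "img_003.jpg": {"river", "person"},
}

def keyword_search(query, annotations):
    """Return image names ranked by how many query keywords they match."""
    terms = {term.lower() for term in query.split()}
    scored = []
    for name, keywords in annotations.items():
        overlap = len(terms & keywords)
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties are broken by name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]

print(keyword_search("river boat", ANNOTATIONS))  # both river images, best match first
print(keyword_search("stream", ANNOTATIONS))      # synonym of "river": no match at all
```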
Content-Based. Content-based image retrieval (CBIR) systems have been built to address many issues, including those of keyword-based systems. These systems extract visual image features such as color, texture, and shape from the image collections and utilize them for retrieval purposes. These visual image features are also called “low-level” features. Examples of low-level features of an image include color histograms, wavelet-based texture descriptors, directional histograms of edges, and so forth.
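As one illustration of a low-level feature, a coarse color histogram might be computed as follows. This is a minimal sketch under stated assumptions: the image is given as a list of 8-bit RGB tuples, and the function name `color_histogram` and the choice of two bins per channel are hypothetical.

```python
def color_histogram(pixels, bins_per_channel=2):
    """Quantize 8-bit RGB pixels into a normalized joint color histogram."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    n = bins_per_channel
    hist = [0] * (n ** 3)
    for r, g, b in pixels:
        # Map each channel value (0-255) to a bin index in 0..n-1.
        r_bin, g_bin, b_bin = r * n // 256, g * n // 256, b * n // 256
        hist[(r_bin * n + g_bin) * n + b_bin] += 1
    # Normalize so that images of different sizes are comparable.
    return [count / len(pixels) for count in hist]

# Hypothetical four-pixel "image": two reddish, one blue, one near-black pixel.
PIXELS = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (10, 10, 10)]
histogram = color_histogram(PIXELS)
print(histogram)
```

The resulting list is a feature vector: it records how much of each coarse color the image contains, but nothing about what the image depicts.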
CBIR systems work well when the extracted feature vectors accurately capture the essence of the image content. For example, if a user is searching for an image with complex textures having a particular combination of colors, this type of query is extremely difficult to describe using keywords, but it can be reasonably represented by a combination of color and texture features. On the other hand, if a user is searching for an object that has clear semantic meanings but cannot be sufficiently represented by combinations of available feature vectors, the content-based systems will not return many relevant results. Furthermore, the inherent complexity of the images makes it almost impossible for users to present the system with a query that fully describes their intentions.
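Retrieval over feature vectors can be sketched as a nearest-neighbor ranking. The example below is an assumption-laden illustration: the three-element feature vectors, the image names, and the use of Euclidean distance are all hypothetical choices, not those of any specific CBIR system.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_similarity(query_vector, library):
    """Return image names ordered from most to least similar to the query."""
    return sorted(library, key=lambda name: euclidean(query_vector, library[name]))

# Hypothetical library: image name -> low-level feature vector.
FEATURES = {
    "a": [0.8, 0.2, 0.2],
    "b": [0.1, 0.9, 0.5],
    "c": [0.9, 0.1, 0.9],
}
ranking = rank_by_similarity([0.9, 0.1, 0.2], FEATURES)
print(ranking)
```

The ranking depends only on numeric closeness, so an image can rank highly without sharing any semantic content with the query.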
Although CBIR solves many of the problems of keyword-based image retrieval, it has its own shortcomings. One such shortcoming is that searches may return entirely irrelevant images that just happen to possess similar features. Additionally, individual objects in images contain a wide variety of low-level features. Therefore, using only the low-level features will not satisfactorily describe what is to be retrieved.
Semantic Concepts. The user is typically looking for specific semantic concepts rather than specific low-level features. However, there is a disparity between “semantic concepts” and “low-level image features.” This disparity limits the performance of CBIR systems. Semantic concepts include meaningful content of an image—for example, a river, a person, a car, a boat, etc. Although objectively measurable, low-level image features lack specific meaning.
The mapping between semantic concepts and low-level features is still impractical with present computer vision and AI techniques. To improve this situation, research efforts have recently shifted toward “relevance feedback” techniques.
Relevance-Feedback CBIR
A common type of CBIR system is one that finds images that are similar to low-level features of an example image or example images. To weed out the irrelevant images returned in CBIR, some CBIR systems utilize user feedback to gain an understanding as to the relevancy of certain images. The user feedback is in the form of selected exemplary images (either positive or negative). These exemplary images may be called “feedback” images.
The user feedback selects the exemplary images used to narrow successive searches. A common approach to relevance feedback is estimating ideal query parameters using the low-level image features of the exemplary images. Thus, relevance feedback maps low-level features to human recognition of semantic concepts.
In a relevance-feedback CBIR system, a user submits a query and the system provides a set of query results. More specifically, after a query, the system presents a set of images to the human querier. The human designates specific images as positive or negative. Positive indicates that the image contains the semantic concepts queried and negative indicates that the image does not contain such concepts.
Based upon this feedback, the system performs a new query and displays a new set of resulting images. The human again provides feedback regarding the relevance of the displayed images. Another round of query and feedback is performed. Each round may be called an iteration. The process continues for a given number of iterations or until the user (or system) is satisfied with the overall relevance of the present set of images.
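The query-and-feedback rounds described above can be sketched as a simple loop. Everything in this sketch is hypothetical: each image is reduced to a single numeric feature, the human querier is simulated by a fixed `RELEVANT` set, and `refine` is a placeholder that moves the query to the mean of the positive examples.

```python
# Hypothetical library: image name -> a single numeric feature value.
LIBRARY = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 2.8, "e": 3.5}
# Images the simulated user considers relevant (stands in for human judgment).
RELEVANT = {"c", "d", "e"}

def search(query, k=3):
    """Return the k images whose feature value is closest to the query."""
    return sorted(LIBRARY, key=lambda name: abs(LIBRARY[name] - query))[:k]

def get_feedback(results):
    """Simulated user: split the displayed images into positive and negative."""
    positives = [name for name in results if name in RELEVANT]
    negatives = [name for name in results if name not in RELEVANT]
    return positives, negatives

def refine(query, positives):
    """Placeholder refinement: move the query to the mean of the positives."""
    if not positives:
        return query
    return sum(LIBRARY[name] for name in positives) / len(positives)

def feedback_loop(query, max_iterations=5):
    """Run query/feedback rounds until the user is satisfied or the limit is hit."""
    results = search(query)
    rounds = 0
    for _ in range(max_iterations):
        positives, negatives = get_feedback(results)
        if not negatives:  # every displayed image is relevant: stop
            break
        query = refine(query, positives)
        results = search(query)
        rounds += 1
    return results, rounds

final_results, rounds_used = feedback_loop(query=1.2)
print(sorted(final_results), rounds_used)
```

In this toy setting, two rounds of feedback are enough for all three displayed images to be relevant.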
One of the most popular models used in information retrieval is the vector model. The vector model is described in such writings as Buckley and Salton, “Optimization of Relevance Feedback Weights,” in Proc. of SIGIR'95; Salton and McGill, “Introduction to Modern Information Retrieval,” McGraw-Hill Book Company, 1983; and W. M. Shaw, “Term-Relevance Computation and Perfect Retrieval Performance,” Information Processing and Management. Various effective retrieval techniques have been developed for this model, and among them is the method of relevance feedback.
Most of the existing relevance feedback research can be classified into two approaches: query point movement and re-weighting.
Query-Point-Movement
The query-point-movement method essentially tries to improve the estimate of an “ideal query point” by moving it towards good example points and away from bad example points. A frequently used technique for iteratively improving this estimate is Rocchio's equation, given below for sets of relevant documents D′R and non-relevant documents D′N noted by the user:
$$Q' = \alpha Q + \beta\left(\frac{1}{N_{R'}}\sum_{i \in D'_R} D_i\right) - \gamma\left(\frac{1}{N_{N'}}\sum_{i \in D'_N} D_i\right) \tag{1}$$

where α, β, and γ are suitable constants and $N_{R'}$ and $N_{N'}$ are the number of documents in $D'_R$ and $D'_N$ respectively. In this equation, $D'_R$ are those images (i.e., documents) that the user found relevant and $D'_N$ are those images that the user did not find relevant.
The first portion (before the subtraction sign) of Equation 1 is a “reward function” that rewards query results that include the desired semantic content. The reward is based upon the positive feedback from the querier. The last portion (after the subtraction sign) of Equation 1 is a “penalty function” that penalizes query results that do not include the desired semantic content. The penalty is based upon the negative feedback from the querier.
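Equation 1 can be written directly as a small function. This is a minimal sketch: the function name `rocchio_update`, the default constants (α = 1.0, β = 0.5, γ = 0.25), and the convention that an empty feedback set contributes a zero vector are assumptions made for illustration.

```python
def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.5, gamma=0.25):
    """Return the updated query point Q' of Equation 1.

    query        -- current query vector Q
    relevant     -- feature vectors the user marked relevant
    non_relevant -- feature vectors the user marked non-relevant
    """
    def mean(vectors):
        # Assumption: an empty feedback set contributes a zero vector.
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    reward = mean(relevant)        # centroid of the positive examples
    penalty = mean(non_relevant)   # centroid of the negative examples
    return [alpha * q + beta * r - gamma * p
            for q, r, p in zip(query, reward, penalty)]

# Worked example with hypothetical two-dimensional feature vectors.
new_query = rocchio_update(
    query=[1.0, 0.0],
    relevant=[[2.0, 0.0], [4.0, 2.0]],   # centroid [3.0, 1.0]
    non_relevant=[[0.0, 4.0]],           # centroid [0.0, 4.0]
)
print(new_query)  # [1 + 0.5*3 - 0.25*0, 0 + 0.5*1 - 0.25*4] = [2.5, -0.5]
```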
This technique is employed, for example, by the MARS system, as described in Rui, Y., Huang, T. S., and Mehrotra, S. “Content-Based Image Retrieval with Relevance Feedback in MARS,” in Proc. IEEE Int. Conf. on Image Proc., 1997.
Some existing implementations of the point-movement strategy use a Bayesian method. Specifically, Cox et al. (Cox, I. J., Miller, M. L., Minka, T. P., Papathomas, T. V., Yianilos, P. N. “The Bayesian Image Retrieval System, PicHunter: Theory, Implementation, and Psychophysical Experiments” IEEE Trans. on Image Processing, Volume 9, Issue 1, pp. 20–37, January 2000) and Vasconcelos and Lippman (Vasconcelos, N., and Lippman, A., “A Bayesian Framework for Content-Based Indexing and Retrieval”, In: Proc. of DCC'98, Snowbird, Utah, 1998) used Bayesian learning to incorporate the user's feedback and update the probability distribution of all the images in the database.
These conventional works consider the feedback examples for the same query to be independent of each other. They do so in order to use naive Bayesian inference to optimize the retrieval results by using feedback examples.
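The effect of the independence assumption can be illustrated with a toy update of a probability distribution over the database images. This is not the method of the cited works: the one-dimensional features, the exponential likelihood, and the names `bayesian_update` and `sigma` are hypothetical stand-ins chosen only to show how independent feedback examples multiply into the posterior.

```python
import math

def bayesian_update(prior, features, positive_examples, sigma=1.0):
    """Update P(image is the target) given positive feedback examples.

    Each feedback example is treated as independent of the others, so the
    per-example likelihoods are simply multiplied (naive Bayes).
    """
    posterior = dict(prior)
    for example in positive_examples:
        for name, value in features.items():
            # Hypothetical likelihood: decays with distance to the example.
            posterior[name] *= math.exp(-abs(value - example) / sigma)
    total = sum(posterior.values())
    return {name: p / total for name, p in posterior.items()}

# Hypothetical database of three images with one feature each.
FEATURE_VALUES = {"a": 0.0, "b": 1.0, "c": 5.0}
UNIFORM_PRIOR = {name: 1.0 / len(FEATURE_VALUES) for name in FEATURE_VALUES}
posterior = bayesian_update(UNIFORM_PRIOR, FEATURE_VALUES, [0.9, 1.2])
print(max(posterior, key=posterior.get))  # image closest to both examples
```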
These conventional works do not treat all positive examples as closely related to each other. They do not use all the positive examples of the same query to construct a Bayesian classifier, use that classifier to represent the original query, and thereby try to get more accurate retrieval results. These works are not incremental.
Re-Weighting
With the re-weighting method, each image is represented by an N-dimensional feature vector; thus, the image may be viewed as a point in an N-dimensional space. If the variance of the good examples is high along a principal axis j, the values on this axis are most likely not very relevant to the input query, and a low weight wj can be assigned to the axis. Accordingly, the basic idea is to use the inverse of the standard deviation of the jth feature values in the feature matrix to update the weight wj. The MARS system mentioned above implements a slight refinement to the re-weighting method called the standard deviation method.
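The basic re-weighting idea can be sketched as follows. This is an illustrative sketch, not the MARS implementation: the function name `inverse_std_weights`, the small `epsilon` guard against division by zero, and the normalization of the weights to sum to one are assumptions.

```python
import statistics

def inverse_std_weights(positive_examples, epsilon=1e-6):
    """Weight each feature axis by the inverse of its standard deviation.

    positive_examples -- feature vectors of the good examples, one per image
    """
    weights = []
    for column in zip(*positive_examples):  # iterate over feature axes j
        sigma = statistics.pstdev(column)
        weights.append(1.0 / (sigma + epsilon))
    total = sum(weights)
    return [w / total for w in weights]     # assumption: normalize to sum to 1

# Hypothetical good examples: axis 0 is consistent, axis 1 varies widely.
GOOD_EXAMPLES = [[0.50, 0.1], [0.52, 0.9], [0.48, 0.5]]
weights = inverse_std_weights(GOOD_EXAMPLES)
print(weights)  # axis 0 receives a much larger weight than axis 1
```

The consistent axis dominates the weighting, which matches the intuition that features on which the good examples agree are the ones most relevant to the query.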
To optimize the query for further image similarity assessment, conventional relevance-feedback systems use only the weighted feature sum (WFS) of the feedback images. WFS is a conventional query-refinement technique. WFS requires many iterations (well more than three) to produce adequate results. WFS does not work very well in many cases, particularly when the user wants to express an “OR” relationship among the queries.
Multiple Iterations
Conventional relevance feedback techniques may require many iterations before the majority of the results include images with the desired semantic content. They require at least three iterations, and typically many more than three, before generating results with the desired semantic content.
These conventional relevance feedback methods either have no strategy to progressively adjust their results or perform poorly on large datasets. With conventional relevance feedback methods, positive and negative feedback are always processed in the same way.