The popularity of digital images is rapidly increasing due to improving digital imaging technologies and easy availability facilitated by the Internet. More and more digital images are becoming available every day.
Automatic image retrieval systems provide an efficient way for users to navigate through the growing numbers of available images. Traditional image retrieval systems allow users to retrieve images in one of two ways: (1) keyword-based image retrieval or (2) content-based image retrieval. Keyword-based image retrieval finds images by matching keywords from a user query to keywords that have been manually added to the images. One of the more popular collections of annotated images is “Corel Gallery”, an image database from Corel Corporation that includes upwards of 1 million annotated images.
One problem with keyword-based image retrieval systems is that it can be difficult or impossible for a user to precisely describe the inherent complexity of certain images. As a result, retrieval accuracy can be severely limited because images that cannot be described, or can only be described ambiguously, will not be retrieved successfully. In addition, due to the enormous burden of manual annotation, there are few databases with annotated images, although this is changing.
Content-based image retrieval (CBIR) finds images whose low-level features, such as color histogram, texture, shape, and so forth, are similar to those of an example image. Although CBIR avoids the annotation problem of keyword-based image retrieval, it also has severe shortcomings. One drawback of CBIR is that searches may return entirely irrelevant images that just happen to possess similar features. Additionally, individual objects in images contain a wide variety of low-level features. Therefore, using only the low-level features will not satisfactorily describe what is to be retrieved.
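As a rough illustration of feature-based matching, the following sketch ranks images by the similarity of their color histograms. It is a minimal example under stated assumptions, not the method of any system cited here: the function names, the choice of 4 bins per channel, and the use of histogram intersection as the similarity measure are all illustrative.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Quantize (r, g, b) pixels (0-255 per channel) into a normalized histogram."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each channel value to a bin index in [0, bins_per_channel - 1].
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = sum(hist) or 1.0  # avoid division by zero for an empty image
    return [h / total for h in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means the normalized histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_similarity(query_pixels, database):
    """Return image names ordered from most to least similar to the query.

    `database` is a hypothetical dict mapping an image name to its pixel list.
    """
    q = color_histogram(query_pixels)
    scored = [
        (histogram_intersection(q, color_histogram(pixels)), name)
        for name, pixels in database.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]
```

Under this measure, a query image of a red apple would match a red sunset as strongly as another apple if their color distributions coincide, which is the kind of irrelevant result described above.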
To weed out the irrelevant images returned in CBIR, some CBIR-based image retrieval systems utilize user feedback to gain an understanding as to the relevancy of certain images. After an initial query, such systems estimate the user's ideal query by monitoring user-entered positive and negative responses to the images returned from the query. This approach reduces the need for a user to provide accurate initial queries.
One type of relevance feedback approach is to estimate ideal query parameters using only the low-level image features. This approach works well if the feature vectors can capture the essence of the query. For example, if the user is searching for an image with complex textures having a particular combination of colors, this query would be extremely difficult to describe in words but can be reasonably represented by a combination of color and texture features. Therefore, with a few positive and negative examples, the relevance feedback process is able to return reasonably accurate results. On the other hand, if the user is searching for a specific object that cannot be sufficiently represented by combinations of available feature vectors, these relevance feedback systems will not return many relevant results even with a large amount of user feedback.
Some researchers have attempted to apply models used in text information retrieval to image retrieval. One of the most popular models used in text information retrieval is the vector model. The vector model is described in such writings as Buckley and Salton, “Optimization of Relevance Feedback Weights,” in Proc. of SIGIR'95; Salton and McGill, “Introduction to Modern Information Retrieval,” McGraw-Hill Book Company, 1983; and W. M. Shaw, “Term-Relevance Computation and Perfect Retrieval Performance,” Information Processing and Management. Various effective retrieval techniques have been developed for this model and many employ relevance feedback.
Most of the previous relevance feedback research can be classified into two approaches: query point movement and re-weighting. The query point movement method essentially tries to improve the estimate of an “ideal query point” by moving it towards good example points and away from bad example points. A frequently used technique for iteratively improving this estimate is Rocchio's formula, given below for the sets of relevant documents $D'_R$ and non-relevant documents $D'_N$ noted by the user:
$$Q' = \alpha Q + \beta \left( \frac{1}{N_{R'}} \sum_{i \in D'_R} D_i \right) - \gamma \left( \frac{1}{N_{N'}} \sum_{i \in D'_N} D_i \right) \qquad (1)$$
where α, β, and γ are suitable constants and $N_{R'}$ and $N_{N'}$ are the numbers of documents in $D'_R$ and $D'_N$, respectively. This technique is implemented, for example, in the MARS system, as described in Rui, Y., Huang, T. S., and Mehrotra, S., “Content-Based Image Retrieval with Relevance Feedback in MARS,” in Proc. IEEE Int. Conf. on Image Proc., 1997.
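The query point movement of equation (1) can be sketched in a few lines. This is an illustrative example only, not the MARS implementation: the function names and the default values chosen for α, β, and γ are assumptions, and an empty feedback set is treated here as contributing a zero vector.

```python
def mean_vector(vectors, dim):
    """Component-wise mean of a list of vectors; zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]


def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Return Q' = alpha*Q + beta*mean(relevant) - gamma*mean(non_relevant).

    The default constants are illustrative assumptions, not values from the source.
    """
    dim = len(query)
    pos = mean_vector(relevant, dim)      # (1/N_R') * sum over D'_R
    neg = mean_vector(non_relevant, dim)  # (1/N_N') * sum over D'_N
    return [alpha * query[j] + beta * pos[j] - gamma * neg[j] for j in range(dim)]
```

For example, with a query `[1.0, 0.0]`, relevant vectors `[[2.0, 0.0], [4.0, 0.0]]`, one non-relevant vector `[[0.0, 4.0]]`, and `alpha=1.0, beta=0.5, gamma=0.25`, the updated query moves toward the relevant mean and away from the non-relevant one.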
The central idea behind the re-weighting method is simple and intuitive. Since each image is represented by an N-dimensional feature vector, the image may be viewed as a point in an N-dimensional space. If the variance of the good examples is high along a principal axis j, the values on this axis are most likely not very relevant to the input query, and a low weight wj can be assigned to the axis. Accordingly, the inverse of the standard deviation of the jth feature values in the feature matrix is used as the basis for updating the weight wj. The MARS system mentioned above implements a slight refinement to the re-weighting method called the standard deviation method.
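A minimal sketch of the re-weighting idea follows. It assumes the weight of each axis is simply the inverse of the standard deviation of the good examples along that axis. The small `epsilon` guard, the normalization of the weights to sum to one, and the function names are illustrative assumptions and are not taken from the MARS system.

```python
import math


def inverse_std_weights(positive_examples, epsilon=1e-6):
    """Weight each feature axis by 1 / (standard deviation of the good examples).

    `epsilon` (an assumption) avoids division by zero when all examples agree
    on an axis. Weights are normalized to sum to 1 for convenience.
    """
    n = len(positive_examples)
    dim = len(positive_examples[0])
    weights = []
    for j in range(dim):
        column = [v[j] for v in positive_examples]
        mean = sum(column) / n
        variance = sum((x - mean) ** 2 for x in column) / n
        weights.append(1.0 / (math.sqrt(variance) + epsilon))
    total = sum(weights)
    return [w / total for w in weights]


def weighted_distance(query, image, weights):
    """Weighted Euclidean distance between a query vector and an image vector."""
    return math.sqrt(sum(w * (q - x) ** 2 for w, q, x in zip(weights, query, image)))
```

If the good examples agree closely on axis 0 but vary widely on axis 1, axis 0 receives most of the weight, so differences along axis 1 contribute little to the distance.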
Recently, more computationally robust methods that perform global optimization have been proposed. One such proposal is the MindReader retrieval system described in Ishikawa, Y., Subramanya, R., and Faloutsos, C., “MindReader: Querying Databases Through Multiple Examples,” in Proc. of the 24th VLDB Conference, (New York), 1998. It formulates the parameter estimation process as a minimization problem. Unlike traditional retrieval systems, whose distance functions can be represented by ellipses aligned with the coordinate axes, the MindReader system proposes a distance function that is not necessarily aligned with the coordinate axes. It therefore allows for correlations between attributes in addition to different weights on each component.
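The difference between an axis-aligned distance and one that admits correlations can be illustrated with a quadratic form (x − q)ᵀ M (x − q). The sketch below is only an illustration of that general form: the matrices are hand-picked, hypothetical values, and no claim is made about how MindReader actually estimates M.

```python
def quadratic_distance(x, q, M):
    """Compute (x - q)^T M (x - q) for a square matrix M given as nested lists."""
    d = [xi - qi for xi, qi in zip(x, q)]
    n = len(d)
    return sum(d[j] * M[j][k] * d[k] for j in range(n) for k in range(n))


# Diagonal M: per-axis weights only, an ellipse aligned with the coordinate axes.
AXIS_ALIGNED = [[1.0, 0.0],
                [0.0, 1.0]]

# Non-zero off-diagonal terms (hypothetical values): a rotated ellipse that
# treats the two attributes as positively correlated.
CORRELATED = [[1.0, -0.9],
              [-0.9, 1.0]]

query = [0.0, 0.0]
along = [1.0, 1.0]     # both attributes change together
against = [1.0, -1.0]  # attributes change in opposite directions
```

With `AXIS_ALIGNED`, the points `along` and `against` are equally far from the query. With `CORRELATED`, `along` is much closer than `against`, which per-axis weights alone cannot express.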
A further improvement over this approach is described in Rui, Y., and Huang, T. S., “A Novel Relevance Feedback Technique in Image Retrieval,” ACM Multimedia, 1999. Their CBIR system not only formulates the optimization problem but also takes into account the multi-level image model.
All the approaches described above perform relevance feedback at the low-level feature vector level in image retrieval, but fail to take into account any semantics for the images themselves. The inherent problem with these approaches is that adapting relevance feedback from text information retrieval to image retrieval has not proved as successful as hoped. This is primarily because low-level features are often not powerful enough to represent the complete semantic content of images.
As a result, there have been efforts to incorporate semantics in relevance feedback for image retrieval. In Lee, Ma, and Zhang, “Information Embedding Based on User's Relevance Feedback for Image Retrieval,” Technical Report HP Labs, 1998, the authors propose a framework that attempts to embed semantic information into a low-level feature-based image retrieval process using a correlation matrix. In this framework, semantic relevance between image clusters is learned from a user's feedback and used to improve the retrieval performance.
There remains, however, a need for improvement in image retrieval systems and methods that utilize relevance feedback. The inventors propose a system that integrates both semantics and low-level features into the relevance feedback process in a new way. Only when the semantic information is not available does the technique reduce to one of the previously described low-level feedback approaches as a special case.