Digital images are increasingly common as scanners and digital cameras drop in price and grow in availability and capability. As users such as digital photographers and artists amass large collections of digital photographs on their computers, the challenges involved with querying and accessing digital images on local and networked computing systems increase. Thus, digital image users increasingly rely on conventional image retrieval technology to help query and access digital images from various data stores. Such image retrieval technology includes keyword-based image retrieval and content-based image retrieval.
Keyword-based image retrieval finds images by matching keywords from a user query to keywords that have been manually added to the images. Thus, these images have been manually annotated with keywords related to their semantic content. One of the more popular collections of annotated images is “Corel™ Gallery”, an image database from Corel Corporation that includes upwards of one million annotated images.
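As a simple illustration of this approach, keyword-based retrieval can be sketched as matching query terms against manually assigned annotations. The image names and keyword sets below are invented for illustration, not entries from any real annotated database.

```python
# Toy sketch of keyword-based retrieval: images are matched purely on
# manually assigned annotation keywords. All names here are invented.
annotations = {
    "img1.jpg": {"sunset", "beach", "ocean"},
    "img2.jpg": {"mountain", "snow"},
    "img3.jpg": {"beach", "palm", "ocean"},
}

def keyword_search(query_terms):
    """Rank images by number of overlapping keywords; drop non-matches."""
    terms = set(query_terms)
    scored = [(len(terms & kw), img) for img, kw in annotations.items()]
    return [img for score, img in sorted(scored, reverse=True) if score > 0]
```

Note that an image whose content matches the query but whose annotations use different words (a synonym, or no annotation at all) is simply never returned, which previews the precision and recall problems discussed below.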
Unfortunately, with keyword-based image retrieval systems, it can be difficult or impossible for a user to precisely describe the inherent complexity of certain images. Additionally, image annotation is a subjective process—what may be important to one user may not be important to another. As a result, retrieval accuracy can be severely limited because some images—those that cannot be described or can only be described ambiguously—will not be retrieved successfully. In addition, due to the enormous burden of manual annotation, there are a limited number of databases with annotated images.
Although image retrieval techniques based on keywords can be easily automated, they suffer from the same problems as information retrieval systems for text databases and web-based search engines. Because of widespread synonymy and polysemy in natural language, the precision of such systems is very low and their recall is inadequate. (Synonymy is the equivalence of meaning among different words; polysemy is the property of a single word having many meanings.) In addition, linguistic barriers and the lack of uniform textual descriptions for common image attributes severely limit the applicability of keyword-based systems.
Content-based image retrieval (CBIR) systems were developed to address many of these issues, including those of keyword-based systems. These systems extract visual image features such as color, texture, and shape from image collections and utilize them for retrieval purposes. These visual image features are also called “low-level” features. Examples of low-level features of an image include color histograms, wavelet-based texture descriptors, directional histograms of edges, and so forth. CBIR systems work well when the extracted feature vectors accurately capture the essence of the image content.
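As a concrete illustration of one such low-level feature, the sketch below computes a coarsely quantized RGB color histogram from raw pixel data and compares two histograms with an L1 distance. The 2-bits-per-channel quantization and the L1 metric are illustrative choices, not a description of any particular CBIR system.

```python
# A minimal sketch of one "low-level" feature: a quantized RGB color
# histogram. Quantization depth and distance metric are illustrative.

def color_histogram(pixels, bits=2):
    """Map each (r, g, b) pixel into one of (2**bits)**3 coarse color
    bins and return a normalized histogram (the feature vector)."""
    bins = 2 ** bits
    hist = [0.0] * (bins ** 3)
    shift = 8 - bits  # reduce 8-bit channels to `bits` bits
    for r, g, b in pixels:
        idx = ((r >> shift) * bins + (g >> shift)) * bins + (b >> shift)
        hist[idx] += 1
    total = len(pixels) or 1
    return [h / total for h in hist]

def l1_distance(h1, h2):
    """Sum of absolute bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Two solid-color "images": identical images have distance zero,
# while a pure-red and a pure-blue image are maximally far apart.
red_image = [(255, 0, 0)] * 100
blue_image = [(0, 0, 255)] * 100
```

Such a vector is objectively computable from pixel data alone, which is exactly why it carries no semantic meaning by itself: any two images with similar color distributions look alike to this feature, regardless of what they depict.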
For example, if a user is searching for an image with complex textures having a particular combination of colors, this type of query is extremely difficult to describe using keywords, but it can be reasonably represented by a combination of color and texture features. On the other hand, if a user is searching for an object that has clear semantic meanings but cannot be sufficiently represented by combinations of available feature vectors, the content-based systems will not return many relevant results. Furthermore, the inherent complexity of the images makes it almost impossible for users to present the system with a query that fully describes their intentions. Accordingly, although CBIR solves many of the problems of keyword-based image retrieval, conventional CBIR technology has a number of shortcomings.
One such shortcoming, for example, is that searches may return entirely irrelevant images that just happen to possess similar features. Individual objects in images contain a wide variety of low-level features, which increases the likelihood that completely irrelevant images will be returned in response to a query based on those features. Therefore, a description based only on the low-level features of an image typically does not adequately capture what a user desires to retrieve.
Another shortcoming, for example, is that users typically desire to locate images based on specific semantic concepts, rather than images that include certain low-level features. Semantic concepts refer to the meaningful content of an image, such as a river, a person, a car, or a boat. Although objectively measurable, low-level image features lack specific meaning. Additionally, mapping semantic concepts to low-level features is still impractical with present computer vision and AI techniques. Accordingly, the disparity between semantic content and low-level features that lack specific meaning substantially limits the performance of conventional CBIR systems.
To improve this situation, some CBIR systems utilize user feedback to gain an understanding of the relevancy of certain images. The user feedback takes the form of selected exemplary images, which may be called “feedback” images. A user selects such exemplary images to narrow successive searches. A common approach to relevance feedback is to estimate ideal query parameters using the low-level image features of the exemplary images. Thus, relevance feedback assists in mapping low-level features to human recognition of semantic concepts.
In a relevance-feedback CBIR system, a user submits a query and the system provides a set of query results. More specifically, after a query, the system presents a set of images to the user. The user designates specific images as positive or negative: positive indicates that the image contains the semantic concepts queried, and negative indicates that it does not. Based upon this feedback, the system performs a new query and displays a new set of resulting images. In this way, relevance feedback is used dynamically during a single search session to modify a search query vector or distance metric, or to update a probability distribution of images across a database.
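One well-known way to modify a search query vector from positive and negative feedback is a Rocchio-style update, sketched below. The weights alpha, beta, and gamma are conventional illustrative values, not parameters of any particular CBIR system.

```python
# Hedged sketch of a Rocchio-style relevance-feedback update: move the
# query vector toward the mean of positive feedback images and away
# from the mean of negative ones. Weights are illustrative defaults.

def rocchio_update(query, positives, negatives,
                   alpha=1.0, beta=0.75, gamma=0.25):
    """Return an updated query feature vector given lists of positive
    and negative feedback feature vectors (all the same length)."""
    def mean(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    pos_mean, neg_mean = mean(positives), mean(negatives)
    return [alpha * q + beta * p - gamma * n
            for q, p, n in zip(query, pos_mean, neg_mean)]
```

Each iteration of the session would re-run the nearest-neighbor search with the updated vector, so the results drift toward the features shared by the images the user marked positive.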
Each round of query and feedback in a particular search session may be called an iteration of that session. The query/feedback process continues for some number of iterations, until the user is satisfied with the overall relevance of the present set of images, or until the user decides to attempt a different search query. In this manner, image relevance feedback from the user may reveal semantic relationships between the retrieved images that are not easily captured by low-level image features.
Unfortunately, image relevance feedback is not typically accumulated or memorized across CBIR search sessions. Rather, such image relevance feedback is typically discarded and not utilized to improve future performance of the CBIR system. The following arrangements and procedures address these and other limitations of conventional CBIR techniques.