1. Field of the Invention
The present invention relates to systems and methods for image retrieval, and in particular to systems and methods for perception-based image retrieval.
2. Description of the Related Art
(Note: This application references a number of different publications as indicated throughout the specification by reference numbers enclosed in brackets, e.g., [x]. A list of these different publications ordered according to these reference numbers can be found below in Section 7 of the Detailed Description of the Preferred Embodiment. Each of these publications is incorporated by reference herein.)
Much research has been conducted on content-based image retrieval (CBIR) in the past decade. A content-based image retrieval system returns images that are “similar” to the query image. To measure similarity, most image database systems characterize images using perceptual features (e.g., color, shape and texture) and define how similarity can be quantified using these features.
Relatively few attempts, however, have been made in the CBIR community to characterize images and quantify similarity based on the characteristics of the human visual process. For instance, while most traditional systems model the human response to "brightness" as a linear function, human eyes respond to brightness in a non-linear fashion. In addition, most traditional systems treat color as a continuous spectrum of wavelengths, while people give simple names to only a limited number of colors (red, green, etc.). Also, most systems treat all pixels in an image equally, but human vision tends to pay less attention to border pixels and distributes its effort unevenly, paying closer attention to ambiguous regions. Many other discrepancies exist.
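The non-linear brightness response noted above can be illustrated with one well-known model, the CIE 1976 lightness function L*. It is used here only as an example of a non-linear model; nothing in this section states that the invention adopts it. In the sketch below (the function name `cie_lightness` is hypothetical), a relative luminance of 0.5 maps to a perceived lightness of roughly 76 rather than 50, and a luminance of about 0.18 already appears roughly "half as bright" as white.

```python
def cie_lightness(y: float, y_white: float = 1.0) -> float:
    """CIE 1976 lightness L* (0..100) from relative luminance Y (example model only)."""
    delta = 6.0 / 29.0
    t = y / y_white
    if t > delta ** 3:
        f = t ** (1.0 / 3.0)  # compressive cube-root region: perception is non-linear
    else:
        f = t / (3.0 * delta ** 2) + 4.0 / 29.0  # short linear segment near black
    return 116.0 * f - 16.0


for y in (0.0, 0.18, 0.5, 1.0):
    print(f"Y = {y:4.2f} -> L* = {cie_lightness(y):6.2f}")
```

A system that modeled brightness linearly would instead place Y = 0.5 at the perceptual midpoint, which is the discrepancy the paragraph describes.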
The human visual system adjusts to its environment and adapts to its visual goals [14]. For instance, a person may not be able to tell at first glance whether a figure is a real person or a statue. By paying closer attention to the figure's surface, however, the person may be able to identify it as one or the other.
Thus, an image search engine must likewise adapt to the goals of a search task. The human visual system can be thought of as being divided into two parts: the eyes (the front-end) perceive images, and the brain (the back-end, equipped with a knowledge database and an inference engine) recognizes images. The front-end collects visual data so that the back-end can perform high-level processing. The back-end instructs the front-end to zoom, pan, and collect visual data with different filters. (A filter can be regarded as a particular way of perceiving an image.) The front-end responds flexibly by selecting, ordering, and weighting visual filters differently. The front-end and back-end may interact many times to complete a visual task.
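The front-end/back-end interaction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the design of the invention: the two filters, the border weight of 0.5, the L1 distance, and the names `FILTERS`, `perceive`, and `distance` are all hypothetical. A back-end expresses its goal as an ordered list of (filter name, weight) pairs; the front-end applies only the selected filters, in that order, and scales each filter's output by its weight.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[List[float]]                # grayscale image, intensities in [0, 1]
Filter = Callable[[Image], List[float]]  # one "way of perceiving" an image
Request = Sequence[Tuple[str, float]]    # back-end goal: ordered (filter name, weight) pairs


def mean_intensity(img: Image) -> List[float]:
    """Hypothetical filter: treats every pixel equally."""
    pixels = [p for row in img for p in row]
    return [sum(pixels) / len(pixels)]


def center_weighted_mean(img: Image, border_weight: float = 0.5) -> List[float]:
    """Hypothetical filter: pays less attention to border pixels."""
    h, w = len(img), len(img[0])
    total = weight_sum = 0.0
    for i, row in enumerate(img):
        for j, p in enumerate(row):
            on_border = i in (0, h - 1) or j in (0, w - 1)
            wgt = border_weight if on_border else 1.0
            total += wgt * p
            weight_sum += wgt
    return [total / weight_sum]


# Front-end: registry of the filters it is able to apply.
FILTERS: Dict[str, Filter] = {
    "mean": mean_intensity,
    "center_mean": center_weighted_mean,
}


def perceive(img: Image, request: Request) -> List[float]:
    """Apply only the requested filters, in the requested order, scaled by weight."""
    features: List[float] = []
    for name, weight in request:
        features.extend(weight * v for v in FILTERS[name](img))
    return features


def distance(a: Image, b: Image, request: Request) -> float:
    """L1 distance between two images as perceived under the same request."""
    fa, fb = perceive(a, request), perceive(b, request)
    return sum(abs(x - y) for x, y in zip(fa, fb))


dot = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
blank = [[0.0] * 3 for _ in range(3)]
print(perceive(dot, [("center_mean", 2.0), ("mean", 1.0)]))
print(distance(dot, blank, [("mean", 1.0)]))
```

Two back-ends with different goals could pass different requests, for example `[("center_mean", 1.0)]` versus `[("mean", 1.0), ("center_mean", 0.5)]`, without any change to the front-end, which is the kind of shared front-end the following paragraph calls for.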
What is needed, therefore, is an improved front-end design. Specifically, there is a need for a common "front-end" that can serve different back-ends supporting different applications.