1. Technical Field
This application involves retrieval of stored images with metadata.
2. Related Art
The wide availability of digital sensor technology together with the falling price of storage devices has spurred an exponential growth in the volume of image material being captured for a range of applications. Digital image collections are rapidly increasing in size and include basic home photos, image based catalogues, trade marks, fingerprints, mugshots, medical images, digital museums, and many art and scientific collections. It is not surprising that a great deal of research effort over the last five years has been directed at developing efficient methods for browsing, searching and retrieving images [1,2].
Content-based image retrieval requires that visual material be annotated in such a way that users can retrieve the images they want efficiently and effortlessly. Current systems rely heavily upon textual tagging and measures (e.g. colour histograms) that do not reflect the image semantics. This means that users must be very conversant with the image features being employed by the retrieval system in order to obtain sensible results and are forced to use potentially slow and unnatural interfaces when dealing with large image databases. Both barriers prevent the user from exploring the image set with high recall and precision rates, and they make the process slow and place a great burden on the user.
Early retrieval systems made use of textual annotation [3] but these approaches do not always suit retrieval from large databases because of the cost of the manual labour involved and the inconsistent descriptions, which by their nature are heavily dependent upon the individual subjective interpretation placed upon the material by the human annotator. To combat these problems, techniques have been developed that index images on the basis of their visual content rather than highly variable linguistic descriptions.
It is the job of an image retrieval system to produce images that a user wants. In response to a user's query the system must offer images that are similar in some user-defined sense. This goal is met by selecting features thought to be important in human visual perception and using them to measure relevance to the query. Colour, texture, local shape and layout in a variety of forms are the most widely used features in image retrieval [4,5,6,7,8,9,10]. One of the first commercial image search engines was QBIC [4] which executes user queries against a database of pre-extracted features. VisualSEEk [7] and SaFe [11] determine similarity by measuring image regions using both colour parameters and spatial relationships and obtain better performance than histogramming methods that use colour information alone. NeTra [8] also relies upon image segmentation to carry out region-based searches that allow the user to select example regions and lay emphasis on image attributes to focus the search. Region-based querying is also favoured in Blobworld [6] where global histograms are shown to perform comparatively poorly on images containing distinctive objects. Similar conclusions were obtained in comparisons with the SIMPLIcity system [30]. The Photobook system [5] endeavors to use compressed representations that preserve essential similarities and are “perceptually complete”. Methods for measuring appearance, shape and texture are presented for image database search, but the authors point out that multiple labels can be justifiably assigned to overlapping image regions using varied notions of similarity.
Analytical segmentation techniques are sometimes seen as a way of decomposing images into regions of interest and semantically useful structures [21-23,45]. However, object segmentation for broad domains of general images is difficult, and a weaker form of segmentation that identifies salient point sets may be more fruitful [1].
Relevance feedback is often proposed as a technique for overcoming many of the problems faced by fully automatic systems by allowing the user to interact with the computer to improve retrieval performance [31,43]. In Quicklook [41] and ImageRover [42] items identified by the user as relevant are used to adjust the weights assigned to the similarity function to obtain better search performance. More information is provided to the systems by the users who have to make decisions in terms specified by the machine. MetaSeek maintains a performance database of four different online image search engines and directs new queries to the best performing engine for that task [40]. PicHunter [12] has implemented a probabilistic relevance feedback mechanism that predicts the target image based upon the content of the images already selected by the user during the search. This reduces the burden on unskilled users to set quantitative pictorial search parameters or to select images that come closest to meeting their goals. Most notably the combined use of hidden semantic links between images improved the system performance for target image searching. However, the relevance feedback approach requires the user to reformulate his visual interests in ways that he frequently does not understand.
Region-based approaches are being pursued with some success using a range of techniques. The SIMPLIcity system [30] defines an integrated region matching process which weights regions with ‘significance credit’ in accordance with an estimate of their importance to the matching process. This estimate is related to the size of the region being matched and whether it is located in the centre of the image and will tend to emphasise neighbourhoods that satisfy these criteria. Good image discrimination is obtained with features derived from salient colour boundaries using multimodal neighbourhood signatures [13-15,36]. Measures of colour coherence [16,29] within small neighbourhoods are employed to incorporate some spatial information when comparing images. These methods are being deployed in the 5th Framework project ARTISTE [17,18,20] aimed at automating the indexing and retrieval of the multimedia assets of European museums and galleries. The MAVIS-2 project [19] uses quad trees and a simple grid to obtain spatial matching between image regions.
Much of the work in this field is guided by the need to implement perceptually based systems that emulate human vision and make the same similarity judgements as people. Texture and colour features together with rules for their use have been defined on the basis of subjective testing and applied to retrieval problems [24]. At the same time research into computational perception is being applied to problems in image search [25,26]. Models of human visual attention are used to generate image saliency maps that identify important or anomalous objects in visual scenes [25,44]. Strategies for directing attention using fixed colour and corner measurements are devised to speed the search for target images [26]. Although these methods achieve a great deal of success on many types of image, the pre-defined feature measures and rules for applying them will preclude good search solutions in the general case.
The tracking of eye movements has been employed as a pointer and a replacement for a mouse [48], to vary the screen scrolling speed [47] and to assist disabled users [46]. However, this work has concentrated upon replacing and extending existing computer interface mechanisms rather than creating a new form of interaction. Indeed the imprecise nature of saccades and fixation points has prevented these approaches from yielding benefits over conventional human interfaces.
Notions of pre-attentive vision [25,32-34] and visual similarity are very closely related. Both aspects of human vision are relevant to content-based image retrieval; attention mechanisms tell us what is eye-catching and important within an image, and visual similarity tells us what parts of an image match a different image.
A more recent development has yielded a powerful similarity measure [35]. In this case the structure of a region in one image is being compared with random parts in a second image while seeking a match. This time if a match is found the score is increased, and a series of randomly generated features are applied to the same location in the second image that obtained the first match. A high scoring region in the second image is only reused while it continues to yield matches from randomly generated features and increases the similarity score. The conjecture that a region in the second image that shares a large number of different features with a region in the first image is perceptually similar is reasonable and appears to be the case in practice [35]. The measure has been tested on trademark images and fingerprints and within certain limits shown to be tolerant of translation, rotation, scale change, blur, additive noise and distortion. This approach does not make use of a pre-defined distance metric plus feature space in which feature values are extracted from a query image and used to match those from database images, but instead generates features on a trial and error basis during the calculation of the similarity measure. This has the significant advantage that features that determine similarity can match whatever image property is important in a particular region whether it be a shape, a texture, a colour or a combination of all three. It means that effort is expended searching for the best feature for the region rather than expecting that a fixed feature set will perform optimally over the whole area of an image and over every image in the database. There are no necessary constraints on the pixel configurations used as features apart from the colour space and the size of the regions which is dependent in turn upon the definition of the original images.
More formally, in this method (full details of which are given in our International patent application WO 03/081523), a first image (or other pattern) is represented by a first ordered set of elements A each having a value and a second pattern is represented by a second such set. A comparison of the two involves performing, for each of a plurality of elements x of the first ordered set, the steps of selecting from the first ordered set a plurality of elements x′ in the vicinity of the element x under consideration, selecting an element y of the second ordered set and comparing the elements x′ of the first ordered set with elements y′ of the second ordered set (each of which has the same position relative to the selected element y of the second ordered set as a respective one x′ of the selected plurality of elements of the first ordered set has relative to the element x under consideration). The comparison itself comprises comparing the value of each of the selected plurality of elements x′ of the first set with the value of the correspondingly positioned element y′ of the like plurality of elements of the second set in accordance with a predetermined match criterion to produce a decision that the plurality of elements of the first ordered set matches the plurality of elements of the second ordered set. The comparison is then repeated with a fresh selection of the plurality of elements x′ of the first set and/or a fresh selection of an element y of the second ordered set, generating a similarity measure V as a function of the number of matches. Preferably, following a comparison resulting in a match decision, the next comparison is performed with a fresh selection of the plurality of elements x′ of the first set and the same selection of an element y of the second set.
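By way of illustration only, the comparison procedure described above might be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation of WO 03/081523: images are taken to be two-dimensional lists of grey-level values, the match criterion is assumed to be an absolute difference no greater than a threshold, V is assumed to be a plain count of matches, and the function name similarity and the parameters radius, n_neighbours, trials, threshold and seed are hypothetical.

```python
import random


def similarity(a, b, radius=2, n_neighbours=3, trials=20, threshold=10, seed=0):
    """Count neighbourhood matches between images a and b (illustrative only).

    a, b: 2-D lists of grey-level values (assumed representation).
    Returns V, here simply the total number of matching comparisons.
    """
    rng = random.Random(seed)

    # Candidate relative positions of the neighbours x' around x.
    offsets_pool = [(dr, dc)
                    for dr in range(-radius, radius + 1)
                    for dc in range(-radius, radius + 1)
                    if (dr, dc) != (0, 0)]

    def interior(img):
        # Positions far enough from the border for every offset to be valid.
        h, w = len(img), len(img[0])
        return [(r, c)
                for r in range(radius, h - radius)
                for c in range(radius, w - radius)]

    xs, ys = interior(a), interior(b)
    if not xs or not ys:
        raise ValueError("images are too small for the chosen radius")

    v = 0
    for xr, xc in xs:                      # each element x of the first set
        y = None
        for _ in range(trials):
            # Fresh selection of the elements x' near x on every comparison.
            offsets = rng.sample(offsets_pool, n_neighbours)
            if y is None:
                y = rng.choice(ys)         # fresh selection of y
            yr, yc = y
            # Each x' is compared with the y' at the same relative position.
            matched = all(
                abs(a[xr + dr][xc + dc] - b[yr + dr][yc + dc]) <= threshold
                for dr, dc in offsets
            )
            if matched:
                v += 1                     # keep the same y for the next trial
            else:
                y = None                   # choose a new y next time
    return v
```

In this sketch a location y in the second image is retained only while it continues to yield matches, which reflects the preferred behaviour described above. A colour image could be handled under the same scheme by comparing each colour component against its own threshold.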