This invention relates generally to image processing, and more particularly to indexing, searching, and retrieving images by queries on image content.
With the proliferation of multi-media, the world-wide web, and digital imaging, there now exists a demand for image management tools, most importantly tools for indexing, searching and retrieving images. This is commonly referred to as “query-by-image-content” (QBIC) or “content-based image retrieval” (CBIR). Existing systems often make use of global attributes, such as overall color and texture distributions, which ignore the actual composition of the image in terms of internal structures.
Most of the current content-based image retrieval systems rely on global image characteristics such as color and texture histograms, e.g., see Altavista's “Photofinder.” While these simple global descriptors are fast and often do succeed in partially capturing the essence of the user's query, they often fail due to the lack of higher-level knowledge about what exactly was of interest to the user in the query image, i.e., user-defined content. Recently, there has been a gradual shift towards spatially-encoded image representations. Spatially-encoded representations range widely from fixed image partitioning, as in the “ImageRover,” to highly local characterizations like the “color correlograms”; see Sclaroff et al. in “Imagerover: A content-based image browser for the world wide web,” Proc. IEEE Workshop on Content-Based Access of Image and Video Libraries, June 1997, and Huang et al. in “Image indexing using color correlograms,” Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 1997.
Somewhere in between these two extremes, one can find various techniques which deal with “regions” or “blobs” in the images. For example, “configural templates” specify a class of images, e.g., snow-capped mountain scenes, by means of photometric and geometric constraints on pre-defined image regions, see Lipson et al. in “Configuration based scene classification and image indexing,” Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 1997. Other techniques use automatic blob segmentation and description, see Carson et al. in “Region-based image querying,” Proc. IEEE Workshop on Content-Based Access of Image and Video Libraries, June 1997, and Howe in “Percentile blobs for image similarity,” Proc. IEEE Workshop on Content-Based Access of Image and Video Libraries, June 1998. Most of these systems require pre-segmented regions.
The invention provides a method for representing an image for the purpose of indexing, searching, and retrieving images in a multi-media database. The method allows a user to specify “content” to be searched as salient “regions-of-interest” or ROIs. The user also specifies the importance of the spatial relationships of the ROIs. The method yields retrievals that are at least as good as those of global-based searches, and provides an intuitive interface that is more in tune with the user's notion of “content,” thus providing a more powerful image retrieval tool.
More particularly, the invention provides a method for representing an image in an image retrieval database. The method first separates and filters the image to extract texture features. The colors are readily obtained from the pixel values themselves. The color and texture features are partitioned into a plurality of blocks, each block being 16×16 pixels. A joint distribution of the color features and a joint distribution of the texture features are estimated for each block. The estimated joint distributions, expressed as histograms, are stored in the database with the image.
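The per-block estimation described above can be illustrated with a minimal sketch. It assumes the three features of each pixel are already available as an H x W x 3 NumPy array scaled to [0, 1]; the function name, the number of bins per dimension, the value range, and the choice to drop partial blocks at the image border are all assumptions made for illustration and are not specified in the text.

```python
import numpy as np

BLOCK = 16  # block size in pixels, as stated in the text


def block_joint_histograms(features, bins=4, value_range=(0.0, 1.0)):
    """Estimate one joint histogram per 16x16 block.

    features: H x W x C array holding C feature values per pixel
              (e.g. C = 3 color coordinates), assumed scaled to value_range.
    Returns an array of shape (rows, cols, bins, ..., bins) with C bin axes.
    Values outside value_range are not counted (np.histogramdd behavior).
    """
    h, w, c = features.shape
    # Assumption: partial blocks at the right/bottom border are dropped.
    rows, cols = h // BLOCK, w // BLOCK
    out = np.zeros((rows, cols) + (bins,) * c)
    for r in range(rows):
        for col in range(cols):
            block = features[r * BLOCK:(r + 1) * BLOCK,
                             col * BLOCK:(col + 1) * BLOCK]
            samples = block.reshape(-1, c)  # one row of C values per pixel
            hist, _ = np.histogramdd(samples, bins=bins,
                                     range=[value_range] * c)
            # Normalize counts by the number of pixels in the block.
            out[r, col] = hist / samples.shape[0]
    return out
```

Calling the function once on the color features and once on the texture features would give the two sets of per-block histograms that are stored with the image.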
In one aspect, the color features include three color coordinates LUV, and the texture features include three edge measurements: edge magnitude, a rotation invariant Laplacian, and an edge orientation. The edge magnitude is log(G_x^2 + G_y^2), the rotation invariant Laplacian is G_xx + G_yy, and the edge orientation is arg(G_x, G_y), where G_x, G_y and G_xx, G_yy are respectively the first and second derivatives of a Gaussian filter with a specified scale σ, with two values used for the scale σ. In a query image, one or more regions of interest can be specified with a user interface, and joint distributions can be estimated for the blocks corresponding to the regions of interest as above. The query joint distributions can be used to index the database to find matching images. The matching images can be rank ordered. During the matching, the spatial relationship of the regions of interest can also be considered.
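As an illustration of the three edge measurements (edge magnitude, rotation invariant Laplacian, and edge orientation), the following is a minimal NumPy sketch. Several details are assumptions and not taken from the text: the kernels are sampled and truncated at three standard deviations, image borders are zero-padded, a small constant is added inside the logarithm to avoid log(0), the second-derivative kernel is shifted to sum to zero, and arg(G_x, G_y) is interpreted as the angle of the gradient vector. The function names are hypothetical.

```python
import numpy as np


def gaussian_kernels(sigma):
    """Sampled 1-D Gaussian and its first and second derivatives."""
    radius = int(np.ceil(3 * sigma))  # assumption: truncate at 3 sigma
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    g1 = -x / sigma ** 2 * g
    g2 = (x ** 2 - sigma ** 2) / sigma ** 4 * g
    g2 -= g2.mean()  # assumption: remove the DC response left by truncation
    return g, g1, g2


def separable_filter(image, k_axis0, k_axis1):
    """Convolve along axis 0 with k_axis0, then along axis 1 with k_axis1.

    Borders are zero-padded; the image is assumed larger than the kernels.
    """
    tmp = np.apply_along_axis(
        lambda v: np.convolve(v, k_axis0, mode="same"), 0, image)
    return np.apply_along_axis(
        lambda v: np.convolve(v, k_axis1, mode="same"), 1, tmp)


def edge_features(image, sigma, eps=1e-12):
    """Return (edge magnitude, Laplacian, edge orientation) at one scale."""
    g, g1, g2 = gaussian_kernels(sigma)
    gx = separable_filter(image, g, g1)   # derivative along columns (x)
    gy = separable_filter(image, g1, g)   # derivative along rows (y)
    gxx = separable_filter(image, g, g2)
    gyy = separable_filter(image, g2, g)
    magnitude = np.log(gx ** 2 + gy ** 2 + eps)  # log(G_x^2 + G_y^2)
    laplacian = gxx + gyy                        # G_xx + G_yy
    orientation = np.arctan2(gy, gx)             # assumed meaning of arg(G_x, G_y)
    return magnitude, laplacian, orientation
```

Since the text mentions two values of the scale, `edge_features` could be called once per scale on a grayscale image and the results stacked; the scale values themselves are not given here and would have to be chosen.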