This disclosure relates to color feature vectors used in content-based image retrieval.
A content-based image retrieval (“CBIR”) system retrieves images from a database based on the content of an image. For example, some CBIR systems allow users to provide a representative query image. The system uses the query image to retrieve images with similar content from a database. Typically, image comparisons are based on image content, i.e., objective features of the image such as color composition, texture, or lighting. For example, a CBIR system may retrieve images having similar color composition, similar textures, similar lighting, or any other feature of image content.
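The query-by-example flow can be illustrated with a minimal sketch. It assumes that every image has already been reduced to a fixed-length numeric feature vector and that Euclidean distance is an acceptable similarity measure. The function names, the three-value "mean color" vectors, and the image names below are hypothetical and chosen only for illustration; a real CBIR system may use richer features and different metrics.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def query_by_example(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return up to k (name, distance) pairs, closest (most similar) first."""
    scored = [(name, euclidean(query, feats)) for name, feats in database.items()]
    scored.sort(key=lambda item: item[1])  # smaller distance = more similar content
    return scored[:k]


# Hypothetical database: each image summarized by its mean (R, G, B) in [0, 1].
db = {
    "beach_1": [0.80, 0.70, 0.50],
    "forest": [0.20, 0.60, 0.20],
    "beach_2": [0.75, 0.70, 0.55],
}
matches = query_by_example([0.80, 0.70, 0.52], db, k=2)
# matches ranks beach_1 first, then beach_2
```

Sorting the whole database is the simplest correct ranking; for large collections an index structure would normally replace the linear scan.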
There have been two major categories of CBIR applications: (i) image libraries and (ii) image recognition and categorization systems. Image libraries may be used, for example, to retrieve images for use in publications. A user may identify a query image and retrieve all images having similar colors. For example, an editor may have a picture of children playing in the surf at a beach. Desiring additional photography, the editor may query a CBIR image library to find additional beach pictures.
CBIR also may be used in image recognition or categorization systems. For example, a CBIR system has been used to classify images of tissue samples taken through a microscope. The system searches an image database to help diagnose prostate cancer. Typically, an image recognition or categorization system extracts a representation of the features of an image and searches a database of images to find images having similar features. The image then may be categorized based on the search results.
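One common way to turn search results into a category is a nearest-neighbor vote. The sketch below assumes that choice (k-nearest-neighbor majority voting over Euclidean distance), which is not specified in the text above; the function name `categorize`, the feature vectors, and the labels are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def categorize(
    features: Sequence[float],
    labeled_db: List[Tuple[Sequence[float], str]],
    k: int = 3,
) -> str:
    """Label an image with the most common category among its k nearest matches."""
    # Search step: rank every labeled database image by distance to the new image.
    ranked = sorted(labeled_db, key=lambda entry: math.dist(features, entry[0]))
    # Categorization step: majority vote over the labels of the top-k matches.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled database of mean-color feature vectors.
labeled = [
    ([0.80, 0.70, 0.50], "beach"),
    ([0.75, 0.70, 0.55], "beach"),
    ([0.20, 0.60, 0.20], "forest"),
    ([0.25, 0.55, 0.20], "forest"),
    ([0.30, 0.60, 0.25], "forest"),
]
label = categorize([0.78, 0.70, 0.50], labeled)  # two of the three nearest are "beach"
```

`math.dist` requires Python 3.8 or later.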
Fixed-size color histograms have been used to represent the color information of images. In a fixed-size color histogram system, the colors of an image are mapped into a discrete color space containing n predetermined colors selected without regard to the image. A color histogram H is a vector (h1, h2, . . . , hn) in an n-dimensional vector space, where each element hj represents the number of pixels of color j in the image. To compare two histograms H1 and H2, one can use conventional techniques such as the L1-norm of their difference, as discussed in the following papers: Swain, M. J. and D. H. Ballard, “Color Indexing,” International Journal of Computer Vision, 7 (1), pp. 11–32 (1991); Funt, B. V. and G. Finlayson, “Color Constant Color Indexing,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 17 (5), pp. 522–529 (May 1995); and Stricker, M. and M. Orengo, “Similarity of Color Images,” Proceedings of SPIE Conference on Storage and Retrieval for Image and Video Databases III, San Jose, Calif., Vol. 2420, pp. 381–392 (1995).
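A fixed-size histogram and its L1 comparison can be sketched as follows. The sketch assumes 8-bit RGB pixels and a uniform quantization of each channel into `bins_per_channel` levels, giving n = bins_per_channel³ predetermined colors that do not depend on the image. The quantization scheme and all names are illustrative assumptions, not the specific color spaces used in the cited papers.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[int]:
    """Map each pixel to one of n = bins_per_channel**3 fixed colors and count them."""
    hist = [0] * bins_per_channel ** 3
    for r, g, b in pixels:
        # Quantize each channel to 0..bins_per_channel-1 (valid for any bin count).
        qr, qg, qb = (c * bins_per_channel // 256 for c in (r, g, b))
        # Combine the three channel indices into a single color index j.
        j = (qr * bins_per_channel + qg) * bins_per_channel + qb
        hist[j] += 1
    return hist


def l1_distance(h1: Sequence[int], h2: Sequence[int]) -> int:
    """L1-norm of H1 - H2: the sum of absolute per-bin differences."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


img1 = [(255, 0, 0)] * 3 + [(0, 0, 255)]      # three red pixels, one blue
img2 = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2  # two red pixels, two blue
h1, h2 = color_histogram(img1), color_histogram(img2)
d = l1_distance(h1, h2)  # red bin differs by 1, blue bin by 1, so d == 2
```

Because the n colors are fixed in advance, every histogram has the same length and bins line up one-to-one, which is what makes a bin-wise norm meaningful.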
Additionally, the L2-norm (Euclidean distance) of their difference may be used to compare vectors H1 and H2, as discussed in the following papers: Niblack, W., R. Barber, W. Equitz, M. Flickner, et al., “The QBIC Project: Querying Images by Content Using Color, Texture, and Shape,” Proceedings of SPIE, Storage and Retrieval for Image and Video Databases, Vol. 1908, San Jose, Calif., pp. 173–187 (February 1993); and Hafner, J. L., H. S. Sawhney, W. Equitz, M. Flickner, and W. Niblack, “Efficient Color Histogram Indexing for Quadratic Form Distance Functions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 17 (7), pp. 729–736 (1995).
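For comparison with the L1-norm, the L2 distance between two histograms can be sketched as below. It assumes histograms are plain sequences of bin counts of equal length; the function name is hypothetical.

```python
import math
from typing import Sequence


def l2_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Euclidean (L2) norm of H1 - H2."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    # Square each per-bin difference, sum, and take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))
```

Squaring the per-bin differences makes L2 penalize one large discrepancy more than several small ones: difference vectors (2, 0) and (1, 1) both have L1-norm 2, but their L2-norms are 2 and √2 ≈ 1.414.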
An alternative to color histograms is the use of wavelet coefficients. This technique was pioneered by C. E. Jacobs, as described in the following paper: Jacobs, C. E., A. Finkelstein, and D. H. Salesin, “Fast Multiresolution Image Querying,” Computer Graphics Proceedings, SIGGRAPH, pp. 278–280 (Los Angeles 1995). Jacobs et al. quantize the wavelet coefficients to −1, 1, or 0. Wavelet representations of image content, however, generally change when an image is translated, rotated, or rescaled. Thus, a CBIR system using wavelet analysis techniques may have difficulty finding similarities between images that are simple translations, rotations, or rescalings of one another.
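The quantization described above, and its sensitivity to translation, can be illustrated with a minimal sketch. Several details are assumptions not stated in the text: a square grayscale image whose side is a power of two, an unnormalized average/difference Haar transform applied to all rows and then all columns, and keeping only the m largest-magnitude coefficients (m is a hypothetical parameter) as +1 or −1 with all others set to 0. The normalization in the Jacobs et al. paper may differ, and the overall average, stored separately there, is excluded here.

```python
from typing import List

Matrix = List[List[float]]


def haar_1d(values: List[float]) -> List[float]:
    """Full 1D Haar decomposition by repeated pairwise averaging and differencing.
    The input length must be a power of two."""
    out = list(values)
    n = len(out)
    while n > 1:
        half = n // 2
        averages = [(out[2 * i] + out[2 * i + 1]) / 2 for i in range(half)]
        details = [(out[2 * i] - out[2 * i + 1]) / 2 for i in range(half)]
        out[:n] = averages + details
        n = half
    return out


def haar_2d(image: Matrix) -> Matrix:
    """Decompose every row fully, then every column fully."""
    rows = [haar_1d(row) for row in image]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def signature(image: Matrix, m: int) -> List[List[int]]:
    """Keep the m largest-magnitude coefficients as +1/-1 and set the rest to 0.
    The [0][0] entry (overall average) is excluded from the truncation."""
    coeffs = haar_2d(image)
    n_rows, n_cols = len(coeffs), len(coeffs[0])
    positions = [(r, c) for r in range(n_rows) for c in range(n_cols) if (r, c) != (0, 0)]
    positions.sort(key=lambda rc: abs(coeffs[rc[0]][rc[1]]), reverse=True)
    sig = [[0] * n_cols for _ in range(n_rows)]
    for r, c in positions[:m]:
        v = coeffs[r][c]
        if v != 0:
            sig[r][c] = 1 if v > 0 else -1
    return sig


# One bright pixel, then the same pixel shifted one column to the right.
a = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
b = [[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
sig_a, sig_b = signature(a, 3), signature(b, 3)
# sig_a has +1 wherever sig_b has -1: same positions, opposite signs
```

Both images have the same overall average and the same histogram of intensities, yet a one-pixel translation flips every retained sign, which is the kind of variation that makes translated or otherwise transformed copies hard to match with this representation.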