1. Field of the Invention
This invention relates to evaluating the content of visual images, in particular, to determining similarity between visual images, and, most particularly, to the use of process-response statistical modeling of visual images in determining similarity between visual images. The invention also relates to making use of visual image content evaluation—and, in particular, image similarity determinations—in effecting interaction (e.g., indexing, grouping, summarizing, annotating, searching, keyframing) with a collection of visual images.
2. Related Art
Most image similarity methods can be roughly divided into two categories, although some current methods can blur the distinction between those categories. The first category consists of methods that compute some statistical profile of the visual images, then perform comparisons between statistical profiles. The second category consists of methods that locate features in the visual images and, perhaps, quantify the relationships between features in the visual images, then compare the two visual images, often by examining both the difference in the types of features present in the two visual images, as well as the difference in how the features are related (spatially or otherwise) in the two visual images.
One of the earliest and most commonly used statistical methods is the color histogram, as described in, for example, “Color indexing,” by M. Swain and D. Ballard, International Journal of Computer Vision, 7(1):11-32, 1991, the disclosure of which is hereby incorporated by reference herein. This method quantizes the colors in a visual image, in some color space, and determines how frequently colors occur by computing a histogram that describes the distribution. Two visual images are then compared through comparison of their color distributions, i.e., color histograms. The main problem with this approach is that the spatial relationship between colors is not captured, although a great advantage is invariance to affine transforms. Some attempts have been made to incorporate some spatial information into the decision-making process. Examples of such attempts are described in the following documents, the disclosure of each of which is hereby incorporated by reference herein: 1) “Histogram refinement for content-based image retrieval,” by G. Pass and R. Zabih, IEEE Workshop on Applications of Computer Vision, pages 96-120, 1996; 2) “Color indexing with weak spatial constraints,” by M. Stricker and A. Dimai, SPIE Proceedings, 2670:29-40, 1996; and 3) “Visualseek: a fully automated content-based image query system,” by J. R. Smith and S. F. Chang, In Proc. of ACM Multimedia 96, 1996.
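By way of illustration only, the color histogram comparison described above can be sketched as follows. This is a minimal sketch rather than the implementation of any cited reference: the function names, the choice of four bins per RGB channel, and the representation of a visual image as a flat list of 8-bit RGB tuples are all assumptions made for the example. The comparison measure shown is histogram intersection (the sum of bin-wise minima), one common way of comparing two normalized histograms.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Quantize 8-bit RGB pixels and return a normalized histogram as a dict.

    Assumes bins_per_channel divides 256 evenly (illustrative choice).
    """
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of bin-wise minima of two normalized histograms."""
    return sum(min(p, h2.get(color, 0.0)) for color, p in h1.items())

# Two tiny hypothetical "images": the same colors in a different spatial layout.
image_a = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 255)]
image_b = [(0, 0, 255), (255, 0, 0), (0, 0, 255), (255, 0, 0)]
similarity = histogram_intersection(color_histogram(image_a),
                                    color_histogram(image_b))
```

Because the two example images contain the same colors in the same proportions, their histograms are identical and the comparison cannot tell them apart, which illustrates the loss of spatial information noted above.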
A method that aims to improve upon the color histogram is known as the color correlogram, described in “Image indexing using color correlograms,” by J. Huang, S. R. Kumar, M. Mitra, W.-J. Zhu and R. Zabih, In Proc CVPR '97, 1997, the disclosure of which is hereby incorporated by reference herein. This method constructs a histogram-like structure that gives, for each pair of colors, the probability that a pixel of the second color is found a certain distance away from a pixel of the first color. The full color correlogram can be especially large, O(N²D) in size, where N is the number of colors after quantization and D is the range of distances. The auto-correlogram, which only measures, for each color, the probability that a pixel of the same color is a certain distance away, is O(ND) in size and therefore more manageable, but is less effective. Other extensions to the color correlogram attempt to incorporate edge information, as described in, for example, “Spatial color indexing and applications,” by J. Huang, S. R. Kumar, M. Mitra and W.-J. Zhu, In ICCV '98, Bombay, India, 1998, the disclosure of which is hereby incorporated by reference herein.
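The auto-correlogram can be sketched, under stated assumptions, as follows. The sketch assumes the visual image has already been quantized into a two-dimensional list of color indices, measures distance between pixels with the L-infinity (chessboard) norm, and uses a brute-force scan; the function name and argument names are hypothetical and chosen only for this example. The returned table has one entry per color and distance, consistent with the O(ND) size noted above.

```python
def auto_correlogram(image, num_colors, distances):
    """Estimate, for each color c and each distance d, the probability that a
    pixel at L-infinity distance exactly d from a pixel of color c also has
    color c. `image` is a 2D list of quantized color indices.

    Returns table[c][i] for distances[i]: num_colors * len(distances) entries.
    """
    height, width = len(image), len(image[0])
    same = [[0] * len(distances) for _ in range(num_colors)]
    total = [[0] * len(distances) for _ in range(num_colors)]
    for y in range(height):
        for x in range(width):
            c = image[y][x]
            for i, d in enumerate(distances):
                for dy in range(-d, d + 1):
                    for dx in range(-d, d + 1):
                        if max(abs(dx), abs(dy)) != d:
                            continue  # keep only the ring at distance exactly d
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < height and 0 <= nx < width:
                            total[c][i] += 1
                            same[c][i] += image[ny][nx] == c
    # Colors with no valid neighbors at a distance get probability 0.0.
    return [[s / t if t else 0.0 for s, t in zip(same_row, total_row)]
            for same_row, total_row in zip(same, total)]

# Hypothetical 2x2 image with two quantized colors, evaluated at distance 1.
image = [[0, 0],
         [1, 1]]
table = auto_correlogram(image, num_colors=2, distances=[1])
```

A full correlogram would instead keep a count for every ordered pair of colors at each distance, which is where the additional factor of N in its size comes from.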
Another statistical method is the edge orientation histogram, as described in, for example, “Images Similarity Detection Based on Directional Gradient Angular Histogram,” by J. Peng, B. Yu and D. Wang, Proc. 16th Int. Conf. on Pattern Recognition (ICPR '02), and “Image Retrieval using Color and Shape,” by A. K. Jain and A. Vailaya, Patt Recogn, 29(8), 1996, the disclosure of each of which is hereby incorporated by reference herein. This method constructs a histogram that describes the probability of a pixel having a particular gradient orientation. The advantage of using orientation only is that statistics about the general shape tendencies in the visual image are captured, without being too sensitive to image brightness or color composition. Although it is generally good to be insensitive to brightness, it can be a disadvantage at times to completely ignore color.
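An edge orientation histogram of the general kind described above can be sketched as follows. This is an illustrative sketch only, not the method of either cited reference: it assumes a grayscale image given as a two-dimensional list of intensities, estimates gradients with central differences, treats opposite gradient directions as the same orientation, and ignores pixels whose gradient magnitude falls below a hypothetical threshold `min_magnitude`.

```python
import math

def edge_orientation_histogram(gray, num_bins=8, min_magnitude=1e-6):
    """Normalized histogram of gradient orientations in [0, pi).

    `gray` is a 2D list of intensities. Border pixels are skipped because the
    central difference needs a neighbor on each side.
    """
    height, width = len(gray), len(gray[0])
    counts = [0] * num_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = (gray[y][x + 1] - gray[y][x - 1]) / 2.0
            gy = (gray[y + 1][x] - gray[y - 1][x]) / 2.0
            if math.hypot(gx, gy) < min_magnitude:
                continue  # flat area: no meaningful orientation
            angle = math.atan2(gy, gx) % math.pi
            # Clamp guards against rounding that lands exactly on pi.
            counts[min(int(angle / math.pi * num_bins), num_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * num_bins

# A hypothetical horizontal intensity ramp and a uniformly brighter copy of it.
ramp = [[10 * x for x in range(4)] for _ in range(4)]
brighter = [[value + 50 for value in row] for row in ramp]
hist = edge_orientation_histogram(ramp)
```

Adding a constant to every intensity leaves the gradients, and therefore the histogram, unchanged, which is the brightness insensitivity referred to above.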
Another statistical method involves computing feature vectors at several locations in the visual image, where the locations can be discovered through a simple scheme for detecting salient regions (i.e., regions of a visual image that tend to capture a viewer's attention), as described in, for example, “Local Appearance-Based Models using High-Order Statistics of Image Features,” by B. Moghaddam, D. Guillamet and J. Vitria, In Proc. CVPR '03, 2003, the disclosure of which is hereby incorporated by reference herein. The features are not placed in histograms, but, rather, into a joint probability distribution which is used as a prior for object detection. The authors allude to computing feature vectors for visual images subdivided into blocks, but neither explore the idea nor suggest the use of a histogramming method. A similar method is mentioned in “Probabilistic Modeling of Local Appearance and Spatial Relationships for Object recognition,” by H. Schneiderman and T. Kanade, In Proc. CVPR '98, 1998, the disclosure of which is hereby incorporated by reference herein. The fundamental idea of these methods is to represent low-level features in a probability distribution. The goals of these methods differ from those of the present invention in that the present invention is designed for determining image similarity while these methods are intended for specific object recognition purposes.
As indicated above, other methods attempt to find features in the visual images and describe the features in such a way that the features can be compared between visual images. Many of these methods also describe the relationships (spatial or otherwise) among the features and make use of that information as well in identifying similarities between visual images.
Several methods use image segmentation or color clustering to determine prominent color regions in the visual image. Examples of such methods are described in the following documents, the disclosure of each of which is hereby incorporated by reference herein: 1) “Image indexing and retrieval based on human perceptual color clustering,” by Y. Gong, G. Proietti and C. Faloutsos, In Proc. CVPR '98, 1998; 2) “A multiresolution color clustering approach to image indexing and retrieval,” by X. Wan and C. J. Kuo, In Proc. IEEE Int. Conf. Acoustics, Speech, Signals Processing, vol. 6, 1998; 3) “Integrating Color, Texture, and Geometry for Image Retrieval,” by N. Howe and D. Huttenlocher, In Proc. CVPR 2000, 2000; 4) “Percentile Blobs for Image Similarity,” by N. Howe, IEEE Workshop on Content-Based Access of Image and Video Databases, 1998; 5) “Blobworld: A System for Region-Based Image Indexing and Retrieval,” by C. Carson, M. Thomas, S. Belongie, J. M. Hellerstein and J. Malik, Proc. Visual Information Systems, pp. 509-516, June 1999; and 6) “Simplicity: Semantics-sensitive integrated matching for picture libraries,” by J. Z. Wang, Jia Li and Gio Wiederhold, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2001. The general approach is to divide the visual image into salient regions, compute a set of descriptors for each of one or more regions (e.g., all regions), and use the region descriptors from one or more of the regions (e.g., the largest region(s) or the region(s) that are determined to be most distinguishable from other region(s) for which descriptors have been computed) to describe the visual images (e.g., using a feature vector). To reduce processing time, the comparison between visual images is typically done by comparing the feature vectors of the most prominent regions (determined in any of a variety of ways, e.g., by size or shape) in each visual image. 
Some of the features may be related to absolute or relative position in the visual image, allowing image geometry to play a role in aiding image similarity computations.
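The general region-based approach described above can be sketched as follows, purely for illustration. The sketch assumes that some segmentation step (not shown) has already assigned a region identifier to every pixel, and it uses a hypothetical six-element descriptor per region: area fraction, mean color, and relative centroid position. The function names, the descriptor layout, and the pairing of regions by size rank are assumptions made for this example and are not taken from any of the cited references.

```python
import math
from collections import defaultdict

def region_descriptors(labels, colors):
    """Compute one feature vector per region.

    labels: 2D list of region ids (assumed output of a segmentation step).
    colors: 2D list of 8-bit (r, g, b) tuples, same shape as labels.
    Returns {region_id: [area fraction, mean r, mean g, mean b,
                         centroid x, centroid y]}, all scaled to [0, 1].
    """
    height, width = len(labels), len(labels[0])
    sums = defaultdict(lambda: [0.0] * 6)
    for y in range(height):
        for x in range(width):
            acc = sums[labels[y][x]]
            r, g, b = colors[y][x]
            acc[0] += 1                      # pixel count
            acc[1] += r / 255.0
            acc[2] += g / 255.0
            acc[3] += b / 255.0
            acc[4] += (x + 0.5) / width      # relative position in the image
            acc[5] += (y + 0.5) / height
    n = height * width
    return {rid: [a[0] / n] + [v / a[0] for v in a[1:]]
            for rid, a in sums.items()}

def prominent_region_distance(desc_a, desc_b, k=1):
    """Compare only the k largest regions of each image, paired by size rank."""
    top_a = sorted(desc_a.values(), key=lambda v: -v[0])[:k]
    top_b = sorted(desc_b.values(), key=lambda v: -v[0])[:k]
    return sum(math.dist(a, b) for a, b in zip(top_a, top_b))

# Two hypothetical 2x3 images with the same segmentation; the largest region
# is red in the first image and green in the second.
labels = [[0, 0, 1],
          [0, 0, 1]]
red, green, blue = (255, 0, 0), (0, 255, 0), (0, 0, 255)
colors_a = [[red, red, blue], [red, red, blue]]
colors_b = [[green, green, blue], [green, green, blue]]
desc_a = region_descriptors(labels, colors_a)
desc_b = region_descriptors(labels, colors_b)
distance = prominent_region_distance(desc_a, desc_b, k=1)
```

Restricting the comparison to the k most prominent regions reflects the processing-time trade-off mentioned above, and including the centroid in the descriptor is one simple way for image geometry to influence the result.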
A final method is described in “Object Class Recognition by Unsupervised Scale-Invariant Learning,” by R. Fergus, P. Perona and A. Zisserman, In Proc. CVPR '03, 2003, the disclosure of which is hereby incorporated by reference herein. This method learns scale-invariant features from a set of visual images, provided as a training set, that include a particular object or objects, and, in an unsupervised way, it is often able to pick out features specific to the object(s) common to all visual images in the training set. In this way, visual images can be classified according to the objects they contain. This method attempts to match visual images in an unsupervised manner according to the objects they contain; however, the method requires the definition of object classes and a training pass. In contrast, in some aspects of the present invention, the retrieval of similar visual images containing similar objects is effected using no training and a single input visual image.