Many applications require methods for comparing data objects based on their content. Comparing images is useful, for example, in scene break detection, in video parsing, and in image retrieval. One method of indexing images for later comparison is manual tagging of images with human-selected keywords. This method, however, is neither flexible enough to satisfy the growing community of imagery users, nor fast enough to compete with the rate of information gathering. Fully automated content-based solutions are preferable.
Content-based image retrieval, in which a user presents an image and the retrieval system returns the most similar images, illustrates the demand for automatically comparing objects based on their content.
Most image retrieval systems operate in two distinct phases: an image summary phase, followed by a summary comparison phase. In the image summary phase, every image in the database is summarized as a vector, utilizing a particular method, such as color histogramming. The vectors are computed once and stored for later retrieval. In the summary comparison phase, a user presents a query image, and a comparison measure is used to compare the query image's summary with other image summaries and retrieve some number of the most similar vectors (images).
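The summary comparison phase can be illustrated with a minimal sketch. It assumes the summaries are already computed and stored as plain lists, and it uses the L1 (sum of absolute differences) distance as the comparison measure. The text does not specify a measure, so `l1_distance`, `retrieve`, and the sample database are hypothetical names and data chosen only for illustration.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two summary vectors (assumed measure)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def retrieve(query_summary, database, k=2):
    """Return the names of the k stored summaries closest to the query.

    database: dict mapping an image name to its precomputed summary vector.
    """
    ranked = sorted(
        database,
        key=lambda name: l1_distance(query_summary, database[name]),
    )
    return ranked[:k]


# Image summary phase (done once): toy 3-entry summaries, invented for the example.
database = {
    "sunset": [8, 1, 1],
    "forest": [1, 8, 1],
    "ocean": [1, 1, 8],
}

# Summary comparison phase: rank stored summaries against the query summary.
query = [7, 2, 1]
top = retrieve(query, database, k=2)
```

Because the summaries are computed once and stored, only the comparison step runs at query time.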
Color histogramming is the most widely used image summary method, employed in systems such as IBM's QBIC and Virage's VIR Engine. A color histogram is a vector where each entry stores the number of pixels of a given color in the image. All images are scaled to contain the same number of pixels before histogramming, and the colors of the image are mapped into a discrete colorspace containing n colors. Typically, images are represented in the RGB colorspace, using a few of the most significant bits per color channel to discretize the space.
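As a rough sketch of the histogramming step, the following assumes 8-bit RGB pixels given as tuples and keeps the two most significant bits of each channel, giving n = 64 discrete colors. The bit count and the function name `color_histogram` are illustrative assumptions; the text says only that a few of the most significant bits are used.

```python
def color_histogram(pixels, bits_per_channel=2):
    """Count pixels per discretized RGB color.

    pixels: iterable of (r, g, b) tuples with 8-bit channel values.
    bits_per_channel: number of most significant bits kept per channel
    (an illustrative choice, not a value given in the text).
    """
    shift = 8 - bits_per_channel
    levels = 1 << bits_per_channel
    hist = [0] * (levels ** 3)  # n = levels^3 discrete colors
    for r, g, b in pixels:
        # Keep only the top bits of each channel, then pack them into one index.
        index = ((r >> shift) * levels + (g >> shift)) * levels + (b >> shift)
        hist[index] += 1
    return hist


# A tiny 4-pixel "image": two reds, one green, one blue.
image = [(255, 0, 0), (250, 10, 5), (0, 255, 0), (0, 0, 255)]
hist = color_histogram(image)
```

The two slightly different reds fall into the same bin, which is the intended effect of discretizing the colorspace.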
Color histograms are widely used for content-based image retrieval because they are simple and quick to compute, and despite their simplicity, exhibit attractive properties. Color histograms are tolerant of movement of objects in the image and of changes in camera viewpoint, and are robust against occlusion. This invariance is due to the fact that color histograms do not relate spatial information with the pixels of a given color.
Color histograms have proven effective for small databases, e.g. 60-1140 images, but limitations become rapidly apparent with larger databases, e.g. tens of thousands of images. Because a color histogram records only color information, images with similar color histograms can have dramatically different appearances. For example, an image of a man in a red golf shirt may have a similar color histogram to an image of red flowers. In a large database, it is common for unrelated images to have similar color histograms.
There have been some attempts to improve color histograms by incorporating spatial information. One such method attempts to capture the spatial arrangement of the different colors in the image. The image is partitioned into rectangular regions using maximum entropy, where each region is predominantly a single color. Maximum entropy is the state at which all values in the distribution occur with equal probability. The similarity between two images is the degree of overlap between regions of the same color. While this method gives better results than color histograms without spatial information, it requires substantial computation, particularly in the partitioning algorithm. Additionally, the partitioning algorithm is affected by changes in orientation and position of the objects in the image.
Another histogramming method divides the image into five partially overlapping regions and computes the first three moments of the color distributions in each image. The moments of a distribution are the sums of the integer powers of the values. The mean of a distribution is derived from the first moment (1st power), the variance and standard deviation from the second moment (2nd power), etc. The moments are computed for each color channel in the HSV colorspace, where pixels close to the border of the image have less weight. The distance between two regions is a weighted sum of the differences in each of the three moments. The distance between two images is the sum of the distance between the center regions, plus (for each of the four side regions) the minimum distance of that region to the corresponding region in the other image, when rotated by 0, 90, 180 or 270 degrees. Because the regions overlap, this method is insensitive to small rotations and translations.
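The moment computation and the region distance can be sketched as follows. This is a simplified illustration under stated assumptions: the power sums are divided by the number of values, all pixels carry equal weight (the border weighting is omitted), and the weights in `region_distance` default to 1.0 because the text does not give them. All function names are hypothetical.

```python
def raw_moments(values, orders=(1, 2, 3)):
    """Normalized power sums: the p-th moment is sum(v**p) / n."""
    n = len(values)
    return [sum(v ** p for v in values) / n for p in orders]


def mean_and_variance(values):
    """Derive the mean from the 1st moment and the variance from the 2nd."""
    m1, m2, _ = raw_moments(values)
    return m1, m2 - m1 * m1


def region_distance(moments_a, moments_b, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of absolute differences in each of the three moments."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, moments_a, moments_b))


# Values of one color channel in one region (invented sample data).
channel_values = [1.0, 2.0, 3.0, 4.0]
mean, variance = mean_and_variance(channel_values)
```

In a fuller version, the three moments would be computed per channel and per region, and the per-region distances combined as the paragraph above describes.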
Another histogramming method captures the spatial correlation between colors. This approach is called color correlograms, and is related to the correlogram technique from spatial data analysis. A color correlogram for a given pair of colors (i,j) and a distance k contains the probability that a pixel with color i will be k pixels away from a pixel of color j. To reduce the storage requirements, this approach concentrates on autocorrelograms, where i=j.
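A brute-force sketch of one autocorrelogram entry is given below. It assumes the image is a small grid of already discretized color indices and that "k pixels away" means chessboard distance (the larger of the horizontal and vertical offsets equals k). The text does not fix the distance metric, so this is an assumption, as is the function name `autocorrelogram`.

```python
def autocorrelogram(image, color, k):
    """Estimate the probability that a pixel at chessboard distance k
    from a pixel of `color` also has `color`."""
    height, width = len(image), len(image[0])
    same = total = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] != color:
                continue
            # Visit every in-bounds pixel on the square ring of radius k.
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    if max(abs(dx), abs(dy)) != k:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        total += 1
                        same += image[ny][nx] == color
    return same / total if total else 0.0


# 3x3 image of color indices: a 2x2 block of color 0 bordered by color 1.
image = [
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
]
p = autocorrelogram(image, color=0, k=1)
```

Storing one such value per color and distance, rather than per color pair and distance, is what reduces the storage requirement.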
Another histogramming method uses color histograms to recognize individual objects contained in an image by comparing the histogram of a query object with the histograms of the images in the database. For the best matches, a histogram backprojection is performed to segment the objects from their backgrounds. The histogram comparison using this method can be upset by a pixel in the background when two conditions hold: (1) the pixel has the same color as one of the colors in the query object; and (2) the number of pixels of that color in the object is less than the number of pixels of that color in the query object.
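The effect of such a background pixel can be seen with a small sketch. It assumes the comparison measure is histogram intersection (the sum of bin-wise minima, normalized by the query size), which is consistent with the two conditions described but is not named in the text. The function name and all histogram values are invented for illustration.

```python
def histogram_intersection(image_hist, query_hist):
    """Fraction of the query histogram matched by the image histogram."""
    overlap = sum(min(i, q) for i, q in zip(image_hist, query_hist))
    return overlap / sum(query_hist)


# Query object: 10 pixels of color 0 and 5 pixels of color 1.
query_hist = [10, 5, 0]

# The object as it appears in a database image, partly occluded:
# only 6 of its 10 color-0 pixels are visible.
object_only = [6, 5, 0]

# The same image including background: 4 background pixels of color 0
# and 7 background pixels of color 2.
with_background = [6 + 4, 5, 7]

score_object = histogram_intersection(object_only, query_hist)
score_image = histogram_intersection(with_background, query_hist)
```

Here the four background pixels of color 0 raise the score to a perfect match even though part of the object is hidden, while the background pixels of color 2 have no effect because that color does not occur in the query object.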
In sum, it remains desirable to have an efficient image retrieval system which allows for significant differences in the appearance of similar images, such as: rotation and translation of objects in the image; addition, occlusion and subtraction of objects in the image; and changes in camera viewpoint and magnification. It is also important that the image summary phase be efficient in order to handle large imagery collections.
It is an object of the present invention to provide a method and apparatus to perform efficient data object comparisons based on data object content.
It is another object of the present invention to provide a method and apparatus to perform efficient image comparisons based on image content.
It is another object of the present invention to provide a method and apparatus to perform image retrieval which allows for significant differences in the appearance of images having similar content.
It is another object of the present invention to provide a method and apparatus to perform image object retrieval, in which similar objects within images may be identified allowing for significant differences in the appearance of images having similar content.