The number and size of digital image collections are growing rapidly in both the consumer and commercial domains. Sharing, sale, and other distribution of digital images often introduce into a collection or database images that are substantially similar to images the database already contains. These similar images consume valuable storage resources and complicate the management of the collection without adding significant value to it. For commercial databases, the presence of redundant images adversely affects the salability of these assets. There exists a need to control this redundancy within image databases and collections by identifying redundant images and optionally purging them from the collection.
In addition, the relative ease of capturing and reusing images makes it difficult for the owners of image assets to detect unauthorized reuse of their images. Such owners can embed visually imperceptible digital watermarks within their image assets. The unauthorized reuse of portions of an asset within another image may make the extraction of information from the watermarked portions difficult. Identification of the copied portions of an image can improve the likelihood of recovery of the watermark. Thus, there exists a need to aid such owners in detecting unauthorized reuse of all or part of their assets.
The detection of duplicate or substantially similar images can be performed using whole-image analysis. Examples of this are the methods employed by Jain et al. in U.S. Pat. No. 5,911,139 and by Doermann in “The detection of duplicates in document image databases,” Proceedings of the International Conference on Document Analysis and Recognition, 1997. Jain and Doermann extract depictive features from each image and use such features to control the insertion of images into an image database. These methods are suited to detecting copies that do not differ substantially from their original, but they may fail to detect copies that reproduce only portions of the original image.
Fingerprint analysis also deals with the matching of images, but the goal is substantially different from finding duplicate images or image segments. In this domain, the question is not whether two images, or portions thereof, share a common pictorial origin; rather, it is whether two distinct images have been made by the same finger. Indeed, it is typically known a priori that the two images are impressions having distinctly different origins.
Fingerprint analysis techniques must accommodate partial fingerprints, image distortion, and multiple orientations. In U.S. Pat. No. 6,094,507, Monden et al. choose many points of interest within each fingerprint. They form pairs of similar points such that one point comes from each image. They then analyze the positional relationship between these pairs of points to build a hypothetical geometric transformation that best aligns these point pairs. Pairs that fail to conform to the transformation are discarded, and the local features extracted near the remaining point pairs are used as a basis of comparison between the two fingerprints.
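The general idea of hypothesizing a transformation from point pairs and discarding non-conforming pairs can be illustrated with a minimal sketch. It is not the method of Monden et al.; it assumes a similarity transformation (scale, rotation, and translation), represents each point as a complex number, and tries every pair of correspondences exhaustively, which is practical only for small point sets. The function names and the tolerance `tol` are hypothetical.

```python
from itertools import combinations


def fit_similarity(src, dst):
    """Least-squares fit of dst ~ a*src + b, with points as complex numbers.

    The complex factor a encodes scale and rotation; b encodes translation.
    """
    n = len(src)
    src_mean = sum(src) / n
    dst_mean = sum(dst) / n
    denom = sum(abs(z - src_mean) ** 2 for z in src)
    if denom == 0:
        raise ValueError("source points are coincident")
    numer = sum((w - dst_mean) * (z - src_mean).conjugate()
                for z, w in zip(src, dst))
    a = numer / denom
    b = dst_mean - a * src_mean
    return a, b


def robust_similarity(src, dst, tol=1.0):
    """Hypothesize a transform from each pair of correspondences, keep the
    hypothesis with the most conforming pairs, and refit on those pairs.

    tol is an assumed residual threshold, in pixels.
    """
    best_inliers = []
    for i, j in combinations(range(len(src)), 2):
        if src[i] == src[j]:
            continue  # two identical points cannot define a transform
        a, b = fit_similarity([src[i], src[j]], [dst[i], dst[j]])
        inliers = [k for k in range(len(src))
                   if abs(a * src[k] + b - dst[k]) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no consistent transformation found")
    a, b = fit_similarity([src[k] for k in best_inliers],
                          [dst[k] for k in best_inliers])
    return a, b, best_inliers
```

Given two lists of paired points, `robust_similarity(src, dst)` returns the fitted parameters together with the indices of the pairs that conform to the transformation; the remaining pairs are the ones that would be discarded.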
The search for correspondence points is made easier by a priori knowledge of the image domain. Fingerprints do not vary grossly in scale, and fingerprint matches are only attempted on images that contain a substantial portion of the fingerprint pattern. The final analysis of image similarity is based upon the local features surrounding each point of interest; no information is generated beyond the selected points.
Förstner discusses image matching in Computer and Robot Vision Volume II, R. Haralick and L. Shapiro, editors, Chapter 16.5, Addison Wesley, 1993. In this discussion, feature points in each image are selected and paired, and a spatial-mapping function is determined using a robust estimation method. As a final step, the results are evaluated over the accepted correspondence points.
Fingerprint matching and the methods discussed by Förstner address pairs of images in which points of interest are selected and paired based upon their local features without regard to any semantic significance they may have. Frequently, images contain a large number of points of interest in comparison with the number of objects in the images that have semantic significance. Because these algorithms do not rely on semantic objects in the images, they tend to require more computational effort to match two images robustly than algorithms that rely on matching the smaller number of semantic objects. Furthermore, once the correspondence function has been determined, neither the fingerprint matching methods nor the methods discussed by Förstner attempt to delimit the region over which this relationship is valid. Thus, none of the methods discussed addresses the identification of images or portions of images that may have been copied or combined with other imagery to form a new image. Accordingly, a need exists to detect and identify portions of an image that may be a geometrically transformed portion of another image.
A geometric transformation is a mapping function that establishes a spatial correspondence between points in an image and its transformed counterpart. Many different types of mapping functions are used in various areas of image processing. For example, image resizing can be thought of as a horizontal and vertical scaling of the original image to create an image with different pixel dimensions and apparent resolution. Image rotation is typically performed to compensate for slight rotational misalignment of the camera during image capture, or to store portrait images in landscape mode or landscape images in portrait mode. Image translation is a horizontal and vertical translation of the image coordinates that is typically done to compensate for slight translational misalignment of the camera during image capture, or to center a region of interest in the image. Affine or perspective transformations are typically applied to an image in order to rectify edges or lines in the image that should be parallel, or that should run predominantly in a certain direction. Bilinear, polynomial, local polynomial, and spline models are typically used to model more complicated phenomena in images, such as lens distortion or aberration. In some cases, the mapping function is not defined over the entire domain of the image, but rather only over a region of interest, such as in image cropping. In image cropping, a region of interest is chosen in the original image, and image data outside the region of interest is deleted. In many cases, geometric transformations can be thought of as compositions of multiple mapping functions. For example, a typical imaging operation is to crop and resize an original image so that it is centered and “zoomed in” on an object of interest. For further general information on geometric transformations of images, see G. Wolberg, Digital Image Warping, IEEE Computer Society Press, 1990.
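The composition of mapping functions described above can be sketched with 3x3 homogeneous matrices, in which a crop is modeled as a translation of the origin to the corner of the region of interest and a resize as a scaling. This is an illustrative sketch only: the helper names are hypothetical, pixel-center conventions are ignored, and only the coordinate mapping is shown, not the resampling of pixel values.

```python
def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translation(tx, ty):
    """Homogeneous matrix that shifts coordinates by (tx, ty)."""
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]


def scaling(sx, sy):
    """Homogeneous matrix that scales x by sx and y by sy."""
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]


def crop_then_resize(x0, y0, crop_w, crop_h, out_w, out_h):
    """Mapping from original coordinates to output coordinates for a crop
    with top-left corner (x0, y0) followed by a resize to out_w x out_h."""
    crop = translation(-x0, -y0)
    resize = scaling(out_w / crop_w, out_h / crop_h)
    # The crop is applied first, so it is the right-hand factor.
    return matmul3(resize, crop)


def apply_mapping(m, x, y):
    """Apply a homogeneous 3x3 mapping to the point (x, y)."""
    xh = m[0][0] * x + m[0][1] * y + m[0][2]
    yh = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return xh / w, yh / w
```

For a 200 by 100 region with corner (100, 50) resized to 400 by 200, the composed mapping sends the corner (100, 50) to the output origin and the opposite corner (300, 150) to (400, 200). Affine and perspective mappings fit the same matrix form; the polynomial and spline models mentioned above do not.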
Images in a collection or database that are geometrically transformed versions of another image in the same collection or database add no information that is not already contained within the parent image. Thus, these geometrically transformed images may occupy significant storage space while adding no informational value.
An obvious automatic method for identifying an image that is a geometrically transformed version of another image would be to compare the contents of the two images for many combinations of values of the transformation parameters. This solution is computationally intractable even in the simple case of image resizing because of the virtually infinite number of possible relative scale combinations for comparing the two images. Other types of geometric transformations require more degrees of freedom, so the computational requirement increases even further. Therefore, there exists a need in the art for a method to automatically and efficiently test two images to determine if portions of one are a geometrically transformed version of the other.
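The growth in cost with the number of degrees of freedom can be made concrete with a small calculation. The sketch below assumes that each transformation parameter is sampled at a fixed number of discrete values and that one image comparison is performed per combination; the step counts are hypothetical and chosen only for illustration.

```python
from math import prod


def exhaustive_comparisons(steps_per_parameter):
    """Number of image comparisons needed when every combination of
    sampled parameter values is tried once."""
    return prod(steps_per_parameter)


# Assumed sampling: 100 candidate values per parameter (illustrative only).
resize_only = exhaustive_comparisons([100, 100])  # horizontal, vertical scale
affine = exhaustive_comparisons([100] * 6)        # six affine parameters
```

Under these assumptions, resizing alone requires 10,000 comparisons, while a six-parameter affine search requires 10^12, each of which touches every pixel being compared.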
In addition, images that are geometrically transformed are often also subjected to minor color corrections. In these cases it may be desirable to detect the similar content in a manner that tolerates moderate color corrections.