1. Field of the Invention
The present invention generally relates to computer systems for image processing. More particularly, the present invention relates to apparatuses and methods to automatically search image data to detect the presence, or likelihood of presence, of specific target images.
2. Description of the Related Art
Several systems analyze data to detect the presence of specific images. Analog images occur in a variety of media, including photographs, photographic slides, television images in various formats including HDTV, images on computer monitors, holograms, x-rays, sonograms and radar. Analog imagery can be digitized and thus represented by an array of pixels, where each pixel is determined by one or more spectral bands of digital data. For example, a digital image can be a 1024×1024 array of pixels in which each pixel is specified by a 0 or a 1 denoting black or white, respectively; by an integer between 0 and 255 denoting one of 256 shades of gray; by 3 integers between 0 and 255 denoting red, green and blue components, respectively, with 256 levels for each component; by an integer between 0 and 1023 denoting one of 1024 infrared levels; or by 256 integers, each between 0 and 255 and each denoting the output at a different spectral band of a multispectral satellite sensor. Digital imagery can also be created directly through computer graphics software and systems, as well as through imaging sensors whose output is natively digital.
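To give a sense of how quickly such data grows, the sketch below computes the uncompressed size of a 1024×1024 image under each pixel encoding just listed. It assumes tightly packed bits with no file headers or padding, which real storage formats rarely use; the table and function names are illustrative only.

```python
# Raw (uncompressed) size of a 1024 x 1024 image under each pixel encoding
# listed above, assuming bits are tightly packed with no headers or padding.
WIDTH = HEIGHT = 1024

BITS_PER_PIXEL = {
    "binary (0 = black, 1 = white)": 1,
    "grayscale (0-255)": 8,
    "RGB (3 components, 0-255 each)": 3 * 8,
    "infrared (0-1023)": 10,
    "multispectral (256 bands, 0-255 each)": 256 * 8,
}


def raw_size_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed to store width*height pixels at bits_per_pixel, tightly packed."""
    return width * height * bits_per_pixel // 8


for name, bpp in BITS_PER_PIXEL.items():
    print(f"{name:40s} {raw_size_bytes(WIDTH, HEIGHT, bpp):>13,d} bytes")
```

A single multispectral frame in this layout already needs 256 MiB, which is one reason exhaustive human review of large image collections is impractical.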
Many existing applications use some form of image recognition within a set of data, such as fingerprint identification, quality control for manufacturing, optical character recognition on scanned documents, automatic target recognition for weapons systems, and face recognition. A critical function of these systems is to indicate the likelihood of a match when the image data contains a target image. Where the image data is voluminous, such as satellite data or all images available on the Internet, human review of all the data is impractical.
Prior art image recognition methods generally fall into two categories: constrained and unconstrained. Constrained methods work only with images that have a specific structure. For example, one commercial face recognition application based on biometrics requires calculating the distances between the eyes and between the nose and mouth, and may rely on other similar relational computations. Such a calculation normally makes no sense for a general type of image, such as a boat or an airplane, because the underlying structure of an image of a boat or plane differs from that of a face. More generally, constrained methods are typically not effective on imagery that lacks the specific structure they assume, since the sought image may have very different relational aspects than the target image yet still satisfy the search criteria.
Constrained methods also typically require relatively high-quality imagery, in which the underlying structure assumed by the method can be observed, and are therefore unsuitable for poor-quality image data. For example, a drug enforcement agent may photograph an individual in a moving car and obtain a blurry image, or may take a telephoto picture of a subject and later want to identify other people in the background who are severely out of focus. Another problem arises in video imagery of a crowd scene, such as at a football stadium, where very small images of large numbers of people are present. Once the imagery is too small, too out of focus, or too blurred, the calculations required by a constrained biometric face recognition system cannot be performed accurately. More generally, once image quality is too poor, a constrained system cannot exploit its underlying structural assumptions and fails to perform accurately.
On the other hand, unconstrained image recognition systems do not use the underlying structure of an image as a basis for comparison, and thus do not suffer the same problems with low-quality image data. Prior art unconstrained systems, however, do not work well with complex imagery. For example, given two images A and B, specified by square pixel arrays in which each pixel is an integer between 0 and 255, one measure of the difference between A and B, ‖A − B‖, is the square root of the sum of the squares of the differences between corresponding pixel values. A and B would then be said to be similar when ‖A − B‖ is small. However, this measure may be very large even when the difference between A and B is barely noticeable. For example, create an image B from an image A by shifting all columns of pixels of A one pixel to the right and adding a new column at the left that duplicates the new second column. If A is a 1024×1024 image, A and B will look very similar to a human observer: if A is a picture of a boat, B will also look like a boat, slightly shifted to the right. Yet ‖A − B‖ may be large, since nearly every pixel in A may differ from the corresponding pixel in B. In some practical image recognition applications, it is desirable to recognize whether one or more objects in image A are also present in, or absent from, image B, not only when they have been slightly modified digitally as in the above example, but also when they have been acquired under very different conditions.
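The pitfall can be reproduced with a small sketch. It uses a hypothetical 8×8 image of one-pixel-wide vertical stripes (a deliberately extreme case chosen for illustration, not a photograph of a boat) and compares the pixelwise distance just described for a one-pixel shift of the image and for a uniformly gray image.

```python
import math


def l2_distance(a, b):
    """Square root of the sum of the squares of the differences between
    corresponding pixel values of two equally sized images."""
    return math.sqrt(sum((pa - pb) ** 2
                         for row_a, row_b in zip(a, b)
                         for pa, pb in zip(row_a, row_b)))


def shift_right_one(img):
    """Shift every column one pixel to the right. The new leftmost column
    duplicates the new second column (the original first column); the
    original last column falls off so the size is unchanged."""
    return [[row[0]] + row[:-1] for row in img]


n = 8
# Hypothetical test image: 1-pixel-wide vertical stripes of 255 and 0.
stripes = [[255 if j % 2 == 0 else 0 for j in range(n)] for _ in range(n)]
# A visibly different image: uniform mid-gray.
gray = [[128] * n for _ in range(n)]

shifted = shift_right_one(stripes)
print("stripes vs. shifted stripes:", round(l2_distance(stripes, shifted), 1))
print("stripes vs. flat gray:      ", round(l2_distance(stripes, gray), 1))
```

The shifted copy scores about 1908, nearly twice the distance (about 1020) of the flat gray image, even though the gray image looks nothing like the stripes.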
Examples of such disparate acquisition conditions include acquisition at different times, and hence under different conditions of use; with different cameras or other sensors; at different distances from the sensor to the object(s) or scene; from different perspectives; under different lighting and environmental conditions; against different backgrounds; and in the presence or absence of other objects. Moreover, the target object(s) of interest may be partially obscured in different ways in A and B, and may be rotated, scaled and translated relative to each other or to the background. Imaging systems and computer software may further distort the imagery, for example through compression applied to facilitate storage and transmission, or through the insertion of special effects. Prior art unconstrained image recognition methods do not deal effectively with the complexity arising from significantly different conditions of image data acquisition.
Accordingly, it is desirable to have an improved image recognition system that can adequately search realistic imagery data for the presence of target image(s) and/or image object(s) and successfully indicate the likelihood that they are present, even when the searchable imagery data and the target imagery were acquired under significantly different conditions. Such a system should be unconstrained by the structure of the target image or of the searchable image data, and should allow for variation in the appearance of the target image within the image data. It is primarily to the provision of such an apparatus and method for recognizing images within digitized data that the present invention is directed.
The present invention is an apparatus, method, and computer program that can recognize specific images within a collection of digitized image data, or at least indicate the likelihood that a specific image is contained within the image data. In the system, a processor can either receive image data in digital format or itself digitize data into target images, the collection of which forms the searchable image data. The processor can likewise either receive other image data in digital format or itself digitize data into query images. The system then generates a set of domain blocks from one or more query images, each domain block representing a discrete portion of the query image data. A set of range blocks is then generated from the query image(s) and from a predetermined one or more target images that are to be located within the searchable image data, the range blocks corresponding to discrete portions of the query image(s) and of the target images from the searchable image data. To obtain additional potential appearances of the images, the range blocks are transformed by one or more substantially affine transformations with predetermined coefficients, such as an affine transformation composed of translation, scaling and rotation operations on the spatial and spectral data. A substantially affine transformation is one that can be continuously approximated locally by affine transformations.
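A minimal sketch of block generation follows, assuming grayscale images stored as lists of rows, square blocks, non-overlapping domain blocks and a sliding window for range blocks. As its family of substantially affine transformations it uses only the eight rotations and reflections of a square block; the block size, stride, source labels and helper names are hypothetical choices, not the invention's predetermined coefficients, and spectral (intensity) transformations are not modeled here.

```python
from dataclasses import dataclass


def extract(img, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return tuple(tuple(img[top + i][left + j] for j in range(size))
                 for i in range(size))


def domain_blocks(query, size):
    """Partition a query image into non-overlapping size x size domain blocks."""
    h, w = len(query), len(query[0])
    return [((r, c), extract(query, r, c, size))
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]


def rotate90(block):
    """Rotate a square block 90 degrees clockwise."""
    return tuple(zip(*block[::-1]))


def mirror(block):
    """Reflect a square block left-to-right."""
    return tuple(row[::-1] for row in block)


def dihedral_variants(block):
    """The 8 rotations and reflections of a square block, a simple family
    of spatial affine maps on the pixel grid."""
    variants, b = [], block
    for k in range(4):
        variants.append((f"rot{90 * k}", b))
        variants.append((f"rot{90 * k}+mirror", mirror(b)))
        b = rotate90(b)
    return variants


@dataclass(frozen=True)
class RangeBlock:
    source: str      # "query" or "target": where the block came from
    position: tuple  # (row, col) of its top-left corner in the source image
    transform: str   # which spatial transformation was applied
    pixels: tuple    # the transformed block, as a tuple of row tuples


def range_blocks(image, source, size, stride):
    """Slide a size x size window over the image and emit every
    dihedral variant of every window as a labeled range block."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(0, h - size + 1, stride):
        for c in range(0, w - size + 1, stride):
            for name, px in dihedral_variants(extract(image, r, c, size)):
                out.append(RangeBlock(source, (r, c), name, px))
    return out


# Hypothetical 4x4 query image and an intensity-inverted target image.
query = [[4 * i + j for j in range(4)] for i in range(4)]
target = [[15 - v for v in row] for row in query]

domains = domain_blocks(query, 2)
ranges = range_blocks(query, "query", 2, 2) + range_blocks(target, "target", 2, 2)
print(len(domains), "domain blocks,", len(ranges), "range blocks")
```

Keeping the source image, position and transform on each range block means the geometric information needed for the classification data travels with the block itself.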
The system uses a predetermined method of comparing image regions consisting of predetermined configurations of pixels, such as the square root of the sum of the squares of the differences between corresponding pixel values when the image regions are identically sized rectangles. Each domain block is then compared with one or more of the range blocks, and classification data is generated from each comparison. The classification data includes the comparison result; geometric information describing the location and description of the range block, specifically including whether the range block originated from the query image or from the collection of searchable imagery; and a description of the substantially affine transformation, if any, that was applied to create the range block data. The likelihood that at least a specific portion of one or more query images is similar to specific portions of the searchable image data can then be determined from the classification data aggregated over the collection of domain blocks, using at least a measure of the extent to which domain blocks compare less closely to range blocks chosen from the query image(s) than to range blocks chosen from specific portions of the searchable image data.
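One way to realize the comparison, classification and aggregation steps is sketched below. It assumes blocks are flattened tuples of pixel values, compares them by the square root of the sum of squared differences after a least-squares fit of a spectral affine map (contrast s and offset o), and scores likelihood as the fraction of domain blocks whose closest target range block is closer than their closest query range block. The record layout, the function names and the rule that excludes a domain block's own position are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeBlock:
    source: str      # "query" or "target"
    position: tuple  # (row, col) in the source image
    transform: str   # spatial transformation applied, if any
    pixels: tuple    # flattened pixel values


@dataclass(frozen=True)
class Classification:
    distance: float          # comparison result
    range_block: RangeBlock  # location, source and spatial transform
    contrast: float          # spectral affine coefficients s and o, fitted
    offset: float            # so that s * range + o approximates the domain


def compare(d, r):
    """Fit d ~ s*r + o by least squares and return (residual, s, o), where the
    residual is the square root of the sum of squared differences."""
    n = len(d)
    md, mr = sum(d) / n, sum(r) / n
    var_r = sum((x - mr) ** 2 for x in r)
    s = sum((x - mr) * (y - md) for x, y in zip(r, d)) / var_r if var_r else 0.0
    o = md - s * mr
    residual = math.sqrt(sum((s * x + o - y) ** 2 for x, y in zip(r, d)))
    return residual, s, o


def classify(domains, ranges):
    """For each (position, pixels) domain block, keep the closest range block
    from each source. Query range blocks at the domain block's own position are
    skipped so a block cannot trivially match itself (a sketch assumption)."""
    results = []
    for pos, d in domains:
        best = {}
        for rb in ranges:
            if rb.source == "query" and rb.position == pos:
                continue
            dist, s, o = compare(d, rb.pixels)
            if rb.source not in best or dist < best[rb.source].distance:
                best[rb.source] = Classification(dist, rb, s, o)
        results.append(best)
    return results


def likelihood(results):
    """Fraction of domain blocks that match the target more closely than the query."""
    if not results:
        return 0.0
    wins = sum(1 for best in results
               if "query" in best and "target" in best
               and best["target"].distance < best["query"].distance)
    return wins / len(results)


domains = [((0, 0), (0, 10, 20, 30))]
ranges = [
    RangeBlock("query", (0, 0), "identity", (0, 10, 20, 30)),  # skipped: own position
    RangeBlock("query", (0, 2), "identity", (5, 5, 5, 6)),
    RangeBlock("target", (3, 1), "rot90", (1, 2, 3, 4)),
]
print(likelihood(classify(domains, ranges)))
```

Here the target block is an exact spectral affine image of the domain block (s = 10, o = −10), while the best remaining query block leaves a residual, so the single domain block counts toward the target and the script prints 1.0.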
The present invention attempts to accurately classify images as likely to contain similar target images, rather than specifically seeking an exact match. In classification, the goal is not to identify a target image exactly but to categorize the image data (or specific portions of it, such as domain blocks) by their likelihood of containing the target image. The present invention accordingly applies to classification as well as to identification.
The one or more target images can be selected from the searchable image data itself; in this manner, other similar objects can be located within the searchable image data. Further, the image data can be preprocessed in a predetermined manner after receipt, such as by applying substantially affine transformations, scaling the image data to a predetermined size, segmenting the image data, purposely blurring or otherwise altering the image data, or marking certain image areas to be ignored when each domain block is compared with one or more range blocks. The image recognition steps can also be iterated, using the classification data to direct further review of specific portions, or all, of the image data.
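Three of these preprocessing options (scaling to a predetermined size, deliberate blurring, and marking areas to ignore) can be sketched with plain Python lists. Nearest-neighbour resampling and a box filter are stand-ins chosen for brevity rather than methods prescribed by the invention, and representing ignored pixels as None is an assumption of this sketch.

```python
import math


def resize_nearest(img, out_h, out_w):
    """Scale an image to a predetermined size by nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [[img[i * h // out_h][j * w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def box_blur(img, radius=1):
    """Deliberately blur an image by averaging each pixel's neighbourhood."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [img[y][x]
                      for y in range(max(0, i - radius), min(h, i + radius + 1))
                      for x in range(max(0, j - radius), min(w, j + radius + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def apply_ignore_mask(img, mask):
    """Mark pixels to be ignored (mask value True) by replacing them with None."""
    return [[None if m else p for p, m in zip(row, mrow)]
            for row, mrow in zip(img, mask)]


def masked_distance(a, b):
    """Pixelwise distance that skips any pixel marked as ignored in either image."""
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b)
                         for x, y in zip(ra, rb)
                         if x is not None and y is not None))


img = [[4 * i + j for j in range(4)] for i in range(4)]
print(resize_nearest(img, 2, 2))
```

Because ignored pixels are simply skipped in the distance, a masked region (for example an occluding object) cannot make two otherwise matching blocks look dissimilar.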
The substantially affine transformations create different views and appearances in the searchable image data, in the query imagery, or in both, allowing the system to indicate a high likelihood of similarity even when the query imagery was acquired under different conditions from the searchable image data. The system can likewise indicate a high likelihood of similarity when the query imagery represents partial views of objects in the searchable image data, or partial views of the searchable image data itself, since the comparison data can indicate close matches when those partial views are represented in the searchable image data.
The classification data can be chosen so that maximum similarity is indicated only when each domain block of the target image is a substantially affine transformation of at least one range block. For many real image databases, this theoretical condition of maximum similarity is sufficient to imply that maximum similarity is indicated only when the two images are identical. In one embodiment, the likelihood of similarity is determined by a function of two variables that takes values between 0 and 1, where the first variable is a specific portion of a query image and the second variable is a specific portion of the image data.
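The maximum-similarity condition can be made concrete with a function of two block collections that returns a value between 0 and 1: the fraction of a query portion's domain blocks that are, within a tolerance, a spectral affine transformation of at least one range block from an image-data portion. This is an illustrative choice with a hypothetical `tol` parameter; the summary does not specify the function used in the embodiment.

```python
import math


def affine_residual(d, r):
    """Residual after the least-squares fit d ~ s*r + o (a spectral affine map)."""
    n = len(d)
    md, mr = sum(d) / n, sum(r) / n
    var_r = sum((x - mr) ** 2 for x in r)
    s = sum((x - mr) * (y - md) for x, y in zip(r, d)) / var_r if var_r else 0.0
    o = md - s * mr
    return math.sqrt(sum((s * x + o - y) ** 2 for x, y in zip(r, d)))


def similarity(query_blocks, target_blocks, tol=1e-6):
    """Value in [0, 1]: fraction of query domain blocks that are, within tol,
    a spectral affine transform of at least one target range block.
    It reaches 1 only when every domain block is so matched."""
    if not query_blocks:
        return 0.0
    hits = sum(1 for d in query_blocks
               if any(affine_residual(d, r) <= tol for r in target_blocks))
    return hits / len(query_blocks)


query_portion = [(0, 10, 20, 30), (1, 1, 2, 2)]
target_portion = [(1, 2, 3, 4)]
print(similarity(query_portion, target_portion))
```

The first query block is exactly 10 × (1, 2, 3, 4) − 10 and counts as a hit; the second cannot be fitted exactly, so the value is 0.5 rather than the maximum of 1.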
A correlation of image data among different target images and/or query images can also be used together with the classification data to increase the accuracy of the invention. For example, several target images may be very similar images of the same object, such as successive frames of video. In that case, the extent to which the classification data of such similar images is itself correlated may be included in the classification data of one or more of them. By aggregating classification data from different target images, the notion of a target image can be extended to sets of target images, such as video sequences or other sets of related images. Similarly, the notion of a query image can be extended to sets of query images, and the classification can be extended to include classification data for such extended sets of target and query images. The invention can thus be used whether the target and query images are individual images or more general collections of correlated image data. This correlation aspect can be applied to searchable image data even when there is no a priori knowledge of the similarity of images within it: some or all of the data is selected as query images, the resulting classification data is used to determine the similarity of images within the data, and the correlation approach is then applied. Correlation of image data within one or more target and/or query images can also be used together with the classification data to increase accuracy and to reduce the overhead and cost of using the invention.
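As a hypothetical illustration of exploiting correlation across successive video frames, the sketch below computes the Pearson correlation of per-domain-block scores between neighbouring frames and blends each frame's mean score with its neighbours' in proportion to that correlation (negative correlations are ignored). The blending rule and the function names are assumptions made for this example, not a method stated in the summary.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length score vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def sequence_scores(frame_block_scores):
    """frame_block_scores[k][i] is the score of domain block i in frame k.
    Returns one score per frame: the frame's mean block score blended with its
    immediate neighbours' means, each weighted by max(0, correlation)."""
    means = [sum(f) / len(f) for f in frame_block_scores]
    out = []
    for k, f in enumerate(frame_block_scores):
        total, weight = means[k], 1.0
        for j in (k - 1, k + 1):
            if 0 <= j < len(frame_block_scores):
                w = max(0.0, pearson(f, frame_block_scores[j]))
                total += w * means[j]
                weight += w
        out.append(total / weight)
    return out


frames = [[0, 1], [0, 1], [1, 1]]
print(sequence_scores(frames))
```

Frames whose block-level evidence agrees reinforce each other, while a frame whose pattern is uncorrelated with its neighbours keeps its own score.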
The present invention therefore provides an improved unconstrained image recognition system that searches realistic, imperfect imagery data for the presence of target images and successfully indicates at least the likelihood that one or more similar target images are present. Through substantially affine transformations, segmentation, and other data manipulation, variations of target images can be located within the image data even when a target image appears there with a significantly different visual relationship or appearance.
Other objects, features, and advantages of the present invention will become apparent after review of the hereinafter set forth Brief Description of the Drawings, Detailed Description of the Invention, and the Claims.