1. Field of Art
The present invention generally relates to the field of digital image analysis, and more specifically, to methods of learning image transformations and similarity metrics used to perform identification of objects in digital images.
2. Background of the Invention
Object recognition systems determine whether a given image contains an object of a particular category, such as a human face, a license plate, or other known object. Object identification systems determine whether an object that has been recognized matches a previously recognized object—such as a particular human face, or a particular type of cancerous tumor—based on some measure of similarity between the images.
One type of object identification system determines the degree of similarity between objects by directly comparing their respective pixels. However, such pixel comparison is time-consuming, and thus is of limited utility when performing operations such as determining which of a large set of objects is most like a given object, where the number of comparisons grows with the number of objects in the set and many comparisons may therefore need to be performed. Accordingly, a second type of object identification system (hereinafter, a “transformation-based object identification system”) instead identifies an object within a digital image using an image transformation and a similarity function. The image transformation takes as input the raw image pixels of the object and produces as output a standardized representation of the object (e.g., a vector of real numbers). The similarity function takes as input the standardized representations of a pair of objects and produces as output an indicator, e.g., a floating point number, quantifying the degree of similarity of the objects. The representations produced by the image transformation can be stored considerably more compactly than the original pixel representation of the object, and the similarity metric operating on these compact representations can in turn be computed more quickly than can direct pixel comparisons.
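By way of illustration only, the two components of a transformation-based object identification system can be sketched as follows. The block-averaging transformation, the dot-product similarity function, and the names `transform`, `similarity`, and `most_similar` are assumptions chosen for brevity; they are not taken from any particular existing system.

```python
import math

def transform(pixels, block=2):
    """Hypothetical image transformation: average each block x block tile of a
    2-D grayscale image and L2-normalize the result into a short vector."""
    h, w = len(pixels), len(pixels[0])
    features = []
    for top in range(0, h - block + 1, block):
        for left in range(0, w - block + 1, block):
            tile = [pixels[top + dy][left + dx]
                    for dy in range(block) for dx in range(block)]
            features.append(sum(tile) / len(tile))
    # Guard against an all-zero image, whose norm would be 0.
    norm = math.sqrt(sum(f * f for f in features)) or 1.0
    return [f / norm for f in features]

def similarity(a, b):
    """Hypothetical similarity function: dot product of two representations."""
    return sum(x * y for x, y in zip(a, b))

def most_similar(query_pixels, gallery):
    """Return the name of the stored representation most similar to the query.
    `gallery` maps a name to an already-transformed representation, so the
    query is transformed once and compared once per stored object."""
    q = transform(query_pixels)
    return max(gallery, key=lambda name: similarity(q, gallery[name]))

# Two stored 4x4 objects and a noisy query resembling the first one.
bright_left = [[9, 9, 0, 0]] * 4
bright_right = [[0, 0, 9, 9]] * 4
gallery = {"left": transform(bright_left), "right": transform(bright_right)}
query = [[8, 9, 1, 0]] * 4
print(most_similar(query, gallery))  # prints "left"
```

In this sketch each 16-pixel object is stored as a 4-element vector, and each comparison is a 4-term dot product rather than a pixel-by-pixel comparison.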
However, even with a transformation-based object identification system, the problem remains of formulating the image transformation and the similarity metric. Existing object identification systems require the image transformation, or the similarity metric, or both, to be manually specified by a system designer prior to image analysis. For example, some existing systems employ fixed filters, such as Gabor filters, as the image transformation, and use a dot product of the resulting vector representations as the similarity metric. Such an approach requires a system designer to devise image transformations and similarity metrics that produce accurate identification results for a given image corpus. Given the large potential variations between different types of image corpuses containing objects to be identified, a particular image transformation and similarity metric that are effective for one corpus may not be effective for a different corpus. As a result, the system designer must make repeated attempts to determine the appropriate transformation and metric for a given corpus of images.
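A manually specified configuration of the kind described above might resemble the following minimal sketch, in which a small bank of Gabor filters serves as the image transformation and a dot product serves as the similarity metric. The filter size, wavelength, sigma, and the four orientations are hypothetical values of the sort a system designer would have to choose by hand, and pooling each filter's output by its mean absolute response is likewise an assumption made for illustration.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma):
    """Real part of a Gabor filter: a Gaussian envelope multiplied by a cosine
    wave oriented at angle theta. `size` is assumed to be odd."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

def filter_response(pixels, kernel):
    """Mean absolute filter output over every position where the kernel fits
    entirely inside the image (the image must be at least kernel-sized)."""
    k = len(kernel)
    h, w = len(pixels), len(pixels[0])
    total, count = 0.0, 0
    for top in range(h - k + 1):
        for left in range(w - k + 1):
            acc = sum(kernel[dy][dx] * pixels[top + dy][left + dx]
                      for dy in range(k) for dx in range(k))
            total += abs(acc)
            count += 1
    return total / count

# Hand-picked, fixed parameters (hypothetical values).
ORIENTATIONS = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]
BANK = [gabor_kernel(size=5, wavelength=4.0, theta=t, sigma=2.0)
        for t in ORIENTATIONS]

def transform(pixels):
    """Image transformation: one pooled response per filter in the bank."""
    return [filter_response(pixels, kernel) for kernel in BANK]

def similarity(a, b):
    """Similarity metric: dot product of two vector representations."""
    return sum(x * y for x, y in zip(a, b))
```

Every constant in this sketch is fixed before any image is analyzed, so adapting it to a different corpus would mean revisiting each of those choices by hand.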
For example, one corpus may contain very standardized pictures of faces in a given pose and lighting condition (e.g., standard photos of employees, such as for identification badges), a second corpus may contain informal pictures of people in very different poses and lighting conditions (e.g., pictures from a family photo album), and a third may contain images of patients' bodies produced by medical devices. The first and second corpuses might both be used for identification of faces, but given the different conditions (such as lighting, pose, and distance) a single image transformation and similarity metric would be unlikely to be equally effective for both. The third corpus might be used by a hospital radiology department for identification of cancerous tumors, and the image transformation and similarity metric used in the facial identification systems would likely be ineffective for such radiology images. However, manual experimentation to determine an image transformation and similarity metric effective for every existing corpus would be an unduly burdensome task.