The exemplary embodiment relates to image characterization. It finds particular application in connection with an apparatus and method for generation of an image representation as a mixture of a set of reference images. Implementations of the apparatus and method include image retrieval, image categorization, and image clustering applications, but it is to be appreciated that they are not limited to such applications.
Various image processing applications, such as retrieval, categorization, clustering, image enhancement, and the like, are becoming increasingly important given the widespread use of digital images. For example, for some applications, it would be helpful to retrieve images of a particular type of object, such as cars, from a database of images. In another application, given an image, it would be useful to identify and retrieve similar images from an image database. In other applications, given a large group of images, it would be useful to cluster them into a set of classes, based on content similarity.
To enable such techniques to be performed automatically or semi-automatically, some mechanism for automated image characterization based on the content of the image is desirable. Since a digital image is essentially in the form of pixel values, e.g., colorant values, for each of typically millions of pixels, image characterization techniques typically rely on extracting features from the image based on small segments of the image, referred to as patches. Techniques have been developed for categorizing images which rely on training a classifier, or set of classifiers, with information extracted from a large number of training images. The training images are manually labeled with one or more of a set of predefined object categories, such as person, landscape, animal, building, and the like. The classifier learns how to characterize a new image based on its extracted features and the extracted features of the labeled images. Such techniques, however, are manually intensive in the training phase, often requiring the manual labeling of a large number of images for each class for which the classifier is to be trained. Additionally, adding a new category generally involves considerable retraining of the classifier.
In processes which rely on identifying similar images, images may be characterized using a high-level representation that is generated from the extracted low-level features. It is known to model images using parameterized models. A Gaussian model, for example, characterizes an image using a Gaussian distribution that is representative of the low-level image features and is parameterized by a mean vector and a covariance matrix. Characterizing the image by a single Gaussian component provides for straightforward comparison of different images, for example by comparing the mean vectors and covariance matrices of the two image models. However, a distribution having a single Gaussian component contains limited descriptive content and may be insufficient to adequately describe images. In other approaches, a mixture model is employed to characterize an image. For example, a Gaussian mixture model (GMM) describes the distribution of the low-level features of an image using a weighted combination of Gaussian components, each having its own mean vector and covariance matrix parameters.
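By way of illustration only, the single-Gaussian characterization and comparison described above may be sketched as follows. The function names, the small ridge term added to the covariance, and the use of a symmetrized Kullback-Leibler divergence as the comparison measure are assumptions made for this sketch; the comparison of mean vectors and covariance matrices may be performed in other ways.

```python
import numpy as np

def fit_gaussian(features, ridge=1e-6):
    """Fit one Gaussian to an (n_patches, n_dims) array of low-level features.

    The ridge term is an illustrative assumption that keeps the covariance
    matrix invertible when few patches are available.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + ridge * np.eye(features.shape[1])
    return mean, cov

def kl_gaussian(mean_p, cov_p, mean_q, cov_q):
    """Closed-form KL divergence KL(p || q) between two Gaussians."""
    d = mean_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mean_q - mean_p
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (
        np.trace(cov_q_inv @ cov_p)
        + diff @ cov_q_inv @ diff
        - d
        + logdet_q
        - logdet_p
    )

def symmetric_kl(model_a, model_b):
    """Symmetrized divergence between two (mean, cov) image models."""
    return kl_gaussian(*model_a, *model_b) + kl_gaussian(*model_b, *model_a)

# Hypothetical patch features for two images (synthetic data, not real images).
rng = np.random.default_rng(0)
features_a = rng.normal(loc=0.0, size=(500, 4))
features_b = rng.normal(loc=3.0, size=(500, 4))

model_a = fit_gaussian(features_a)
model_b = fit_gaussian(features_b)
print("distance(a, a):", symmetric_kl(model_a, model_a))
print("distance(a, b):", symmetric_kl(model_a, model_b))
```

In this sketch each image is reduced to one mean vector and one covariance matrix, so comparing two images costs a single closed-form evaluation, which reflects the convenience, and also the limited descriptive content, of the single-component model.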
A GMM or other mixture model has advantages in that it provides a higher number of components by which to characterize the image. On the other hand, it becomes more difficult to assess the similarity between images. For example, two images that are in reality quite similar may be fitted with very different sets of mixture model parameters, due to sparseness of the feature vector sets extracted from the images. In such a case, the computed distance between the mixture models for the two images will be large, and the images will erroneously be deemed to be quite different.
In addition to this robustness problem, the use of mixture models can make image comparison computationally intensive. For example, in some studies it has been estimated that a GMM having about 128 Gaussian components is desirable to characterize an image sufficiently. A comparison of two images would thus entail pairwise comparison of the two sets of 128 Gaussian components, i.e., 128 × 128 = 16,384, or about 16,000, Gaussian comparison operations, making it computationally too expensive for many applications.
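The growth in the number of comparison operations can be seen in the following minimal sketch. The feature dimensionality of 64, the randomly generated component means, and the use of a squared Euclidean distance between component means as a stand-in for a full Gaussian-to-Gaussian comparison are all assumptions chosen for illustration.

```python
import numpy as np

def pairwise_mean_distances(means_a, means_b):
    """Squared Euclidean distance between every pair of component means.

    Returns an array of shape (k_a, k_b): one entry per pairwise
    comparison between a component of model A and a component of model B.
    """
    diff = means_a[:, None, :] - means_b[None, :, :]
    return np.sum(diff ** 2, axis=-1)

# Two hypothetical 128-component mixture models (synthetic means only).
rng = np.random.default_rng(0)
means_a = rng.normal(size=(128, 64))
means_b = rng.normal(size=(128, 64))

distances = pairwise_mean_distances(means_a, means_b)
print(distances.shape)  # one row per component of A, one column per component of B
print(distances.size)   # total number of pairwise comparisons
```

The number of entries in the resulting array grows with the product of the two component counts, so doubling the number of components in each model quadruples the number of comparisons per image pair.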
The exemplary embodiment provides an apparatus and method for generation of a representation of an image which is both robust and easy to use and which can be generated largely automatically.