The present invention relates to a method and apparatus for computing the similarity between two images. The invention also relates to a computer readable medium comprising a computer program for computing the similarity between two images.
As network connectivity has continued its explosive growth and digital storage and capture devices have become smaller, faster, and less expensive, the amount of on-line digital content has increased rapidly. It is now very difficult to provide access to all this digital information using traditional database retrieval technology, which relies on manually associating textual descriptions with digital image or video content. For this reason, there has been considerable motivation to extract information about digital content and make it available to people who want to retrieve particular images or video according to their content. This field of interest is generally referred to as content-based retrieval.
Automated content-based retrieval can be achieved either by automatically analyzing the digital content to produce keywords or textual annotations, which may then be used as the basis for search and retrieval, or by providing some level of search and retrieval based on similarity to a particular example of the digital content. This invention relates to the latter method of automated content-based retrieval of digital images and video.
Content-based retrieval of images (and hence of video frames) by similarity is typically based on automatically extracting low-level color, texture and shape features from the digital signal, and then computing the similarity between the two resulting sets of features using a predetermined similarity metric.
For example, the color of a digital image can be represented by a low-level feature consisting of a color histogram. This histogram can be based on the entire image or based on selected areas of the image in either the spatial or frequency domains. The similarity between the color histograms extracted from two images can be computed using an Lp distance metric, such as the Euclidean distance. This similarity metric can be based on the individual components of the histograms or on other measures derived from the histogram (e.g., the mean, variance and skewness of the histograms).
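A minimal sketch of the histogram approach described above, assuming 8-bit RGB pixels, a joint histogram over the entire image, and a uniform quantization of each channel into `bins_per_channel` bins (the bin count and the function names `color_histogram` and `lp_distance` are hypothetical choices, not taken from the text). The L_p distance with p = 2 gives the Euclidean distance.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # assumed: 8-bit (R, G, B) values in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Joint RGB histogram over the whole image, normalized so the bins sum to 1."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Quantize each channel; clamp so 255 stays in the last bin when 256 is not divisible.
        qr, qg, qb = (min(c // step, bins_per_channel - 1) for c in (r, g, b))
        hist[(qr * bins_per_channel + qg) * bins_per_channel + qb] += 1.0
    return [count / len(pixels) for count in hist]


def lp_distance(h1: Sequence[float], h2: Sequence[float], p: float = 2.0) -> float:
    """L_p distance between two histograms; p = 2 is the Euclidean distance."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) ** p for a, b in zip(h1, h2)) ** (1.0 / p)


red_image = [(255, 0, 0)] * 100
blue_image = [(0, 0, 255)] * 100
print(lp_distance(color_histogram(red_image), color_histogram(blue_image)))  # approx. 1.41421 (sqrt 2)
```

Two single-color images share no occupied bins, so each contributes one full bin of difference and the Euclidean distance is the square root of 2; identical images give a distance of 0.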
Texture and shape features can similarly be extracted automatically from digital images. Typical texture features include second-order statistics (e.g., variances), wavelet-based methods and the Tamura algorithm. Methods for representing the shape of objects in an image include moment-based features, spline functions that represent boundaries, and Fourier descriptor methods. For each low-level feature there are generally many ways of computing the similarity between the features automatically extracted from two images.
Generally, in an application providing content-based retrieval of images by similarity, the user is presented with several example images and selects the example that is most like the required image. The application then searches a database of images, which typically also stores a set of automatically calculated features for each image, and retrieves a set of images that are most similar to the image selected by the user. The user might also select the similarity measure (e.g., similarity based on color) that the application should use when selecting similar images. This method can also be extended to retrieve frames from digital video content.
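The query-by-example step above can be sketched as a ranking of precomputed feature vectors by their distance to the selected example. This is an illustration under stated assumptions: the database is a hypothetical in-memory dictionary from image identifiers to feature vectors, the default measure is Euclidean distance, and the `distance` parameter stands in for the user's choice of similarity measure.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Distance = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_similar(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    k: int = 3,
    distance: Distance = euclidean,
) -> List[Tuple[str, float]]:
    """Return the k database images whose stored features are closest to the query."""
    scored = [(image_id, distance(query, feats)) for image_id, feats in database.items()]
    scored.sort(key=lambda item: item[1])  # smallest distance = most similar
    return scored[:k]


# Hypothetical database of precomputed (e.g. color) feature vectors.
database = {
    "beach": [0.7, 0.2, 0.1],
    "forest": [0.1, 0.8, 0.1],
    "desert": [0.5, 0.4, 0.1],
}
example = [0.65, 0.25, 0.1]  # features of the example image the user selected
print([name for name, _ in retrieve_similar(example, database, k=2)])  # ['beach', 'desert']
```

Storing the features alongside each image means that only the selected example's features, not the whole database, need to be computed at query time.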
The success of existing content-based retrieval systems has been limited by the fact that many users wish to retrieve digital imagery based on higher-level knowledge of the content. For example, rather than retrieving a set of images that have a similar overall color, a user might wish to retrieve images that show mountainous terrain with a body of water and were taken on a sunny day. Another problem with retrieval based on similarity between low-level features is that users may find it difficult to understand what the automatically extracted features represent in terms of image appearance. Many existing texture features depend highly on the resolution of the image, whereas humans tend to be able to abstract away the resolution dependence of texture. For example, regions of grass captured at different resolutions may look very similar to a human user; however, the different resolutions result in very different texture features for those regions. Retrieval by shape can also be problematic, since low-level shape analysis generally has no knowledge of which objects are of particular interest to the user.
In summary, although existing applications for content-based retrieval by similarity have provided access to otherwise inaccessible libraries of digital imagery, there is a need for retrieval-by-similarity methods that take into account the higher-level semantic information present in the digital content.
It is an object of the present invention to ameliorate one or more disadvantages of the prior art.
According to a first aspect of the invention, there is provided a method of computing the similarity between two images, wherein said images each comprise a plurality of pixels and said method comprises the steps of: segmenting each of the images into homogeneous regions; assigning to at least one of the generated regions a semantic label which describes the content of the region; and computing a distance metric which averages over all corresponding pixels in the two images a value which is the product of a predetermined semantic difference between the assigned labels at the corresponding pixels and a weighting function which depends on the probability of the labels being correctly assigned for each of the corresponding pixels, wherein said distance metric is representative of the similarity of the two images.
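Written out, the distance metric of the first aspect can be expressed as follows. The notation is introduced here for illustration only: N is the number of corresponding pixel positions in images A and B, l_A(p) and l_B(p) are the labels assigned at pixel p, P_A(p) and P_B(p) are the probabilities that those labels were correctly assigned, D is the predetermined semantic difference between two labels, and w is the weighting function. The product form of w given on the right is one plausible assumption, not a choice specified by the text.

```latex
d(A, B) = \frac{1}{N} \sum_{p=1}^{N} w\big(P_A(p), P_B(p)\big)\, D\big(l_A(p), l_B(p)\big),
\qquad \text{e.g.}\quad w\big(P_A(p), P_B(p)\big) = P_A(p)\, P_B(p)
```

With this choice, a pixel whose labels are uncertain in either image contributes little to d(A, B) even when the labels differ. Because the first aspect only requires at least one region to carry a label, some convention for unlabeled pixels (for example, treating their probability as zero) would also be needed; that convention is likewise an assumption.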
According to a second aspect of the invention, there is provided a method of computing the similarity between two images, wherein said images each comprise a plurality of pixels and said method comprises the steps of: segmenting each of the images into homogeneous regions; assigning semantic labels to the homogeneous regions to describe the content of the regions using a probabilistic method which results in each assigned label for a region having an associated probability or likelihood of the label being correctly assigned; computing a distance metric which averages over all corresponding pixels in the two images a value which is the product of a predetermined semantic difference between the assigned labels at the corresponding pixels and a weighting function which is derived from the associated probability of the labels for each of the corresponding pixels; and comparing the distance metric with a predetermined threshold in order to determine the similarity of the images.
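A minimal sketch of the distance computation and threshold test of the second aspect, assuming that segmentation and probabilistic labeling have already produced, for each image, a region-id map and a table giving each region's label and probability (those upstream steps are not shown). The label names, the semantic-difference table, the product weighting `pa * pb` and the threshold value are all hypothetical; the two images are assumed to have the same dimensions, so corresponding pixels are matched by position.

```python
from typing import Callable, Dict, List, Tuple

RegionMap = List[List[int]]                  # region id for every pixel
RegionLabels = Dict[int, Tuple[str, float]]  # region id -> (semantic label, probability)


def pixel_maps(region_map: RegionMap, region_labels: RegionLabels):
    """Spread each region's label and probability to all of its pixels."""
    labels = [[region_labels[r][0] for r in row] for row in region_map]
    probs = [[region_labels[r][1] for r in row] for row in region_map]
    return labels, probs


def lookup_difference(diff: Dict[Tuple[str, str], float], a: str, b: str) -> float:
    """Predetermined semantic difference; identical labels differ by 0, table is symmetric."""
    if a == b:
        return 0.0
    if (a, b) in diff:
        return diff[(a, b)]
    return diff[(b, a)]  # raises KeyError if the pair is missing from the table


def semantic_distance(
    regions_a: RegionMap, labels_a: RegionLabels,
    regions_b: RegionMap, labels_b: RegionLabels,
    diff: Dict[Tuple[str, str], float],
    weight: Callable[[float, float], float] = lambda pa, pb: pa * pb,  # assumed weighting
) -> float:
    """Average over corresponding pixels of weight(P_A, P_B) * D(l_A, l_B)."""
    lab_a, prob_a = pixel_maps(regions_a, labels_a)
    lab_b, prob_b = pixel_maps(regions_b, labels_b)
    if len(lab_a) != len(lab_b) or any(len(x) != len(y) for x, y in zip(lab_a, lab_b)):
        raise ValueError("images must have the same dimensions")
    total, count = 0.0, 0
    for row_la, row_pa, row_lb, row_pb in zip(lab_a, prob_a, lab_b, prob_b):
        for la, pa, lb, pb in zip(row_la, row_pa, row_lb, row_pb):
            total += weight(pa, pb) * lookup_difference(diff, la, lb)
            count += 1
    if count == 0:
        raise ValueError("images contain no pixels")
    return total / count


def is_similar(distance: float, threshold: float) -> bool:
    """Compare the distance metric with a predetermined threshold."""
    return distance < threshold


semantic_diff = {("sky", "water"): 0.4, ("water", "grass"): 0.6, ("sky", "grass"): 1.0}
regions = [[0, 0], [1, 1]]  # top half is region 0, bottom half is region 1, in both images
d = semantic_distance(regions, {0: ("sky", 0.9), 1: ("water", 0.8)},
                      regions, {0: ("sky", 0.8), 1: ("grass", 0.5)}, semantic_diff)
print(round(d, 4), is_similar(d, threshold=0.2))  # 0.12 True
```

In this example the sky pixels agree and contribute nothing, while each of the two water/grass pixels contributes 0.8 × 0.5 × 0.6 = 0.24, giving an average of 0.48 / 4 = 0.12 over the four pixels.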
According to a third aspect of the invention, there is provided apparatus for computing the similarity between two images, wherein said images each comprise a plurality of pixels and said apparatus comprises: means for segmenting each of the images into homogeneous regions; means for assigning to at least one of the generated regions a semantic label which describes the content of the region; and means for computing a distance metric which averages over all corresponding pixels in the two images a value which is the product of a predetermined semantic difference between the assigned labels at the corresponding pixels and a weighting function which depends on the probability of the labels being correctly assigned for each of the corresponding pixels, wherein said distance metric is representative of the similarity of the two images.
According to a fourth aspect of the invention, there is provided apparatus for computing the similarity between two images, wherein said images each comprise a plurality of pixels and said apparatus comprises: means for segmenting each of the images into homogeneous regions; means for assigning semantic labels to the homogeneous regions to describe the content of the regions using a probabilistic method which results in each assigned label for a region having an associated probability or likelihood of the label being correctly assigned; means for computing a distance metric which averages over all corresponding pixels in the two images a value which is the product of a predetermined semantic difference between the assigned labels at the corresponding pixels and a weighting function which is derived from the associated probability of the labels for each of the corresponding pixels; and means for comparing the distance metric with a predetermined threshold in order to determine the similarity of the images.
According to a fifth aspect of the invention, there is provided a computer readable medium comprising a computer program for computing the similarity between two images, wherein said images each comprise a plurality of pixels and said computer program comprises: code for segmenting each of the images into homogeneous regions; code for assigning to at least one of the generated regions a semantic label which describes the content of the region; and code for computing a distance metric which averages over all corresponding pixels in the two images a value which is the product of a predetermined semantic difference between the assigned labels at the corresponding pixels and a weighting function which depends on the probability of the labels being correctly assigned for each of the corresponding pixels, wherein said distance metric is representative of the similarity of the two images.
According to a sixth aspect of the invention, there is provided a computer readable medium comprising a computer program for computing the similarity between two images, wherein said images each comprise a plurality of pixels and said computer program comprises: code for segmenting each of the images into homogeneous regions; code for assigning semantic labels to the homogeneous regions to describe the content of the regions using a probabilistic method which results in each assigned label for a region having an associated probability or likelihood of the label being correctly assigned; code for computing a distance metric which averages over all corresponding pixels in the two images a value which is the product of a predetermined semantic difference between the assigned labels at the corresponding pixels and a weighting function which is derived from the associated probability of the labels for each of the corresponding pixels; and code for comparing the distance metric with a predetermined threshold in order to determine the similarity of the images.