The invention relates to an image retrieval system comprising:
a database with candidate images,
entry means for entering a query image,
comparison means for comparing the query image with one of the candidate images, and
presentation means for presenting at least the candidate image with the largest similarity with the query image.
The invention further relates to a method for retrieving images from a database with candidate images, the method comprising the steps of:
inputting a query image;
comparing the query image with candidate images to establish respective similarities between these candidate images and the query image; and
presenting at least the candidate image with the largest image similarity with the query image.
The invention further relates to a method for organizing images in a database.
The invention further relates to a system for organizing images in a database.
The invention further relates to a database with a plurality of images.
Image retrieval systems are important for applications that involve large collections of images. Professional applications include broadcast stations, where a piece of video may be identified through a set of shots and where a shot of video is to be retrieved according to a given image. Movie producers must likewise be able to retrieve particular scenes from among a large number of scenes. Furthermore, art museums have large collections of images of their paintings, photos and drawings, and must be able to retrieve images on the basis of some criterion with respect to their contents. Consumer applications include maintaining collections of slides, photos and videos, from which the user must be able to retrieve items, e.g. on the basis of similarity with a specified query image.
An image retrieval system and a method as described above are known from the article "Tools and Techniques for Color Image Retrieval", John R. Smith and Shih-Fu Chang, Proc. SPIE-Int. Soc. Opt. Eng (USA), Vol. 2670, pp. 426-437. The image retrieval system comprises a database with a large number of images. A user searching for a particular image specifies a query image indicating what the retrieved image or images should look like. The system then compares the stored images with the query image and ranks the stored images according to their similarity with the query image. The ranking results are presented to the user, who may retrieve one or more of the images. The comparison of the query image with a stored image to determine the similarity may be based on a number of features derived from the respective images. The image feature or features used for comparison are called a feature vector. The article describes the usage of a color histogram as such a feature vector. When using the RGB (Red, Green and Blue) representation of an image, a color histogram is computed by quantizing the colors within the image and counting the number of pixels of each color. To determine the similarity, a number of techniques are described to compare the two color histograms of the respective images. An example of such a technique is the histogram intersection, where the similarity is the sum over all histogram bins of the minimal value of the pair of corresponding bins of the two histograms.
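The color histogram and the histogram intersection described above can be illustrated with a minimal sketch. The function names, the number of quantization levels per channel and the normalization of the histogram to a sum of 1 are assumptions made for this illustration and are not taken from the cited article.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in the range 0..255


def color_histogram(pixels: Sequence[Pixel], levels: int = 4) -> List[float]:
    """Quantize each RGB channel to `levels` values and count pixels per color.

    The counts are divided by the number of pixels (an assumption made here)
    so that images of different sizes can be compared.
    """
    hist = [0.0] * (levels ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a quantized level 0..levels-1.
        qr, qg, qb = r * levels // 256, g * levels // 256, b * levels // 256
        hist[(qr * levels + qg) * levels + qb] += 1.0
    total = len(pixels)
    return [count / total for count in hist] if total else hist


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum over all bins of the smaller value of each pair of corresponding bins."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Tiny synthetic "images": lists of pixels rather than real image files.
red_image = [(255, 0, 0)] * 4
blue_image = [(0, 0, 255)] * 4
mixed_image = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2

red_vs_mixed = histogram_intersection(
    color_histogram(red_image), color_histogram(mixed_image)
)
```

With normalized histograms the intersection lies between 0 (no color in common) and 1 (identical color distributions), which makes it convenient as a similarity score.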
In a practical setup, the number of images can be very large. On the Internet, for example, the number of images is of the order of millions and is ever growing. Even if the time to compare the query image with a single candidate image is very short, the cumulative time needed to compare the query image with all images in the database will be long. It is a drawback of the known system that a user searching for an image in such a large database must wait a long time after having submitted the query image to the system.
It is an object of the invention to provide an image retrieval system of the kind set forth in which the time for finding candidate images similar with the query image is reduced. This object is achieved according to the invention in an image retrieval system comprising:
a database with clusters, each cluster comprising a respective set of candidate images and a cluster center which is representative for that set;
entry means for entering a query image;
cluster comparison means for comparing the query image with respective cluster centers to establish respective cluster similarities between the query image and the respective clusters;
selection means for selecting at least the cluster with the largest cluster similarity with the query image;
image comparison means for comparing the query image with the candidate images in the selected clusters to establish respective image similarities between the query image and the respective candidate images; and
presentation means for presenting at least the candidate image with the largest image similarity.
By selecting one or more clusters that are most similar with the query image and subsequently comparing the query image with only the candidate images in the selected clusters, fewer comparisons are needed. This reduces the time needed to find the candidate images that are similar with the query image. Since the number of clusters is much smaller than the number of images, the additional comparisons between the query image and the cluster centers are far fewer than the comparisons saved by not comparing the query image with the images in the clusters that are not selected. Clustering of the candidate images according to their similarity does not require the presence of any query image. Therefore, the clustering is done in advance and not at the time the user is actually searching for images on the basis of the query image. The time needed to cluster the images thus does not add to the waiting time the user of the system experiences when searching.
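The two-stage search described above can be sketched as follows. This is only an illustration under stated assumptions: the `Cluster` class, the `retrieve` function, its parameters and the use of histogram intersection as the similarity measure are hypothetical choices, and the feature vectors are toy two-bin histograms.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Histogram = List[float]


def similarity(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection, used here as the (assumed) similarity measure."""
    return sum(min(a, b) for a, b in zip(h1, h2))


@dataclass
class Cluster:
    center: Histogram              # representative feature vector of the cluster
    images: Dict[str, Histogram]   # hypothetical image id -> feature vector


def retrieve(query: Histogram, clusters: List[Cluster],
             n_clusters: int = 1, n_results: int = 1) -> List[Tuple[str, float]]:
    """Compare the query with cluster centers first, then only with the
    images of the best-matching clusters."""
    # Stage 1: rank clusters by similarity between the query and each center.
    ranked = sorted(clusters, key=lambda c: similarity(query, c.center), reverse=True)
    selected = ranked[:n_clusters]
    # Stage 2: compare the query only with images inside the selected clusters.
    scored = [(similarity(query, hist), image_id)
              for cluster in selected
              for image_id, hist in cluster.images.items()]
    scored.sort(reverse=True)
    return [(image_id, score) for score, image_id in scored[:n_results]]


warm = Cluster(center=[0.9, 0.1], images={"sunset": [1.0, 0.0], "fire": [0.8, 0.2]})
cool = Cluster(center=[0.1, 0.9], images={"sea": [0.0, 1.0], "sky": [0.2, 0.8]})
best = retrieve([0.85, 0.15], [warm, cool])
```

As a rough illustration of the saving, and assuming for simplicity that N images are divided over K clusters of equal size of which m are selected, the search needs about K + m·N/K comparisons instead of N. For N = 1,000,000, K = 1,000 and m = 1 this amounts to about 2,000 comparisons. These figures are illustrative only and do not come from measurements.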
An embodiment of the image retrieval system according to the invention is defined in claim 2. The similarity between images may be determined on the basis of their color histograms. The average of the respective histograms of a number of representative images of a cluster can advantageously be used as a representation for the whole cluster.
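Using the average of several histograms as a cluster center can be illustrated with a short sketch. The function name is hypothetical, and it is assumed that all histograms have the same number of bins.

```python
from typing import List, Sequence

Histogram = List[float]


def average_histogram(histograms: Sequence[Sequence[float]]) -> Histogram:
    """Bin-wise average of the histograms of the representative images."""
    if not histograms:
        raise ValueError("at least one histogram is required")
    count = len(histograms)
    # zip(*histograms) groups the values of corresponding bins together.
    return [sum(bins) / count for bins in zip(*histograms)]


center = average_histogram([[1.0, 0.0], [0.0, 1.0]])
```

Because the result has the same form as the histogram of a single image, a query image can be compared with a cluster center using the same similarity measure that is used between two images.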
It is a further object of the invention to provide a method for retrieving images of the kind set forth with a reduced time for finding candidate images similar with the query image. This object is achieved according to the invention in a method for retrieving images from a database comprising clusters, each cluster comprising a respective set of candidate images and a cluster center which is representative for that set, the method comprising the steps of:
inputting a query image;
comparing the query image with respective cluster centers to establish respective cluster similarities between the clusters and the query image;
selecting at least the cluster with the largest cluster similarity with the query image;
comparing the query image with respective candidate images of the selected clusters to establish respective image similarities between these candidate images and the query image; and
presenting at least the candidate image with the largest image similarity.
By first determining which of the clusters are similar with the query image and by subsequently comparing the query image with only the images in those clusters, far fewer comparisons are needed. This greatly reduces the time needed to find the candidate images that are similar with the query image.
It is a further object of the invention to provide a method for organizing images in a database, such that the resulting database allows images that are similar with a given query image to be found in a reduced time. This object is achieved according to the invention in a method for organizing images in a database, the method comprising the steps of:
defining clusters each comprising a subset of the images, whereby the images in a cluster are similar with each other and whereby at least one of the clusters comprises more than one image, and
determining a cluster center for each of the clusters.
By grouping mutually similar images in respective clusters and by defining respective cluster centers for these clusters, a subsequent search for images that are similar with a given query image can be performed more quickly. The search can first determine, on the basis of the cluster centers, which of the clusters might contain images that are similar with the query image. Subsequently the search may limit the further comparisons between the query image and the images in the database to these clusters. Consequently fewer comparisons are needed, resulting in a shorter time for finding the similar images.
An embodiment of the method for organizing images in a database according to the invention is defined in claim 5. Determining which two of all the clusters are most similar with each other and merging these two clusters into a new cluster is a good procedure for creating a database with clusters whereby a cluster comprises mutually similar images. This procedure may be executed repeatedly, each time merging the two most similar clusters into a new cluster and thereby reducing the number of clusters by one, until a required number of clusters has been reached or until the similarity between the two most similar clusters has dropped below a given threshold.
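The repeated merging with its two stopping criteria can be sketched as follows. The sketch assumes that every image starts in a cluster of its own, which the text does not state explicitly. The function and parameter names are hypothetical, the cluster similarity measure is passed in as a parameter, and the usage example works on plain numbers with a toy measure instead of real image features.

```python
from itertools import combinations
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def merge_clusters(items: List[T],
                   cluster_similarity: Callable[[List[T], List[T]], float],
                   target_count: int = 1,
                   min_similarity: Optional[float] = None) -> List[List[T]]:
    """Repeatedly merge the two most similar clusters.

    Stops when `target_count` clusters remain, or when the most similar pair
    is less similar than `min_similarity` (if a threshold is given).
    """
    clusters = [[item] for item in items]  # assumption: start with singletons
    while len(clusters) > max(target_count, 1):
        # Find the pair of clusters with the largest mutual similarity.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: cluster_similarity(clusters[p[0]], clusters[p[1]]))
        if (min_similarity is not None
                and cluster_similarity(clusters[i], clusters[j]) < min_similarity):
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)  # each merge reduces the cluster count by one
    return clusters


def toy_similarity(a: List[float], b: List[float]) -> float:
    """Toy stand-in measure: clusters with closer mean values are more similar."""
    return -abs(sum(a) / len(a) - sum(b) / len(b))


two_clusters = merge_clusters([1.0, 1.1, 5.0, 5.2], toy_similarity, target_count=2)
```

Searching all pairs in every round is simple but costly for large collections. Since the clustering is done in advance, this cost does not affect the waiting time of a user who submits a query.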
An embodiment of the method for organizing images in a database according to the invention is defined in claim 6. The average of the similarities between all pairs of images in two clusters is a good measure for the similarity between those two clusters, since every image in both clusters contributes to this measure.
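The average pairwise measure can be illustrated with a minimal sketch. It assumes that "all pairs" means every pair formed by one image from the first cluster and one image from the second cluster, and that histogram intersection is the similarity between two images. The function names are hypothetical.

```python
from typing import List, Sequence

Histogram = List[float]


def image_similarity(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection between two image histograms (assumed measure)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def average_pairwise_similarity(cluster_a: Sequence[Histogram],
                                cluster_b: Sequence[Histogram]) -> float:
    """Average similarity over all pairs with one image from each cluster."""
    if not cluster_a or not cluster_b:
        raise ValueError("both clusters must contain at least one image")
    total = sum(image_similarity(a, b) for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


score = average_pairwise_similarity([[1.0, 0.0], [0.5, 0.5]], [[1.0, 0.0]])
```

In the example the two pairs have similarities 1.0 and 0.5, so the cluster similarity is 0.75.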
An embodiment of the method for organizing images in a database according to the invention is defined in claim 7. When the cluster center of a particular cluster is determined on the basis of a few images of that particular cluster, it is advantageous to select for this purpose respective images from the clusters that were merged into this particular cluster. The fact that these clusters were separate at some earlier stage indicates that an image of one of the clusters is less similar with an image of the other cluster than with an image from its own cluster. So selecting an image from each of the clusters gives a better representation of the diversity of the images in the particular cluster that resulted from merging the clusters.
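One possible way to keep representatives from both merged clusters is sketched below. The `Cluster` class, the `per_side` parameter and the rule of simply taking the first representatives of each merged cluster are assumptions made for this illustration. The text does not specify how the images are picked within each cluster.

```python
from dataclasses import dataclass
from typing import List

Histogram = List[float]


@dataclass
class Cluster:
    images: List[Histogram]           # all images of the cluster
    representatives: List[Histogram]  # the few images used for the center


def singleton(histogram: Histogram) -> Cluster:
    """A cluster holding one image, which is also its only representative."""
    return Cluster(images=[histogram], representatives=[histogram])


def merge(a: Cluster, b: Cluster, per_side: int = 1) -> Cluster:
    """Merge two clusters, taking representatives from each of them."""
    # Assumed selection rule: the first `per_side` representatives of each side.
    representatives = a.representatives[:per_side] + b.representatives[:per_side]
    return Cluster(images=a.images + b.images, representatives=representatives)


def cluster_center(cluster: Cluster) -> Histogram:
    """Bin-wise average of the histograms of the representative images."""
    count = len(cluster.representatives)
    return [sum(bins) / count for bins in zip(*cluster.representatives)]


merged = merge(singleton([1.0, 0.0]),
               merge(singleton([0.0, 1.0]), singleton([0.2, 0.8])))
center = cluster_center(merged)
```

In the example the merged cluster holds three images, but its center is computed from one representative of each of the two clusters that were merged last.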
An embodiment of the method for organizing images in a database according to the invention is defined in claim 8. Since the cluster center may be based on only a number of representative images, an image may exist that is more similar with the cluster center of another cluster than with the cluster center of its own cluster. If it is determined that one or more such images exist, these images are moved to the respective other clusters, thus creating an optimized organization of images into clusters with cluster centers. This step of moving the images may be followed by a recomputation of the cluster centers of the clusters involved, i.e. the clusters from which an image is moved and the clusters to which an image is moved, and by again checking whether one or more images exist that are more similar with another cluster center than with their own. These steps may be executed repeatedly until the number of images to be moved is below a given threshold.
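The reassignment steps above can be sketched as follows. For brevity the sketch computes each cluster center as the average over all images of the cluster and recomputes all centers in every round, which are simplifying assumptions. The function names, the `move_threshold` and `max_rounds` parameters and the use of histogram intersection are hypothetical choices.

```python
from typing import List, Sequence

Histogram = List[float]


def similarity(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection (assumed similarity measure)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def center_of(cluster: Sequence[Histogram]) -> Histogram:
    """Assumed cluster center: bin-wise average over all images of the cluster."""
    return [sum(bins) / len(cluster) for bins in zip(*cluster)]


def refine_clusters(clusters: List[List[Histogram]],
                    move_threshold: int = 1,
                    max_rounds: int = 50) -> List[List[Histogram]]:
    """Move images to the cluster whose center they resemble most, then repeat."""
    clusters = [list(c) for c in clusters if c]
    for _ in range(max_rounds):
        centers = [center_of(c) for c in clusters]
        regrouped: List[List[Histogram]] = [[] for _ in clusters]
        moves = 0
        for own, members in enumerate(clusters):
            for hist in members:
                best = max(range(len(centers)),
                           key=lambda k: similarity(hist, centers[k]))
                # Move only if another center is strictly more similar.
                if similarity(hist, centers[best]) <= similarity(hist, centers[own]):
                    best = own
                moves += best != own
                regrouped[best].append(hist)
        if moves < move_threshold:
            break  # too few images to move; keep the current organization
        clusters = [c for c in regrouped if c]  # centers are recomputed next round
    return clusters


refined = refine_clusters([[[1.0, 0.0], [0.9, 0.1], [0.1, 0.9]], [[0.0, 1.0]]])
```

In the example the image with histogram [0.1, 0.9] starts in the first cluster, is found to be more similar with the center of the second cluster, and is moved there in the first round. The second round finds nothing to move and the procedure stops.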
It is a further object of the invention to provide a system for organizing images in a database, such that the resulting database allows images that are similar with a given query image to be found in a reduced time. This object is achieved according to the invention in a system for organizing images in a database, the system comprising:
clustering means for defining clusters each comprising a subset of the images, whereby the images in a cluster are similar with each other and whereby at least one of the clusters comprises more than one image, and
center determining means for determining a cluster center for each of the clusters.
By grouping mutually similar images in respective clusters and by defining respective cluster centers for these clusters, an organization of the images is made through which a subsequent search for images that are similar with a given query image can be performed more quickly. A first step of the search determines which of the clusters may contain images similar with the query image. Then a second step of the search compares the query image with the images in these clusters. This greatly reduces the number of comparisons needed to find images that are similar with the given query image.
It is a further object of the invention to provide a database with a plurality of images in an organization that allows images that are similar with a given query image to be found in a reduced time. This object is achieved according to the invention in a database with a plurality of images, the database comprising:
clusters each comprising a subset of the images, whereby the images in a cluster are similar with each other and whereby at least one of the clusters comprises more than one image, and
a cluster center for each of the clusters.
The grouping of mutually similar images in respective clusters and the respective cluster centers for these clusters make it possible to perform a subsequent search for images that are similar with a given query image more quickly.