The proliferation of digital data has led to an increased need for efficient comparison of datasets, such as digital images. An image retrieval system, for example, may include an image query process that compares a source image to images stored in an image database to find matching or similar images. Comparing images on a pixel-by-pixel basis is computationally expensive and impractical for many applications where fast search results are expected.
To increase the speed and accuracy of search results, concept-based and content-based search methods are employed. In concept-based searching, images are stored with corresponding tags, keywords, text-based descriptions, classifications, or other data that identifies the image. The image data is indexed and searched to identify stored images that match certain concept criteria. In content-based searching, each image is analyzed to identify image properties that may be summarized and used in a search. Known approaches for comparing images include color and pattern matching, shape identification, quantization, and the creation of image summaries that capture relevant image information to be used in a search.
A common approach for summarizing an image is to construct a histogram representing the distribution of certain image properties, such as color. Images are stored and indexed with corresponding histograms. During an image search, the histogram of a source image is constructed and compared to the stored histograms to determine whether the corresponding stored images are similar to the source image.
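By way of illustration only, the histogram-based search described above may be sketched as follows. The sketch assumes grayscale images supplied as flat sequences of 8-bit intensity values, and it uses histogram intersection as the similarity score. The function names, the bin count, and the threshold are hypothetical and are chosen for the example; they are not taken from any particular system.

```python
from typing import Dict, List, Sequence, Tuple


def intensity_histogram(pixels: Sequence[int], bins: int = 8,
                        max_value: int = 255) -> List[float]:
    """Return a normalized histogram of grayscale intensities (assumed 0..max_value)."""
    counts = [0] * bins
    width = (max_value + 1) / bins  # intensity range covered by each bin
    for p in pixels:
        # Clamp to the last bin so that max_value never falls out of range.
        counts[min(int(p / width), bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts] if total else [0.0] * bins


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] for normalized histograms; 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def find_similar(source_pixels: Sequence[int],
                 index: Dict[str, List[float]],
                 threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Compare the source histogram to each stored histogram (hypothetical index)."""
    source_hist = intensity_histogram(source_pixels)
    scores = [(name, histogram_intersection(source_hist, hist))
              for name, hist in index.items()]
    matches = [s for s in scores if s[1] >= threshold]
    return sorted(matches, key=lambda s: s[1], reverse=True)
```

In this sketch the stored histograms are computed once, when each image is indexed, so a search touches only a few numbers per stored image rather than every pixel.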
Various approaches for constructing and comparing histograms are known in the art. In one approach, described in U.S. Pat. No. 6,721,449, a Gaussian pyramid is used to approximate a continuous diffusion process across all pyramid layers of a histogram, and the difference between two histograms is measured as a diffusion distance. U.S. Pat. No. 6,865,295 describes a palette-based histogram matching system that uses vector generation as a way of searching within a set of image features. In U.S. Pat. No. 8,897,553, images are compared using color histograms, including determining a comparison metric based on the difference between bin values. U.S. Pat. No. 7,386,170 discloses an approach that includes defining desired features of an image, including a color histogram, and then searching a database for similar images. In U.S. Pat. No. 8,121,379, Earth Mover's Distance is used to determine the similarity between two histograms. Other approaches for measuring the similarity of histograms include determining the cost to transform one histogram into a second histogram, and comparing histograms using chi-square tests.
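For illustration, three of the general kinds of comparison mentioned above may be sketched as follows: a bin-by-bin difference, a chi-square statistic, and a transformation cost in the manner of Earth Mover's Distance. These are generic textbook forms and do not reproduce the methods of the cited patents. The sketch assumes one-dimensional histograms of equal length and equal total mass, with unit distance between adjacent bins; the function names are hypothetical.

```python
from typing import Sequence


def bin_difference(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of absolute differences between corresponding bin values."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def chi_square_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """One common symmetric form of the chi-square statistic between histograms."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def earth_movers_distance_1d(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Cost of transforming h1 into h2 for 1-D histograms of equal total mass.

    Assumes unit distance between adjacent bins, in which case the cost is
    the sum of the absolute running (cumulative) differences.
    """
    carried = 0.0  # mass that still has to be moved past the current bin
    work = 0.0
    for a, b in zip(h1, h2):
        carried += a - b
        work += abs(carried)
    return work
```

As a usage note, for the histograms [1, 0, 0], [0, 1, 0], and [0, 0, 1], the bin-by-bin difference between the first and either of the others is 2 in both cases, whereas the transformation cost is 1 for the adjacent shift and 2 for the farther shift. The transformation cost therefore reflects how far the distribution has moved, not only that it has moved.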
However, there is a continued need for improved systems and methods for matching digital data in an electronic system, including improved systems and methods for image matching through histograms.