A conventional Internet search engine is a document or file retrieval system designed to help find information stored in one or more databases, which are typically part of one or more of the websites that make up the worldwide network commonly known as the Internet.
Search engines such as, for example, the Google™ engine provided by Google, Inc. of Mountain View, Calif. (“Google™”) and the Yahoo!™ engine provided by Yahoo! of Sunnyvale, Calif. (“Yahoo!™”) are used by millions of people each day to search for information on the Internet. Such search engines enable a user to query databases, websites, web pages and other data sources comprising the Internet using one or more keywords that may be combined into a search string using Boolean logic. The search engine returns a list of documents, files and web pages having content that purportedly meets the user's request, i.e., the documents, files, web pages and other data contain the keywords in the combination specified by the search string (among other factors relied upon by the text-based search engine conducting the search). The documents, files and web pages are usually listed in order of relevance, as determined by some metric of relevance such as, but not limited to, Google™'s well-known PageRank method. The uniform resource locator (URL) of each document is also typically displayed. Advertising, or links to advertisers' sites, having content that may be based on the keywords in the search string is also often displayed alongside the search results. This form of advertising has become widely used and is a source of enormous revenue for the search engine companies.
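By way of illustration only, the combination of keywords using Boolean logic described above may be sketched as follows. This is a minimal sketch and not the method of any particular search engine: the corpus, the example.com URLs, and the function names build_index, search_and and search_or are all hypothetical, and the sketch ignores relevance ranking and the other factors a real engine relies upon.

```python
from collections import defaultdict

# Hypothetical corpus: URL -> page text (not taken from any real site).
DOCUMENTS = {
    "http://example.com/a": "leather handbag with gold clasp",
    "http://example.com/b": "canvas tote bag and leather wallet",
    "http://example.com/c": "handbag care guide",
}


def build_index(documents):
    """Map each lowercase keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search_and(index, keywords):
    """Return the URLs that contain every keyword (Boolean AND)."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*postings) if postings else set()


def search_or(index, keywords):
    """Return the URLs that contain at least one keyword (Boolean OR)."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    return set.union(*postings) if postings else set()


INDEX = build_index(DOCUMENTS)
and_hits = search_and(INDEX, ["leather", "handbag"])  # pages with both words
or_hits = search_or(INDEX, ["leather", "handbag"])    # pages with either word
```

In this sketch a page is returned solely because its text contains the keywords; nothing in the search examines any image that the page may contain.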
As more users gain access to the Internet via high-bandwidth connections, websites that are rich in image content, including video and photographs, are becoming more common and more important. This trend may be seen in the rapid rise in popularity of, for instance, Google™'s YouTube™ website and Yahoo!™'s Flickr™ website. The YouTube™ website features short video clips that are typically homemade and uploaded by registered members of the website. Flickr™ is a website for storing and sharing photographs.
A problem with websites that have image-rich content, such as YouTube™ and Flickr™, is that conventional search engines are text-based and, therefore, cannot search actual image content. Both YouTube™ and Flickr™ attempt to solve this problem by having users add text tags and/or text annotations to the images and video. Such tags and annotations are not objective search content but a subjective interpretation of the content. The conventional search engines may then perform conventional searching on the text that is associated with the image, whether or not the text is appropriate and applicable to the image or video content.
One shortcoming of the keyword tag approach to searching image databases is that it requires human intervention, i.e., it is based on subjective interpretation of the content of the image or video and not on the actual objective content of the file itself. A second shortcoming of this search method is that it does not allow searching for an image, i.e., looking for an image that matches, or is similar to, an example image.
The potential importance of being able to search for an image may be illustrated by considering the following scenario: A YouTube™ user sees a clip of a celebrity on a TV show and likes the handbag the celebrity is carrying. The YouTube™ user would like to buy the same model of handbag, and has even downloaded an image of the handbag, but doesn't know where to begin looking. A search on the Internet, for instance, using the keywords “Kelly Ripa” and “handbag” turns up hundreds of sites, dozens of which are handbag manufacturers' sites that claim Kelly has been seen carrying their handbags. The problem is that the sites each have dozens of handbags and there is no indication of which site may have the closest match or, better still, which page on which site may have the closest match. Moreover, all of the information presented by the search engine to the YouTube™ user is based on subjective interpretation, i.e., on what other people believe (or worse yet, what other people want others to believe) is the information that satisfies the crude text search of “Kelly Ripa” and “handbag.” The YouTube™ user must then manually sort through scores of “hits,” usually in the form of URLs or links to websites, which collectively contain hundreds of images of handbags on dozens of pages, in the hope of finding a match.
What would be more useful to such a user is a system that allows the user to enter the actual downloaded image of the handbag (or an image obtained from some other source) into the search engine, have the search engine search for matches of that image, and automatically deliver “hits,” i.e., matching or similar images with links to the Internet source of the images. Preferably, such a system would include a reliable ranking system that indicates how similar each of the images contained in the “hits” is to the example image, with such ranking system being based on the actual objective content of the images and not on subjective interpretation of each such image.
There are a few image search systems that attempt to provide the ability to search for matches to example images using attributes from the images themselves. These methods are called Content Based Image Retrieval (CBIR) methods and have been described in detail in, for instance, U.S. Pat. No. 5,751,286 to Barber, et al., issued on May 12, 1998, entitled “Image query system and method,” the contents of which are hereby incorporated by reference. The attributes that have been used in such systems include, but are not limited to, color layout, dominant color, homogeneous texture, edge histogram, shape region, and shape contour. Most CBIR systems allow the user to input qualitative values for parameters such as color, texture and low-level shape descriptors. A drawback of such existing systems is that these attributes are frequently not known by users. A further drawback is that ranking images in order of the most likely match in such systems is heavily dependent on the weight given to different attributes, making consistent results difficult to attain. And, again, such a system requires subjective human interpretation.
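The dependence of the ranking on attribute weights may be illustrated, by way of example only, with the following minimal sketch. It is not the method of the Barber patent or of any particular CBIR system: the two scalar attributes, their values, the image names and the functions weighted_distance and rank are all hypothetical, and a real system would use far richer descriptors than a single number per attribute.

```python
# Hypothetical per-image attribute scores in [0, 1]; a real CBIR system would
# use descriptors such as color layout or edge histograms, not single numbers.
QUERY = {"color": 0.5, "texture": 0.5}

DATABASE = {
    "image_a": {"color": 0.5, "texture": 0.9},  # same color, different texture
    "image_b": {"color": 0.9, "texture": 0.5},  # different color, same texture
}


def weighted_distance(query, candidate, weights):
    """Sum of per-attribute absolute differences, each scaled by its weight."""
    return sum(w * abs(query[attr] - candidate[attr])
               for attr, w in weights.items())


def rank(query, database, weights):
    """Return image names ordered from most similar to least similar."""
    return sorted(database,
                  key=lambda name: weighted_distance(query, database[name], weights))


# The same query and database under two different weightings.
color_heavy = rank(QUERY, DATABASE, {"color": 0.9, "texture": 0.1})
texture_heavy = rank(QUERY, DATABASE, {"color": 0.1, "texture": 0.9})
```

With the color-heavy weighting, image_a ranks first; with the texture-heavy weighting, image_b ranks first. Neither the query nor the stored images have changed, only the weights, which is why consistent results are difficult to attain in such systems.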
An image search system that does not rely on user-supplied text tags and can consistently find good matches from easily entered data may be of great importance in fields ranging from Internet shopping, to browsing photo and video content, to searching surveillance tapes. Use of such a system would be greatly facilitated by a method of entering search queries that is visual and intuitive.