Networks such as the Internet or an intranet often facilitate access to a large quantity of information. Similarly, databases often store large quantities of information. When using networks or databases, users often wish to find certain information. Accordingly, many networks or databases are searchable by using a search function. A search function typically permits the user to enter a search query in order to find certain information. For purposes of this application, the term “search query” or “query” is any input provided by the user to a search function for purposes of obtaining search results. Although a query may be entered in a non-text format, for example, by selecting among presented options, many search functions permit entry of a query in text format.
If a user wishes to find a specific section of text within a network or database, the search function looks for matches between the query text and the text accessible in the network or database. However, when searching for non-text information, the search function cannot merely match the query text to the non-text information. The non-text information may include images, videos, Flash animations, or other multimedia information. All types of non-text information will be generally termed an “image” for purposes of this application.
Various known search function systems and methods permit using a text query to find certain types of images.
In early search functions, in order to associate text with an image, someone had to manually create text descriptions of images and associate such descriptions with each image. After such descriptions were made, the search function could assess whether there were any matches between the descriptions and the query text. Clearly, reviewing and describing each image is a time-intensive and inefficient process to implement for networks and databases having large quantities of information.
To overcome the time-intensive step of manually describing images, search functions were configured to incorporate into the search certain text that was already associated with the image. For example, text already associated with the image may include a caption, a file name, a link to a website, text from a surrounding web page, text from a surrounding document in a database, a user-generated label, tagging of information, or any other type of metadata. However, text associated with an image may include only an incomplete description or no description at all. For example, a file named “image1.pdf” reveals little about the contents of the image.
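The metadata-based approach above can be sketched as follows. This is a minimal illustration only, not any particular search engine's implementation; the field names (“caption”, “file_name”, and so on) are assumptions chosen for the example.

```python
# Sketch: index an image by the text already associated with it
# (caption, file name, surrounding page text), then match query terms
# against that collected text. Field names are illustrative.

def index_text_for_image(metadata):
    """Collect whatever text the image already carries into one searchable string."""
    fields = ("caption", "file_name", "surrounding_text", "user_tags")
    parts = [metadata[f] for f in fields if metadata.get(f)]
    return " ".join(parts).lower()

def matches(query, indexed_text):
    """True if every query term appears in the indexed text."""
    return all(term in indexed_text for term in query.lower().split())

# An image whose caption happens to describe it is findable; an image
# with only an uninformative file name is not.
doc = index_text_for_image({"file_name": "image1.pdf",
                            "caption": "Sunset over the bay"})
matches("sunset bay", doc)   # found via the caption
matches("mountain", doc)     # nothing associated mentions it
```

The example also illustrates the limitation noted above: if the metadata were only “image1.pdf”, no descriptive query would match.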
Another search technique was developed to permit searching images for certain non-text parameters. Such a technique is often termed “content-based image retrieval” (CBIR). CBIR includes using a machine to identify the contents of the images. For example, the contents of the images may be detected using computer vision techniques, which permit a computer system to automatically assess and detect the presence of low-level visual features, such as the colors, shapes, and textures of an image. However, CBIR is generally limited to detecting low-level visual features and cannot detect high-level image content. High-level image content may include the sum of the low-level image contents; for example, many shapes and colors together form an image of an animal or a person. In other words, a person may view a picture and identify a specific celebrity, while the computer system may identify only colors, shapes, and textures. The difference between identifying low-level image contents and high-level image contents is termed a “semantic gap”.
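A minimal sketch of the kind of low-level feature CBIR relies on is a coarse color histogram. Real CBIR systems use richer descriptors covering shape and texture as well; this toy version, with an assumed pixel-list input format, shows only the principle that visually similar images yield similar feature vectors.

```python
# Sketch: quantize (r, g, b) pixel values (0-255) into a coarse color
# histogram, a typical low-level visual feature in CBIR.

def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized histogram over coarse color bins."""
    hist = [0] * (bins_per_channel ** 3)
    step = 256 // bins_per_channel
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel ** 2
               + (g // step) * bins_per_channel
               + (b // step))
        hist[idx] += 1
    total = float(len(pixels)) or 1.0
    return [h / total for h in hist]  # normalize so differently sized images compare

# Two mostly-red "images" fall into the same coarse bins; a blue one does not.
red1 = color_histogram([(250, 10, 10)] * 100)
red2 = color_histogram([(240, 20, 5)] * 100)
blue = color_histogram([(10, 10, 250)] * 100)
```

Note that such a feature says nothing about what the red object is, which is precisely the semantic gap described above.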
A system has been developed using parts of CBIR to permit more efficient creation of text descriptions or annotations for images, for example, automated image tagging or folksonomic image tagging. Automated image tagging has been developed using classification and supervised learning techniques. In such methods, a classifier may train using a test set of annotated images. During training, the classifier learns which annotations are associated with which low-level visual features (e.g., colors, shapes, texture). Then, by ascertaining the low-level visual features, the classifier can assign an annotation to other images. As a simplified example, if certain images having lines radiating from a circle are annotated with the note “sun”, then when the classifier detects lines radiating from a circle in an un-annotated image, it will supply the term “sun” as an annotation.
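The training-and-annotation cycle described above can be sketched with a toy nearest-neighbor classifier. This assumes images have already been reduced to low-level feature vectors; the specific features and labels are illustrative, not any particular system's.

```python
# Sketch: supervised automated tagging. Training memorizes
# (feature_vector, annotation) pairs; annotation assigns the label of the
# nearest training example (a 1-nearest-neighbor rule).

def train(annotated_images):
    """annotated_images: list of (feature_vector, annotation) pairs."""
    return list(annotated_images)

def annotate(model, features):
    """Assign the annotation of the closest training example."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, annotation = min(model, key=lambda pair: dist(pair[0], features))
    return annotation

# E.g., "lines radiating from a circle" encoded as an assumed
# (circularity, ray_count) feature vector.
model = train([((0.9, 8.0), "sun"), ((0.1, 0.0), "sea")])
annotate(model, (0.8, 7.0))  # near the "sun" prototype, so tagged "sun"
```

Production systems use far more capable classifiers, but the structure is the same: learned feature-to-annotation associations applied to un-annotated images.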
Folksonomic tagging of an image is tagging performed by many people on a network. For example, many people tag photos on the Facebook network. Folksonomic tagging can be used for classifier learning in automated tagging. However, both automated tagging and folksonomic tagging techniques are limited by the low-level features that the system is capable of assessing.
Certain types of CBIR permit a user to enter a non-text-based query to eliminate the text-to-visual-feature transition. For example, a user may prepare a sketch of that which the user wishes to find in an image. In such techniques, CBIR compares the image content of the sketch to the image content in the images available through the networks or databases. However, many users may have difficulty creating an image that closely approximates that which they wish to find. Alternatively, certain software may permit creation of improved-quality images. However, such software is typically complex and expensive.
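Query-by-sketch retrieval of the kind just described can be sketched as feature-space ranking: the user's drawing is reduced to the same low-level feature vector as the stored images, and images are returned in order of feature distance. The feature extraction itself is assumed here, and the tiny database is illustrative.

```python
# Sketch: rank stored images by low-level feature distance to the
# feature vector of a user-drawn query sketch.

def rank_by_similarity(sketch_features, image_db):
    """image_db: dict of image_id -> feature_vector. Returns ids, nearest first."""
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(sketch_features, v))
    return sorted(image_db, key=lambda img_id: dist(image_db[img_id]))

# Illustrative 2-D feature vectors for three stored images.
db = {"boat.jpg": (0.2, 0.9), "tree.jpg": (0.8, 0.1), "ship.jpg": (0.3, 0.8)}
rank_by_similarity((0.22, 0.88), db)  # a boat-like sketch ranks the boats first
```

The limitation noted above follows directly: a poorly drawn sketch produces a feature vector far from the intended target, and the ranking degrades accordingly.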
Another limitation of CBIR is that it typically requires considerable computer resources for tasks such as indexing of images and other computational tasks, which makes CBIR impractical for large databases.
Furthermore, there have been prototypes of CBIR searches for the entire World Wide Web, also termed the “Web”. However, conventional visual-content-based image search functions suffer from two major disadvantages inherited from CBIR—semantic gap problems and computational overload. To tackle such problems, alternative approaches have been proposed. In the Web search scenario, both images and text contents are available, which provides opportunities to bridge the semantic gap and provide better indexing by integrating features from both the images and the text contents.
A two-step hybrid approach has been developed. The hybrid approach first uses a text-based search to generate an intermediate answer set with high recall and low precision, and then applies CBIR methods to cluster or re-rank the results. The visual and textual features are used separately and are not semantically associated. Moreover, the conventional hybrid approach suffers from oversimplified image features and clustering methods. Complicated re-ranking algorithms have also been proposed for better search performance and user experience. Most recently, certain image searches have started to employ CBIR methods to re-rank search results when users click on a “show similar images” function. Also, other types of text-image interaction use visual information to help describe images.
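The two-step hybrid approach above can be sketched as follows: a text search yields a high-recall candidate set, and a CBIR step then re-ranks those candidates by visual similarity. The corpus, feature vectors, and the choice of the first text hit as the visual reference are all illustrative assumptions.

```python
# Sketch: two-step hybrid image search. Step 1: text match for high
# recall. Step 2: re-rank the candidates by feature distance to a
# reference image (here, simply the first candidate).

def text_search(query, corpus):
    """corpus: dict of image_id -> {"text": ..., "features": ...}."""
    terms = query.lower().split()
    return [i for i, doc in corpus.items()
            if any(t in doc["text"].lower() for t in terms)]

def visual_rerank(candidates, corpus):
    """Re-rank candidates by feature distance to the first candidate."""
    ref = corpus[candidates[0]]["features"]
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(corpus[i]["features"], ref))
    return sorted(candidates, key=dist)

corpus = {
    "a.jpg": {"text": "red apple on a table", "features": (0.9, 0.1)},
    "b.jpg": {"text": "apple logo on a laptop", "features": (0.1, 0.8)},
    "c.jpg": {"text": "basket of red apples", "features": (0.85, 0.15)},
}
hits = text_search("apple", corpus)  # high recall: all three match the text
visual_rerank(hits, corpus)          # the visually similar fruit images cluster first
```

The example also reflects the limitation noted above: the two steps communicate only through the candidate list, so the textual and visual features remain semantically unassociated.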
To address the large computational requirements of searching the entire Web, certain known methods apply CBIR to a vertical search. A vertical search engine, or niche search engine, is a domain-specific search engine that works on smaller sub-graphs or sub-domains of the Web. Examples of vertical search include scientific publication search (e.g., Google Scholar, CiteSeer), product search (e.g., Google Product, Yahoo! Shopping), blog search, source code search, local search, etc. Vertical search engines have shown better performance than general Web search engines (e.g., in precision and ranking), because they are more focused and optimized with domain knowledge.
A vertical search engine uses focused crawlers to crawl constrained subsets of the general Internet and evaluate user queries against such domain-specific collections of documents. In addition to the benefits of working on much smaller datasets, vertical search engines are also able to incorporate domain knowledge to help with relevance assessment and results ranking. There are also off-line image retrieval systems that work on domain-specific collections of images, such as personal album search, leaf image search, fine arts image search, etc. These conventional approaches utilize domain knowledge in image pre-processing, feature selection, and similarity assessment. For instance, leaf image retrieval puts emphasis on shape and texture features, while personal album search usually employs face recognition methods.
Clearly, image searching on the Web has many challenges, for example, resolving the semantic gap between low-level visual features and high-level content and the excessive computation brought by the huge number of images and high-dimensional features. Therefore, there is a need for a system and methods to improve image searching that permit integration of text features and visual features. The present invention satisfies this demand.