The ability of people to quickly differentiate and categorize objects visually enables them to assess situations before taking deliberate actions. These deliberate actions may be based on the brain's pattern recognition, which matches context information, such as location, orientation, and time/date, in deciding the identity of an object. For example, a person may see a hole in a sidewalk and walk around it to avoid being injured. “Context,” as it is used for this purpose, may be influenced by other factors including culture, background, and/or education.
Currently, there are conventional image search engines, e.g., “Google Image Search,” that conduct web-based searches for images according to query terms. “Google” is a registered trademark of Google Inc. However, conventional image search engines do not take into account enough context information about the image to help determine the identity of the actual image content. For example, when a system user types “apple” as a query into an image search engine, the search engine will consider only the name of the image or the words (tags) associated with the image on a webpage. As a result, such queries produce many false-positive responses. As an example, if the image search query word entered is “pepper,” the search results may include images of a black Labrador dog named “Pepper” alongside pictures of green “peppers,” when the intent of the system user was to find images of the vegetable “pepper.”
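The false-positive behavior described above can be illustrated with a minimal sketch of keyword-only matching. The `IndexedImage` type, the `keyword_search` function, and the sample file names are hypothetical and introduced only for illustration; they do not represent how any particular search engine is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedImage:
    """Hypothetical index entry: only a file name and text tags are known."""
    filename: str
    tags: frozenset = frozenset()


def keyword_search(query, images):
    """Return images whose file name or tags contain the query term.

    No context (location, orientation, time/date) is consulted, so any
    textual coincidence counts as a match.
    """
    q = query.lower()
    return [
        img for img in images
        if q in img.filename.lower() or q in {t.lower() for t in img.tags}
    ]


# Illustrative data only.
images = [
    IndexedImage("pepper_the_labrador.jpg", frozenset({"dog", "pet"})),
    IndexedImage("green_peppers.jpg", frozenset({"vegetable", "pepper"})),
    IndexedImage("market_stall.jpg", frozenset({"vegetable", "tomato"})),
]

results = keyword_search("pepper", images)
for img in results:
    print(img.filename)
```

In this sketch the dog photograph matches solely because of its file name, even though the system user intended the vegetable.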
It would be very helpful to have tools or systems that improve the probability of receiving images more closely related to the desired intent of a system user's query. This could be achieved if image searching technology were incorporated in pointing systems that are used to identify objects or sets of objects that are present in a person's (system user's) visual scene. To be more effective, these tools or systems would need to be aware of the system user's surroundings. More particularly, it would be important for such tools or systems to make accurate image searching decisions based on consideration of those surroundings.
Desired tools or systems of the type just described would be of particular interest to mobile system users, such as travelers or tourists, who often find themselves in unfamiliar situations or encounter unfamiliar objects. These mobile tools would need to have the ability to accept information from a wide variety of data sources and provide accurate and timely results directed to images related to the system user's visual scene. Due to the proliferation of network-connected mobile devices, including cellular telephones, Personal Digital Assistants (PDAs), and ruggedized or “tough” minicomputers, platforms are readily available for such tools and systems.
Although mobile devices, such as cellular phones, PDAs, and minicomputers, are available and affordable, their information systems are typically tailored to specific computer-based data services. Further, conducting image searches using these devices is awkward because they require information to be input using miniaturized keyboards, which is time consuming as well as difficult. Additionally, protective clothing or the need to conduct ongoing surveillance makes such devices impractical for military combat use.
Even if data entry for small mobile devices, such as cellular phones and PDAs, could be automated, commercial databases typically rely on semi-structured data to produce results that are then ranked by the relevancy of keywords and word order, an approach that is not well suited to these types of mobile devices. As an example, consider the photo-sharing database FLICKR (http://www.flickr.com), which uses semi-structured data to provide picture “matches” for system users. “FLICKR” is a registered trademark of Yahoo, Inc. The accuracy of the results depends on the text entered, not only by the system user, but also by the person assigning descriptions to the photo, e.g., keyword tags attached to the picture. Thus, entering the keyword “apple” in FLICKR produces over 100,000 potential returns with pictures that range from fruits to clothing styles to computers. These results would fall short of answering the system user's actual question, which concerns the “apple” that is a fruit.
Noting the foregoing, there is a need for increased accuracy, timeliness, and comprehensiveness of image returns for mobile users who want information through visual images relating to the image search queries they formulate. More specifically, with regard to “accuracy,” the returned image data needs to closely match the system user input. Thus, given the wide variety of entries that are possible, probabilities must be assigned to provide the system user with confidence that the image data returned is not only accurate but also meaningful given the input. With regard to “timeliness,” the image data returns need to be speedy, meaning typically in less than five seconds. Return times are greatly affected by the amount of image processing and matching that is required, and longer return times will typically be viewed as unacceptable. With regard to “comprehensiveness,” image data queries must be able to access as many potential matches as possible. As such, image data sources should include analysis of objects in images through both unstructured methods and semi-structured methods, i.e., keywords or tags.
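The accuracy and timeliness requirements stated above can be sketched as follows. This is a minimal illustration under stated assumptions and not the method of the invention: the `Candidate` type, its `keyword_score` and `context_score` fields, and the choice to multiply the two scores and normalize them into probabilities are all hypothetical. Only the five-second return target is taken from the text.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate match for an image query."""
    label: str
    keyword_score: float  # assumed 0..1, agreement with tags or keywords
    context_score: float  # assumed 0..1, agreement with the user's surroundings


def rank_with_probabilities(candidates, time_budget_s=5.0, clock=time.monotonic):
    """Rank candidates and attach a probability to each.

    Assumption: a candidate's weight is keyword_score * context_score, and
    weights are normalized to sum to 1. Scoring stops once the time budget
    is exhausted, so a partial but timely answer is returned.
    """
    deadline = clock() + time_budget_s
    scored = []
    for cand in candidates:
        if clock() > deadline:
            break  # timeliness: stop scoring rather than exceed the budget
        scored.append((cand.label, cand.keyword_score * cand.context_score))

    total = sum(weight for _, weight in scored)
    if total == 0:
        return []  # nothing matched well enough to assign a probability

    ranked = [(label, weight / total) for label, weight in scored]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Illustrative data only: both candidates match the keyword "pepper",
# but the assumed context (e.g., a produce market) favors the vegetable.
candidates = [
    Candidate("dog named Pepper", keyword_score=1.0, context_score=0.1),
    Candidate("pepper (vegetable)", keyword_score=1.0, context_score=0.9),
]

ranking = rank_with_probabilities(candidates)
for label, probability in ranking:
    print(f"{label}: {probability:.2f}")
```

Under these assumptions the context score separates candidates that keyword matching alone cannot distinguish, and the attached probability gives the system user a measure of confidence in each return.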
The present invention overcomes these problems of conventional image search systems by providing a system and method for image searching and indexing that delivers accurate, timely, and comprehensive results.