Internet search engines operate by taking an input string, performing a search algorithm, and returning website links based on that string. Alternative search engines exist to find images based on an input image or to find a matching audio clip based on an input audio clip. It would be useful, however, to accept an input image, audio clip, or metadata source and return web resource links based on the data in an abstracted state, or, in other words, based on the content and surroundings of the input data.
For example, suppose a user was on vacation and wanted to know information about a particular landmark. The user could utilize a smartphone or handheld device to take a picture of the landmark and submit it to the disclosed resource navigation links tool. The tool could analyze the data and, optionally, the metadata associated with the source and image data file, determine the content of the image data, and return to the user the most pertinent web resource or a list of pertinent web resources for further study. The delivery may be back to the handheld device in a text or audio format, back to an email address, a really simple syndication (RSS) feed, posted to a social networking site, posted to a blog or other online aggregator, or a combination thereof.
In another example, a user may want to call the closest Toyota car dealership. With the resource navigation links tool, the user may simply take a picture of a Toyota car, submit the picture to the tool along with the user's GPS coordinates, and receive back resource links including contact information for the closest Toyota car dealership.
A resource navigation links tool that could take alternative data sources as an input and provide resource navigation links based on data derived from the data sources would be useful to assist in web navigation. In particular, a tool would be useful that may take as an input an image data file, an audio data file, and/or metadata sources, and based on that input develop and provide resource navigation links.
Image processing and manipulation has evolved beyond the processing of individual images to include the interrelation of images and image data, such as comparing an image with a library of images for similarities. Images may be compared on a pixel level, on a content level, or a combination of the two. When comparing images on a pixel level, other images with similar pixel characteristics may be found by examining blocks of pixels for similarities. When comparing images on a content level, the image is first characterized by recognizing objects within it, and that content data is then used to perform a text or image search.
One example of comparing images on the pixel level involves processing the blocks of pixels that make up an encoded and compressed image file. Image files are typically compared at the pixel level by indexing the discrete cosine transform (DCT) blocks of a subject image and comparing that information relationally to a database of DCT block patterns. Because DCT blocks are frequency-domain representations of blocks of pixels, comparisons may be made at low-frequency data points across many DCT blocks. Matching and relevance may be determined by the amount of intersection of DCT blocks based on block order, block similarity, and the percentage of blocks within a certain similarity. Other known and yet-to-be-developed ways of representing images in the frequency domain may also be used.
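The pixel-level comparison described above may be sketched as follows. This is a minimal, illustrative sketch, not the disclosed tool itself: the naive DCT, the 3x3 low-frequency signature, and the similarity formula are all assumptions chosen for clarity, whereas a production system would use an optimized DCT and a tuned matching metric.

```python
import math

def dct2(block):
    """Naive 2-D DCT-II of a square block of pixel values (illustrative only)."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            cu = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
            cv = math.sqrt(1 / n) if v == 0 else math.sqrt(2 / n)
            out[u][v] = cu * cv * s
    return out

def low_freq_signature(block, k=3):
    """Keep only the k x k lowest-frequency DCT coefficients as a signature."""
    coeffs = dct2(block)
    return [coeffs[u][v] for u in range(k) for v in range(k)]

def similarity(sig_a, sig_b):
    """Map mean absolute coefficient difference into (0, 1]; 1.0 means identical."""
    diff = sum(abs(a - b) for a, b in zip(sig_a, sig_b)) / len(sig_a)
    return 1.0 / (1.0 + diff)
```

Comparing only the low-frequency coefficients, as the passage suggests, makes the match tolerant of fine pixel-level detail while still distinguishing blocks with different overall structure.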
Object recognition uses extendible, trainable libraries to recognize objects within an image. For example, a picture of a car may be identified as a car by recognizing the edge detail of a shape or shapes within the image data file. Commercially available object recognition software may also identify the car as a Chevrolet based on an emblem found on the car. The object recognition software may further identify the car as a Chevrolet Camaro based on other identifying markers, such as body lines, wheel designs, colors, or text located on the car. In a particular image, the object recognition software may identify several different objects and create a manifest of objects. The software may also have multiple levels of granularity in its recognition result, as well as a confidence level. For example, the software may recognize a car with 99% confidence, a Chevrolet with 70% confidence, a Camaro with 20% confidence, and a 2010 model with 10% confidence.
Object recognition libraries may be trainable and extendible: a library may be trained to recognize objects that it cannot recognize or has not encountered before. New object information may be input by a user and stored in the software so that the next time recognition of the object (or a similar object) is requested, the software may recognize it.
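The manifest, graded confidence levels, and trainability described in the two preceding paragraphs can be modeled with a simple data structure. This is a hedged sketch: the feature key, class names, and confidence values are hypothetical stand-ins for the output of real recognition software.

```python
from dataclasses import dataclass, field

@dataclass
class Recognition:
    label: str
    confidence: float  # 0.0 (no confidence) to 1.0 (certain)

@dataclass
class ObjectLibrary:
    """Toy trainable library mapping a feature key to labels of increasing
    specificity, each with its own confidence level."""
    entries: dict = field(default_factory=dict)

    def train(self, feature_key, labeled_confidences):
        # A user may add new object information so the library can
        # recognize the object (or a similar one) next time.
        self.entries[feature_key] = labeled_confidences

    def recognize(self, feature_key):
        """Return a manifest of recognitions, most general first;
        empty if the object has not been encountered or trained."""
        raw = self.entries.get(feature_key, [])
        return [Recognition(label, conf) for label, conf in raw]

# Hypothetical training data mirroring the example in the text.
lib = ObjectLibrary()
lib.train("edge-profile-123",
          [("car", 0.99), ("Chevrolet", 0.70),
           ("Camaro", 0.20), ("2010 model", 0.10)])
manifest = lib.recognize("edge-profile-123")
```

An unrecognized feature key simply yields an empty manifest, which is the case where user training would extend the library.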
Audio clips may be processed either by comparing the actual audio clip with other audio clips or by recognizing the words spoken within the audio clip. For example, if a user is listening to a particular song, he may record part of the song with a mobile phone and submit it for processing. The audio recognition engine may identify the song by performing a fast Fourier transform (or a similar transform function) on the clip and comparing the result to a database of audio clips. Similar to comparing image data DCT blocks, comparing a subject audio clip to a database of audio clips in the frequency domain allows the comparison to be done at lower frequencies, thereby increasing the chance of a match. Matching and relevance may be determined by the amount of intersection of frequency data within a certain similarity.
In another example, the user may record a voice memo, submit it to a speech recognition engine and retrieve a textual representation of the spoken words. Similar to object recognition software, speech recognition or speech-to-text software typically uses a trainable database to recognize the way that words are spoken individually and in groups to find matching words and phrases.
Metadata may simply be understood as data about data. It is often embedded in any file representation of data. Metadata may include the time and date a file was created or edited. It may include the location or identifying information about the user creating the file. It may include server or other information about the machines that have created, modified, transmitted, or received the file. Metadata may be extracted from image files and audio files, and may also include metadata derived from alternative Internet sources, such as Twitter Tweets, emails, or other sources.
The information derived from images, audio files, or metadata may be used as inputs into a resource navigation link tool and service. Optionally, the information may be manipulated prior to inputting into the resource navigation link tool and service for better quality results.
Although the information derived from images, audio files, or metadata sources may be used to submit to a search engine, the data may instead be used to retrieve results from a keyword resource navigation link database. The database may provide custom links without returning thousands of resources that a search engine may typically return. In addition, the custom links found using the keyword resource navigation link tool may be sponsored by entities that wish to associate their resource with particular keywords.
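The keyword resource navigation link database described above might be structured as follows. This is a hedged sketch: the table contents, the `example.com` URLs, and the sponsored flag are hypothetical, and a real database would be far larger and persistently stored.

```python
# Keywords map to a short, curated list of (url, sponsored) entries,
# rather than the thousands of hits a general search engine returns.
# Sponsored entries are resources an entity has paid to associate
# with a particular keyword, as described in the text.
LINK_DB = {
    "camaro": [
        ("https://example.com/camaro-overview", False),
        ("https://example.com/dealer-locator", True),   # sponsored
    ],
    "landmark": [
        ("https://example.com/landmark-history", False),
    ],
}

def lookup_links(keywords, include_sponsored=True):
    """Return curated resource links for the given keywords,
    optionally filtering out sponsored entries."""
    results = []
    for kw in keywords:
        for url, sponsored in LINK_DB.get(kw.lower(), []):
            if include_sponsored or not sponsored:
                results.append(url)
    return results
```

Keywords derived from image, audio, or metadata analysis (for example, "Camaro" from the object recognition example) would be looked up directly, yielding a handful of custom links instead of a full search-engine result set.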