Web-based photo/video search engines allow users to enter keywords into a search box. However, rather than getting back Web pages, users are provided with related photos/video clips from across the Web. While traditional search engines are skilled at indexing, understanding, and finding text-based content, they are inadequate for finding photo/video content. They focus only on text or metadata within web pages rather than looking at the actual photo/video files themselves. Photo/video search engines have emerged to compensate for the weakness of such HTML-focused search engines. Today, the field of online photo/video search is rapidly evolving. An overview of the evolution of photo/video search (from first to second generation) follows.
First Generation Photo/Video Search
First generation photo/video search solutions depended entirely on metadata. Examples include SingingFish and Altavista Video (now used at Yahoo!). These engines are extremely similar to regular web search engines. Just as with a standard web search engine, the spider propagates across the Internet, looking for and recording content to index. Unlike a standard web search engine, text documents and pages are ignored and the spider focuses only on photo/video (and sometimes audio) content. Once such content is discovered, it is examined for relevant metadata. Metadata is the textual data that is applied to a piece of multimedia content in order to describe it. It can include user-provided tags, an editorially written title or summary, a transcript of the speech in the video, or even information stored in the video file itself pertaining to its resolution, frame rate, and creation date.
Still part of the first generation, but much improved, display-oriented spidering has been used to great effect in video search. First developed for the closely related problem of photo search, display-oriented spidering looks at the web page text that lies near a photo/video. Using a specialized algorithm, display-oriented spidering evaluates the way the page is designed and rendered to decide which portions of it are closely related or linked to the photo/video. It then extracts the text within these areas and applies it, as further metadata, to the photos/videos being indexed. As many web pages contain commentary or description that is related to the photo/video but may not be contained in the official metadata, this approach can provide more detail on the meaning of the photo/video being spidered. The best example of display-oriented spidering for photo/video search today is that found at AOL's Search video.com.
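The idea of attaching nearby page text to a photo/video as additional metadata can be sketched as follows. This is a minimal illustration only, not the specialized algorithm used by any engine named above: it approximates "nearness" by position in the HTML source rather than by the rendered layout, and the names `NearbyTextCollector`, `nearby_text`, and the `window` parameter are hypothetical.

```python
from html.parser import HTMLParser

# Tags treated as photo/video content (an assumption for this sketch).
MEDIA_TAGS = {"img", "video", "embed"}


class NearbyTextCollector(HTMLParser):
    """Flattens a page into an ordered stream of text and media tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []       # ("text", str) or ("media", src) in source order
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in MEDIA_TAGS:
            src = dict(attrs).get("src")
            if src:
                self.tokens.append(("media", src))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text and not self._skip_depth:
            self.tokens.append(("text", text))


def nearby_text(html, window=1):
    """Map each media URL to the text tokens within `window` positions of it."""
    parser = NearbyTextCollector()
    parser.feed(html)
    parser.close()
    result = {}
    for i, (kind, value) in enumerate(parser.tokens):
        if kind != "media":
            continue
        lo, hi = max(0, i - window), i + window + 1
        result[value] = [v for k, v in parser.tokens[lo:hi] if k == "text"]
    return result


if __name__ == "__main__":
    page = (
        "<div><h2>Sunset over the bay</h2>"
        '<img src="sunset.jpg">'
        "<p>Taken from the pier at dusk.</p></div>"
        "<p>Unrelated footer text</p>"
    )
    print(nearby_text(page))
```

A production spider would instead need the rendered geometry of the page (for example, element positions from a layout engine) to judge which text is visually associated with the media; source-order proximity is only a rough stand-in.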
However, whether augmented with display-oriented analysis or not, the methodology of first generation metadata spidering is still flawed because the engines rely heavily upon the quality of the metadata that has been provided. As the metadata is often provided as an afterthought, it may be incomplete or lacking in detail and, as it is provided by the owner or publisher of the photo/video, may even be false or misleading. First generation photo/video search is a reasonable solution that borrows from existing web search technology to simplify the photo/video search problem. By doing so, however, it never actually understands the photo/video itself, focusing instead only on pieces of text that may be related to the photo/video but are, fundamentally, secondary to it.
Second Generation Photo/Video Search
Second generation photo/video search engines emerged as a reaction to the faults of the first generation. As well as spidering textual metadata, second generation photo/video search aims to understand and extract meaning from the photo/video itself. Second generation photo/video search engines use methods such as speech recognition, visual analysis and recognition, and optical character recognition to allow software to listen to, watch, and read the text appearing in the photo/video content itself. As well as providing more information, this approach provides objective information: if a photo/video contains speech on a particular topic, it really is about that topic, whereas if a photo/video has been tagged as pertaining to a certain topic, it may actually be about something entirely different. Second generation photo/video search is still primarily used in government and enterprise settings. Blinkx and Podzinger are examples of technologies that have been applied to general, consumer Web photo/video search. Podzinger, as the name suggests, focuses more on audio and photo/video podcasts, while Blinkx indexes all audio and video content on the Web, whether amateur or professional. Regardless of the technology involved, both first and second generation photo/video search engines exist and are popular today.
As discussed above, both first and second generation photo/video search engines consider metadata. In the case of first generation photo/video search engines, in fact, this may be the only information by which a photo/video is judged. As such, it is imperative to provide well-placed, rich, and relevant metadata that can be easily located by search engines. In photo/video sharing or hosting systems such as YouTube, users are generally given the opportunity to provide metadata (and are strongly encouraged to do so). Unfortunately, many photo/video sharing sites (YouTube in particular) suffer from prevalent metadata abuse, where enterprising photo/video SEO practitioners pollute their photo/video metadata by listing tens, sometimes a hundred, popular search terms that are irrelevant to the photo/video itself. This deceptive practice can easily be observed by typing such a search term into any popular photo/video sharing site. While this is, at the moment, a somewhat successful strategy, it has two significant weaknesses. Firstly, it brings SEO and an SEO professional's target or client into disrepute. If a user's search brings back an irrelevant photo/video, it is unlikely that the user will form any positive impression of the associated content or brand. Secondly, as this problem grows, search engines are already working to combat it. Blinkx, for example, now employs a number of Bayesian methods to screen for such metadata abuse, resulting in severe de-prioritization of such content.
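To illustrate what a Bayesian screen for metadata abuse might look like, the sketch below trains a small naive Bayes classifier on two coarse features: how many tags a photo/video carries, and how many of those tags also appear in its title or description. It is not a description of Blinkx's actual methods, which are not detailed here; the features, thresholds, training examples, and all names (`features`, `NaiveBayes`, `TRAINING`) are hypothetical assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def features(title, description, tags):
    """Reduce a metadata record to two coarse categorical features."""
    words = set((title + " " + description).lower().split())
    tags = [t.lower() for t in tags]
    overlap = sum(1 for t in tags if t in words) / len(tags) if tags else 0.0
    return (
        "many_tags" if len(tags) > 10 else "few_tags",        # assumed threshold
        "low_overlap" if overlap < 0.2 else "high_overlap",   # assumed threshold
    )


class NaiveBayes:
    """Naive Bayes over binary categorical features with Laplace smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # label -> feature value -> count

    def fit(self, examples):
        for feats, label in examples:
            self.label_counts[label] += 1
            for f in feats:
                self.feature_counts[label][f] += 1

    def log_posterior(self, feats, label):
        total = sum(self.label_counts.values())
        score = math.log(self.label_counts[label] / total)
        for f in feats:
            # Add-one smoothing; each feature has two possible values.
            score += math.log(
                (self.feature_counts[label][f] + 1) / (self.label_counts[label] + 2)
            )
        return score

    def predict(self, feats):
        return max(self.label_counts, key=lambda lbl: self.log_posterior(feats, lbl))


# Hypothetical hand-labelled training data, for illustration only.
TRAINING = [
    (("many_tags", "low_overlap"), "abuse"),
    (("many_tags", "low_overlap"), "abuse"),
    (("few_tags", "low_overlap"), "abuse"),
    (("few_tags", "high_overlap"), "ok"),
    (("few_tags", "high_overlap"), "ok"),
    (("few_tags", "low_overlap"), "ok"),
    (("many_tags", "high_overlap"), "ok"),
]

if __name__ == "__main__":
    model = NaiveBayes()
    model.fit(TRAINING)
    clean = features("Cat plays piano", "my cat plays the piano at home", ["cat", "piano"])
    print(clean, model.predict(clean))
```

An engine could then lower the ranking of items classified as abusive. A realistic screen would need far more training data and richer signals than the two features assumed here.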
There remains a need for means, methods, and apparatus to enable automatic generation of metadata for photos/videos that accurately represents the spirit of the corresponding photo/video. Lately, many image capturing devices are entering the market with a built-in or add-on GPS receiver. Such devices have means to ‘geo-tag’ the photos/videos they capture. Geo-tagging is the process of adding geographical identification metadata to images, photos, and videos, and is a form of geospatial metadata. This data usually consists of latitude and longitude coordinates, though it can also include altitude, bearing, and place names. Geo-tagging can help users find a wide variety of location-specific information. For instance, one can find images taken near a given location by entering latitude and longitude coordinates into a geo-tagging enabled image search engine. For example, Flickr, Yahoo Inc's online photo-sharing site, has a set of mapping features that makes it easier to find photos based on their location. Flickr enables Web users to browse tens of millions of geographically located photos uploaded to its site, http://www.flickr.com/. The service, called “Places,” identifies on a global map the latest hot-spots for photo contributions. Flickr Places also allows users to search by more than 100,000 geographic place names to find photos that might interest them. Many of the world's cities, as well as states, countries, and regions, have their own featured pages. The global map view lets Flickr users see the latest photos by theme. Clicking on a category tag takes users to a selection of photos, giving them a glimpse of what other Flickr users collectively find interesting or newsworthy.
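Finding images taken near a given latitude and longitude can be sketched as a distance filter over geo-tagged records. The example below is an assumption-laden illustration, not a description of how Flickr or any other service implements location search: it uses the standard haversine great-circle formula on a spherical Earth, scans every record linearly, and the names `GeoTaggedPhoto`, `haversine_km`, and `photos_near` are hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


@dataclass
class GeoTaggedPhoto:
    name: str
    lat: float  # latitude in decimal degrees
    lon: float  # longitude in decimal degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def photos_near(photos, lat, lon, radius_km):
    """Return photos within radius_km of (lat, lon), nearest first."""
    hits = [(haversine_km(lat, lon, p.lat, p.lon), p) for p in photos]
    return [p for d, p in sorted(hits, key=lambda h: h[0]) if d <= radius_km]


if __name__ == "__main__":
    # Illustrative records with approximate coordinates.
    photos = [
        GeoTaggedPhoto("eiffel_tower.jpg", 48.8584, 2.2945),
        GeoTaggedPhoto("louvre.jpg", 48.8606, 2.3376),
        GeoTaggedPhoto("big_ben.jpg", 51.5007, -0.1246),
    ]
    for p in photos_near(photos, 48.8566, 2.3522, radius_km=10):
        print(p.name)
```

A linear scan is adequate for a small collection; a search engine indexing millions of photos would more plausibly use a spatial index, which is beyond the scope of this sketch.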