The Internet has made various forms of content available to users across the world. For example, Internet users access the Internet to view articles, research topics of interest, watch videos, etc. Online viewing of multimedia or digital media has become extremely popular in recent years. This has led to new applications for navigating, searching, and retrieving online multimedia or digital media, and in particular videos, such as movies, TV shows, and the like. However, users typically are not just looking for broad categories of videos in a search; they are often searching for specific characters, scenes, quotations, objects, actions, or similar specific or discrete content that exists at one or more specific points in time inside the videos. Google Audio Indexing is one of the newer technologies that allow users to perform better searches and pinpoint specific, but limited, types of content within videos. Google Audio Indexing uses speech recognition technology to find words spoken inside videos and lets the user jump to the relevant scene of the video where those words are spoken. However, video content is intrinsically multimodal. Being able to search for one element, such as a quote, is beneficial, but it does not allow a user to search for multiple elements of content that intersect within specific scenes or segments of a video and that may not include any specific spoken text.
The multimodality of video content has been generally defined along three information channels or modalities: (a) a visual modality, meaning that which can be seen in a video; (b) an auditory modality, meaning speech or specific sounds or noises that can be heard in a video; and (c) a textual modality, meaning descriptive elements that may be appended to or associated with an entire video (i.e., conventional metadata) or with specific scenes or points in time in a video (i.e., time-based or time-correlated metadata) and that can be used to describe the video content in finer or more nuanced detail than is typically available from just the visual or auditory modalities. For each of these modalities, there is also a temporal aspect. While some content and information can be used generally to describe the entire video, there is a tremendous wealth of information that can be gleaned and used if the information is tied specifically to the point or points in time within the video at which specific events, elements, or information occur. Thus, indexing and very precise, targeted searching within videos is a complex issue and is only as good as the accuracy and sufficiency of the metadata associated with the video and, particularly, with the time-based aspects of the video.
The growing prominence and value of digital media, including the libraries of full-featured films, digital shorts, television series and programs, news programs, and similar professionally made (and amateur) multimedia (previously and hereinafter referred to generally as “videos” or “digital media” or “digital media assets or files”), require an effective and convenient manner of navigating, searching, and retrieving such digital media as well as any related or underlying metadata for a wide variety of purposes and uses.
“Metadata,” which is a term that has been used above and will be used herein, is information about other information; in this case, it is information about the digital media as a whole, or information associated with particular images, scenes, segments, or other subparts of the digital media. For example, metadata can identify characteristics of the digital media such as actors appearing, themes present, or legal clearance to third party copyrighted material appearing in a respective digital media asset. Metadata may be related to the entire digital media (such as the title, date of creation, director, producer, production studio, etc.) or may only be relevant to particular segments, scenes, images, audio, or other portions of the digital media.
Preferably, when such metadata is only related to a subportion of the digital media, it has a corresponding time-base (such as a discrete point in time or range of times associated with the underlying time-codes of the digital media). An effective and convenient manner of navigating, searching, and retrieving desired digital media through the use of metadata associated with the digital media, and preferably through several hierarchical levels or layers of such metadata, can provide significant value, particularly when the metadata can be tied closely to specific and relevant points in time or ranges of time within the digital media asset. Such a capability is much needed in the entertainment and advertising industries, to mention just a few.
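By way of illustration only, the relationship between asset-level metadata and time-based metadata described above might be sketched as follows. The sketch is not drawn from any particular system; the names TimedTag, VideoAsset, and find_intersections, the sample values, and the use of seconds rather than time-codes are all hypothetical assumptions made for the example. It shows how tags from different modalities, each tied to a range of time, could be intersected to locate the segments in which several elements of content are present together.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TimedTag:
    """Hypothetical time-based metadata: a label tied to a time range in seconds."""
    modality: str  # "visual", "auditory", or "textual"
    label: str
    start: float
    end: float


@dataclass
class VideoAsset:
    """Hypothetical asset holding asset-level metadata and time-based tags."""
    title: str
    asset_metadata: dict = field(default_factory=dict)
    timed_tags: list = field(default_factory=list)


def find_intersections(asset, labels):
    """Return sorted (start, end) ranges in which every requested label is present."""
    ranges = None
    for label in labels:
        spans = [(t.start, t.end) for t in asset.timed_tags if t.label == label]
        if ranges is None:
            # The first label seeds the candidate ranges.
            ranges = spans
        else:
            # Keep only the overlap between each candidate range and each new span.
            ranges = [
                (max(a_start, b_start), min(a_end, b_end))
                for a_start, a_end in ranges
                for b_start, b_end in spans
                if max(a_start, b_start) < min(a_end, b_end)
            ]
    return sorted(ranges or [])


# Illustrative data only; the title, director, and tags are invented.
asset = VideoAsset(
    title="Example Film",
    asset_metadata={"director": "Jane Doe"},
    timed_tags=[
        TimedTag("visual", "red car", 10.0, 25.0),
        TimedTag("auditory", "explosion", 20.0, 22.0),
        TimedTag("textual", "chase scene", 5.0, 40.0),
        TimedTag("visual", "red car", 60.0, 70.0),
    ],
)

print(find_intersections(asset, ["red car", "explosion"]))
```

In this sketch, a query for a visual element and an auditory element returns only the range of time in which both tags overlap, even though no spoken text is involved. A practical system would likely index the tags rather than scan them linearly, and would map ranges to the underlying time-codes of the asset.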