Any large volume of content in any given medium requires some approach by which it can be organized and managed. Readers of books employ indices, glossaries, tables of contents, library card catalogs, and Internet search engines to accurately locate specific content of interest.
Computer-based systems have been used for the organization and management of content. This is true of the creators of content, such as book publishers. It is also increasingly true of consumers of content, who receive content through computer-based systems such as the Internet. Computer-based systems have proven especially effective in managing print-based content, since computer systems are particularly well suited to managing written language and numbers.
The effective organization and management of video content is more difficult than that of text-based content, because digital video (or analog video that has been converted into a digital form) is not composed of written language. Digital video may comprise a sequence of computer codes, not based on human language, that are used by a computer-based system to display a video sequence. However, computer-based systems are unable to search for or identify the subject matter contained in these codes, since the sequence of codes comprising the digital video content does not contain written language that can be meaningfully searched or indexed by computer programs. This inability to search or identify subject matter contained within a video sequence has rendered problematic the organization and management of large volumes of video content.
In the art, metadata has been employed to address this issue. Metadata are descriptive fields of information that describe content. For example, title, author, and publication date are three such fields used in book card catalogs. Similarly, in order to manage video content more efficiently, metadata fields are sometimes used, in which descriptive information is entered into associated data fields to describe video content.
This approach has proven to be of limited efficacy, however. One challenge is the lack of detailed descriptive information contained in this metadata. For example, this descriptive information is often entered through a manual process, which limits the amount and detail of the descriptions. Another challenge is the inaccuracy of such metadata, which results in an inaccurate description of the content.
In addition, since video is a temporal medium, it is desirable not only to create descriptive data but also to associate that description with a specific point or duration of time within the video content. This temporal association is often not supportable with such a manual process. Consequently, the descriptive information typically fails to represent the subject matter of video content as it changes over time through the temporal sequence of the video.
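By way of illustration only, the association of a description with a point or duration of time in video content might be sketched as follows. This is a minimal sketch and not a description of any particular system: the `TimedAnnotation` structure, the `annotations_at` helper, the use of seconds as the time unit, and the sample descriptions are all hypothetical and are introduced solely for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedAnnotation:
    """A description tied to a span of time in a video (hypothetical structure)."""
    start_s: float  # span start, in seconds from the beginning of the video
    end_s: float    # span end, in seconds (exclusive)
    description: str


def annotations_at(annotations, t_s):
    """Return the descriptions whose time span covers the instant t_s."""
    return [a.description for a in annotations if a.start_s <= t_s < a.end_s]


# Illustrative, made-up annotations for a single video (assumed data).
example = [
    TimedAnnotation(0.0, 12.5, "opening titles"),
    TimedAnnotation(12.5, 47.0, "interview, indoor"),
    TimedAnnotation(30.0, 47.0, "speaker holds up a book"),
]

print(annotations_at(example, 35.0))
# → ['interview, indoor', 'speaker holds up a book']
```

Because each description carries its own time span, overlapping descriptions can coexist, and the subject matter returned for a query changes as the queried time moves through the video. A single set of untimed metadata fields for the whole video cannot express this.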
Many computer-based video-editing systems, so-called “non-linear editing systems”, support the use of metadata fields and associate them with video content, as described above. In addition, some systems allow some portion of the audio containing speech to be transcribed into text form and used as a basis for metadata. Such transcription varies considerably in accuracy, and therefore cannot be used as an unambiguous index. Also, such speech transcriptions do not assist in the generation of metadata relating to content that is not reflected in speech.
Some Internet-based services that allow users to share video content with one another also allow contributors to enter descriptive words or “tags”. Videos can then be located, for example, by searching the tags.
The efficacy of such tags may be limited, however. First, tags are manually generated, limiting the level of description. Second, tags do not typically associate subject descriptions with points of time in the video. Finally, such tags are entered without restrictions or standards. This results in different users applying tags in conflicting ways, significantly limiting the efficacy of such tagging as a tool for the organization of content from multiple users.
Accordingly, there is a need in the art for improved computer-based systems that provide indexing and annotation of video content that is descriptive, detailed, temporal, automatically generated, and unambiguous.