1. Field of Art
The present invention generally relates to the field of digital video, and more specifically, to methods of labeling videos with concepts that they represent.
2. Background of the Invention
Concept labeling of a digital video associates the video with an indicator of a concept that the video or a portion thereof represents, such as “cartoon” or “nature.” Automatic concept labeling of videos stored by video hosting services like GOOGLE VIDEO or YOUTUBE is of great benefit to users of such systems, permitting users to determine more effectively whether a video would be of interest to them by examining or otherwise using the video's concept labels.
Generally, existing automatic concept labeling approaches require a predefined set of concepts specified by a human expert, such as a hierarchical taxonomy of predefined concepts. Under a supervised learning model, the human expert labels selected videos with the concepts and provides those labeled videos to the system, which then learns the relationships between videos (e.g., video content or metadata) and the concepts. In large corpuses (e.g., tens of millions of videos), such a technique will likely not capture the full richness of the concepts illustrated by the videos. For example, a substantial corpus of user-contributed videos can represent a very large and diverse set of distinct concepts, which continues to change as new videos, reflective of new events in the real world, are introduced. Further, given the diversity of concepts in a large corpus, it is likely that some videos will represent concepts that simply would not appear in a manually specified taxonomy of concepts.
Some conventional techniques for automatic labeling analyze the user-supplied metadata associated with the videos to perform the concept labeling, and therefore depend heavily on the accuracy of the metadata to properly label the videos. Unfortunately, the user-supplied metadata is in many cases incomplete or inaccurate. For example, a user submitting a video might make unintentional errors such as misspellings, or might not make the effort to provide much, or any, descriptive textual metadata. A user submitting a video might also intentionally provide false metadata, e.g., as “spam” to induce other users to view the video. Thus, labeling techniques that uncritically accept the user-provided metadata, without employing measures that take the potential inaccuracy into account, frequently produce poor-quality results.
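By way of illustration only, the following minimal Python sketch shows how a labeler that uncritically accepts user-supplied metadata can fail in the ways described above. It is not taken from any particular conventional system; the concept vocabulary, the keyword sets, and the function name label_from_metadata are hypothetical and chosen solely for this example.

```python
# Hypothetical concept vocabulary; the concepts and keywords are
# illustrative assumptions, not part of any real system.
CONCEPT_KEYWORDS = {
    "cartoon": {"cartoon", "animation"},
    "nature": {"nature", "wildlife"},
}


def label_from_metadata(metadata_text):
    """Return, in sorted order, every concept having at least one keyword
    that appears verbatim as a whitespace-separated token of the metadata."""
    tokens = set(metadata_text.lower().split())
    return sorted(
        concept
        for concept, keywords in CONCEPT_KEYWORDS.items()
        if tokens & keywords
    )


# Accurate metadata: a keyword matches and a concept is assigned.
accurate = label_from_metadata("Wildlife documentary clip")

# Misspelled metadata ("cartoun"): no keyword matches, so no label results.
misspelled = label_from_metadata("Funny cartoun for kids")

# Missing metadata: nothing to match against, so no label results.
missing = label_from_metadata("")

# Keyword-stuffed "spam" metadata: every concept is assigned,
# regardless of what the video actually shows.
spam = label_from_metadata("cartoon nature wildlife animation free download")
```

In this sketch the misspelled and missing metadata yield no labels at all, while the keyword-stuffed metadata is assigned every concept, because the labeler never examines the video content itself.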