1. Field of Art
The present invention generally relates to the field of digital video, and more specifically, to methods of training accurate classifiers for identifying the categories represented by a video.
2. Background of the Invention
Video hosting systems, such as YOUTUBE or GOOGLE VIDEO, have become an increasingly popular way of sharing and viewing digital videos, with users contributing tens of millions of videos each year. Accurate categorization of a video is of great value in such systems, permitting users to search for videos corresponding to given categories, video hosting systems to more accurately match videos with relevant advertising, and the like.
However, the video information provided by the user who contributes a video often does not result in the correct categorization of the video. For example, given a set of predetermined categories such as “Sports,” “Baseball,” “Tennis,” and the like, a video whose title is simply the name of a tennis player featured in the video would not in itself directly permit the video to be properly categorized as “Tennis” or “Sports.”
Learning algorithms can be employed to train a classifier function for a given category (e.g., “Tennis”) that, when applied to features of a video (such as metadata of the video), outputs a measure of the relevance of the video to the category. The trained classifier function for a given category can then be applied to a video, and the resulting measure of relevance used to determine whether the video falls under that category. However, to train a classifier function for a category, most learning algorithms employ supervised learning, which requires as input a training set of videos known a priori to be representative of the category. Further, supervised learning tends to produce more accurate classifier functions when trained on a larger training set, and/or a training set with features that are more useful for categorization purposes. A training set useful for supervised learning can be produced by humans viewing videos and manually categorizing them, but manual categorization is time-consuming because it requires watching the entire video, or at least a significant representative portion thereof. Nor can humans efficiently view and label a sufficiently large sample of videos in a video hosting service where tens of thousands of videos are being uploaded every day. Thus, there are typically few pre-categorized videos available for training purposes, and the classifier functions trained using these small training sets are therefore less effective than desired in categorizing the large number of videos that have not already been manually categorized.
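By way of illustration only, the supervised training and application of a classifier function for a single category might be sketched as follows. The sketch assumes bag-of-words features taken from textual metadata and a simple logistic-regression learner; the function names (`extract_features`, `train_classifier`), the hyperparameters, and the six-item training set are hypothetical and are not drawn from any particular video hosting system.

```python
import math
from collections import Counter


def extract_features(metadata):
    """Bag-of-words counts from a video's textual metadata (assumed feature choice)."""
    return Counter(metadata.lower().split())


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_classifier(training_set, epochs=200, learning_rate=0.5):
    """Train a relevance function for one category by stochastic gradient ascent.

    training_set: list of (metadata, label) pairs, where label is 1 if the
    video is known a priori to represent the category and 0 otherwise.
    Returns a function mapping metadata to a relevance measure in [0, 1].
    """
    weights = {}
    bias = 0.0
    for _ in range(epochs):
        for metadata, label in training_set:
            feats = extract_features(metadata)
            z = bias + sum(weights.get(w, 0.0) * c for w, c in feats.items())
            error = label - sigmoid(z)
            bias += learning_rate * error
            for w, c in feats.items():
                weights[w] = weights.get(w, 0.0) + learning_rate * error * c

    def classify(metadata):
        feats = extract_features(metadata)
        z = bias + sum(weights.get(w, 0.0) * c for w, c in feats.items())
        return sigmoid(z)

    return classify


# Hypothetical, manually labeled training set for the category "Tennis".
tennis_training_set = [
    ("wimbledon final tennis highlights", 1),
    ("tennis serve tutorial backhand", 1),
    ("grand slam tennis match point", 1),
    ("chocolate cake baking recipe", 0),
    ("guitar lesson beginner chords", 0),
    ("baseball home run highlights", 0),
]

tennis_classifier = train_classifier(tennis_training_set)

# The relevance measure is compared against a threshold (0.5 here, an
# arbitrary choice) to decide whether a video falls under the category.
relevance = tennis_classifier("tennis serve match")
is_tennis = relevance > 0.5
```

In this sketch, a video whose metadata contains only words absent from the training set receives a relevance measure determined solely by the learned bias term, which illustrates why a small training set limits the usefulness of the resulting classifier function.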