1. Field of Art
The present invention generally relates to the field of digital video, and more specifically, to methods of generating accurate classifiers for identifying categories of a video.
2. Background of the Invention
Video hosting services, such as YOUTUBE™, have become an increasingly popular way of sharing and viewing digital videos, with users contributing tens of millions of videos each year. Accurate categorization of a video is of great value in such systems, permitting users to search for videos corresponding to given categories, and the video hosting service to more accurately match videos with relevant advertising, and the like.
However, the descriptive metadata information provided by the user who contributes a video often does not result in the correct categorization of the video. For example, for a set of predetermined categories including categories such as “Sports,” “Basketball,” “Tennis,” and the like, a video entitled with the name of a tennis player featured in the video would not in itself directly permit the video to be properly categorized as “Tennis” or “Sports.”
Machine learning algorithms can be employed to train a classifier function for a given category (e.g., “Tennis”) that, when applied to features of a video (such as metadata of the video), outputs a measure of the relevance of the video to the category. Then, the trained classifier function for a given category can be applied to a video and the resulting measure of relevance used to determine whether the video is within that category. However, to train a classifier function for a category, most learning algorithms employ supervised learning, which requires as input a training set of videos known a priori to be representative of the category. Further, supervised learning tends to produce more accurate classifier functions when trained on a larger training set, and/or a training set with features that are more useful for categorization purposes. A training set useful for supervised learning can be produced by humans viewing videos and manually categorizing them into different categories, but manual categorization is time-consuming due to the need to watch the entire video, or at least a significant representative portion thereof. Nor can humans efficiently view and label a sufficiently large sample of videos in a video hosting service where tens of thousands of videos are being uploaded every day. Thus, there are typically few pre-categorized videos available for training purposes, and the classifier functions trained using these small training sets are therefore not as effective as could be desired in categorizing the large number of videos that have not already been manually categorized.
Additionally, some conventional algorithms that train classifier functions for various categories assume that the categories are independent of each other. When the categories do in fact have some relationship, such as being “parent” or “child” categories to each other, this assumption of category independence fails to capture relationship information that could provide additional accuracy to the trained classifier functions.