Video hosting systems, such as YOUTUBE or GOOGLE VIDEO, have become an increasingly popular way of sharing and viewing digital videos, with users contributing tens of millions of videos each year. Accurate categorization of videos is of great value in such systems: it permits users to search for videos in given categories, allows video hosting systems to match videos more accurately with relevant advertising, and the like.
However, the information provided by the user who contributes a video often does not lead to correct categorization of that video. For example, given a set of predetermined categories such as “Sports,” “Baseball,” “Tennis,” and the like, a video whose title is the name of a tennis player featured in it may not, by itself, cause a metadata classifier to categorize the video as “Tennis” or “Sports.”
Learning algorithms can be employed to train a classifier function (a “classifier”) for a given category (e.g., “Tennis”) that, when applied to features of a video (such as the video's metadata), outputs a measure of the video's relevance to the category. The trained classifier for a given category can then be applied to a given video, and the resulting relevance measure can be used to determine whether the video belongs to that category. Such classifiers, however, are applied only to features of the video being classified, without consideration of external information. They therefore classify the video from an incomplete set of information, which reduces their accuracy. Accordingly, video classifiers that rely solely on features of the video itself are not as effective as could be desired.
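To make the limitation concrete, the following is a minimal sketch of a per-category classifier that scores a video using only its own metadata and thresholds that score to decide category membership. The source does not specify the classifier's form. The logistic scoring over title tokens, the `CategoryClassifier` class, its weights, bias, and 0.5 threshold, and the player name “J. Smith” are all hypothetical illustrations, not the method described here. The weights stand in for values a learning algorithm might produce.

```python
import math
import re
from dataclasses import dataclass


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class CategoryClassifier:
    """Hypothetical linear (logistic) classifier for one category.

    The weights are illustrative assumptions standing in for learned
    values; they were not trained on real data.
    """
    category: str
    weights: dict[str, float]
    bias: float = 0.0
    threshold: float = 0.5  # assumed decision threshold

    def relevance(self, metadata: str) -> float:
        """Return a relevance score in (0, 1) from the video's own metadata only."""
        z = self.bias + sum(self.weights.get(tok, 0.0) for tok in tokenize(metadata))
        return 1.0 / (1.0 + math.exp(-z))

    def includes(self, metadata: str) -> bool:
        """Decide category membership by thresholding the relevance score."""
        return self.relevance(metadata) >= self.threshold


tennis = CategoryClassifier(
    category="Tennis",
    weights={"tennis": 3.0, "racket": 1.5, "wimbledon": 2.0, "serve": 1.0},
    bias=-2.0,
)

titles = [
    "Tennis highlights: five-set final",
    "J. Smith wins the final",  # hypothetical player name unknown to the classifier
]
for title in titles:
    score = tennis.relevance(title)
    print(f"{title!r}: score={score:.3f}, in {tennis.category}={tennis.includes(title)}")
```

Run as-is, this prints a score of 0.731 (included) for the first title and 0.119 (not included) for the second. No token in the player-name title carries any weight, so only the bias contributes. Because the classifier sees nothing beyond the video's own features, it cannot connect the player's name to the “Tennis” category.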