The present disclosure relates to a technique for classifying or annotating video information based on audio information.
Analyzing video information (such as a sequence of images that are encoded using a video format) can be time-consuming and expensive. For example, it can be difficult to classify the content in the video information. Moreover, analyzing video information can be computationally intensive, and it is often difficult to perform such computations efficiently.
Consequently, analysis of video information is often crude. For example, instead of analyzing the video information in a file to classify the content, many existing approaches determine annotation items or tags that describe the content based on a text description provided by the user who provided the file or by a viewer of the file. Alternatively, instead of analyzing all the video information in a file, other existing approaches analyze only individual images in the file to determine annotation items. However, because these existing approaches are either ad hoc or significantly sub-sample the video information, the determined annotation items are often incomplete or inaccurate.