Digital video content has grown rapidly, which makes managing a large number of videos increasingly challenging. One way to manage a large number of videos is to associate each video with semantic keywords that describe its semantic content. This type of association presents challenges for video annotation, also called video concept detection, which has attracted increasing attention recently. In particular, the challenges are to build a relationship between low-level features and semantic-level concepts and to bridge the gap between the two levels.
Another problem with the large number of videos is that relying on manual annotation is impractical, because manual annotation is labor-intensive, costly, and extremely time-consuming. Therefore, an alternative is to pursue effective automatic video annotation.
Various attempts have been made to classify videos. Techniques that have been tried include semi-supervised classification approaches to video annotation, which can handle the shortage of labeled videos. In practice, a video is usually associated with more than one concept. For example, a video with a “mountain” scene may also be annotated with the “outdoor” concept. This poses a multi-label classification problem, in which a data point may be associated with more than one label. Some single-label approaches have been directly applied to multi-label video annotation. However, these approaches process each label individually, transforming the multi-label problem into several independent single-label classification problems. Thus, these approaches do not address the multi-label problem itself.
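The transformation into independent single-label problems described above can be illustrated with a minimal Python sketch. The video identifiers, the concept names other than "mountain" and "outdoor", and the function name are hypothetical and chosen only for illustration; the sketch shows the decomposition step alone, not any particular classifier.

```python
# Hypothetical toy annotations: each video maps to its set of concept labels.
videos = {
    "video_1": {"mountain", "outdoor"},
    "video_2": {"outdoor"},
    "video_3": {"indoor"},
}


def to_independent_binary_problems(annotations):
    """Split multi-label annotations into one binary (0/1) task per concept."""
    # Collect every concept that appears in at least one video.
    concepts = sorted(set().union(*annotations.values()))
    # For each concept, record whether each video carries that label.
    return {
        concept: {
            video_id: int(concept in labels)
            for video_id, labels in annotations.items()
        }
        for concept in concepts
    }


binary_problems = to_independent_binary_problems(videos)
for concept, targets in binary_problems.items():
    print(concept, targets)
```

Each resulting task contains only the targets for its own concept. A classifier trained on the "mountain" task therefore receives no information about the "outdoor" targets, even though in this toy data the one video labeled "mountain" is also labeled "outdoor".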
Therefore, it is desirable to find ways to detect concepts in videos by using transductive multi-label classification.