Video event classification has great potential in applications such as content-based video retrieval, video surveillance, and human-computer interaction. Most previous work on video event or human action recognition addresses only a few categories, usually defined for a specific application. For example, in the context of retail store security surveillance, one event category could be a person picking up an item. Most such videos are collected in controlled environments with known illumination and background settings. Because the videos are collected in a controlled environment, actions may be easier to identify, since background noise can be filtered more easily.
In sharp contrast to such controlled settings, there is an increasing demand for general event classification of submitted videos. Submitted videos vary widely in illumination, camera motion, people's posture, clothing, and so on. In addition to these variations, the number of categories involved in general event classification is orders of magnitude higher than in most existing event recognition systems. Manually defining labels for so many categories is extremely labor-intensive. Therefore, automatic discovery of a collection of event categories would be useful.
One major difficulty in general video event classification is the lack of labeled training data. Classification systems usually rely on small video databases, which makes generalizing them to submitted videos even harder. Although there have been some attempts to develop web-based interactive tools for video annotation, those tools are still limited in scope. The growth of shared videos on services such as YouTube can help address these issues. The large number of videos, together with the rich diversity of video content, provides a potential source for constructing a large-scale video event database. Additionally, user-entered video titles and descriptions may contain useful information related to classification. However, the task of categorizing such videos is highly challenging due to the number of possible categories and large intra-category variations. Therefore, there exists a need both to define proper event category labels and to obtain and use training samples for the defined category labels to measure classification performance.