Video summarization can be defined generally as a process that generates a compact or abstract representation of a video; see A. Hanjalic and Hong Jiang Zhang, “An Integrated Scheme for Automated Video Abstraction Based on Unsupervised Cluster-Validity Analysis,” IEEE Trans. on Circuits and Systems for Video Technology, Vol. 9, No. 8, December 1999. Previous work on video summarization has mostly emphasized clustering based on color features, because color features are easy to extract and robust to noise. The summary itself either condenses the entire video or consists of a concatenated set of interesting segments of the video.
It is also possible to use motion descriptors to generate video summaries, see U.S. patent application Ser. No. 09/715,639 “Adaptively Processing a Video Based on Content Characteristics of Frames in the Video,” filed by Peker et al., on Aug. 9, 2000, U.S. patent application Ser. No. 09/839,924 “Method and System for High Level Structure Analysis and Event Detection in Domain Specific Videos,” filed by Xu et al., on Jul. 6, 2000, U.S. patent application Ser. No. 09/997,479 “Unusual Event Detection Using Motion Activity Descriptors,” filed by Divakaran on Nov. 19, 2001, and U.S. patent application Ser. No. 10/005,623 “Structure Analysis of Video Using Hidden Markov Models,” filed by Divakaran et al., on Dec. 5, 2001.
Other work has described the use of motion features derived from compressed-domain motion vectors to measure the motion activity, and the spatial distribution of motion activity, in videos; see A. Divakaran and H. Sun, “A Descriptor for spatial distribution of motion activity,” Proc. SPIE Conference on Storage and Retrieval for Media Databases, San Jose, Calif., January 2000; K. Peker and A. Divakaran, “Automatic Measurement of Intensity of Motion Activity of Video Segments,” Proc. SPIE Conference on Storage and Retrieval from Multimedia Databases, San Jose, Calif., January 2001; and S. Jeannin and A. Divakaran, “MPEG-7 visual motion descriptors,” IEEE Trans. Circuits and Systems for Video Technology, June 2001. Such descriptors have been used successfully in video browsing applications to filter out all high-action or all low-action shots, depending on the content and the application.
As stated by Jeannin et al., “A human watching a video or animation sequence perceives it as being a slow sequence, or a fast paced sequence or an action sequence, etc. The activity feature captures this intuitive notion of ‘intensity of action’ or ‘pace of action’ in a video segment. Examples of high ‘activity’ include scenes such as ‘goal scoring in a soccer match,’ ‘scoring in a basketball game,’ ‘a high speed car chase,’ etc. On the other hand scenes such as ‘news reader shot,’ ‘an interview scene,’ ‘a still shot,’ etc. are perceived as low action shots. Video content in general spans the gamut from high to low activity, therefore we need a descriptor that enables us to accurately express the activity of a given video sequence/shot and comprehensively covers the aforementioned gamut.”
The recently proposed MPEG-7 video standard provides such a motion activity descriptor. The intensity of the motion activity is measured by suitably quantizing the standard deviation of the motion vector magnitude.
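To make the intensity measure concrete, the following Python sketch computes the standard deviation of the motion-vector magnitudes in a shot and quantizes it into five levels. It assumes the motion vectors have already been extracted from the compressed stream as (dx, dy) pairs. The function names and, in particular, the threshold values are hypothetical placeholders chosen for illustration; they are not the quantization thresholds specified by MPEG-7.

```python
import math
import statistics
from typing import Sequence, Tuple

# Hypothetical thresholds on the standard deviation of motion-vector
# magnitude (pixels per frame). Illustrative only; NOT the MPEG-7 values.
HYPOTHETICAL_THRESHOLDS: Tuple[float, ...] = (4.0, 10.0, 17.0, 32.0)


def motion_vector_std(vectors: Sequence[Tuple[float, float]]) -> float:
    """Population standard deviation of the motion-vector magnitudes."""
    if not vectors:
        return 0.0
    magnitudes = [math.hypot(dx, dy) for dx, dy in vectors]
    return statistics.pstdev(magnitudes)


def activity_intensity(
    vectors: Sequence[Tuple[float, float]],
    thresholds: Sequence[float] = HYPOTHETICAL_THRESHOLDS,
) -> int:
    """Quantize the spread of magnitudes into levels starting at 1 (very low).

    With four ascending thresholds the result ranges from 1 to 5.
    """
    sigma = motion_vector_std(vectors)
    # Level = 1 + number of thresholds that sigma reaches or exceeds.
    return 1 + sum(1 for t in thresholds if sigma >= t)


if __name__ == "__main__":
    print(activity_intensity([(0, 0)] * 10))       # static shot -> level 1
    print(activity_intensity([(0, 0), (30, 40)]))  # sigma = 25.0 -> level 4
```

Filtering shots by the returned level is one simple way to realize the browsing use described earlier, in which all high-action or all low-action shots are skipped.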
Video summarization can be based on the notion that motion activity is in fact an indication of the summarizability of a video sequence. For example, the playback speed can be adjusted adaptively so that the motion activity seen at the display stays constant: low-activity parts are played faster and high-activity parts more slowly. In other words, parts of the video with less motion activity form a smaller part of the summary, while parts with more motion activity form the bulk of the summary. Thus, the less interesting parts can be skipped quickly.
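A minimal sketch of adaptive playback speed adjustment, assuming each video segment already has a motion activity value (for example, a quantized intensity level) and that the activity perceived at the display grows linearly with playback speed. Under that assumption each segment is played at speed target / activity, clipped to a range [min_speed, max_speed]. The names `playback_speeds` and `summary_duration`, the linearity assumption, and the default speed limits are all hypothetical choices for illustration.

```python
from typing import List, Sequence


def playback_speeds(
    activity: Sequence[float],
    target: float,
    min_speed: float = 1.0,
    max_speed: float = 8.0,
) -> List[float]:
    """Speed-up factor per segment so that activity * speed stays near `target`.

    Assumes perceived activity grows linearly with playback speed. Segments
    with no measured motion get the maximum speed-up.
    """
    speeds = []
    for a in activity:
        if a <= 0:
            speeds.append(max_speed)
        else:
            # Clip so low-activity segments are not skipped entirely and
            # high-activity segments are never slowed below normal speed.
            speeds.append(min(max(target / a, min_speed), max_speed))
    return speeds


def summary_duration(durations: Sequence[float], speeds: Sequence[float]) -> float:
    """Total playback time of the summary, in the same units as `durations`."""
    return sum(d / s for d, s in zip(durations, speeds))


if __name__ == "__main__":
    speeds = playback_speeds([1.0, 2.0, 8.0], target=8.0)
    print(speeds)                                # [8.0, 4.0, 1.0]
    print(summary_duration([10.0] * 3, speeds))  # 13.75
```

With activities 1, 2 and 8 and a target of 8, the speeds are 8×, 4× and 1×, so three 10 s segments shrink to 1.25 + 2.5 + 10 = 13.75 s and the high-activity segment makes up most of the summary. The lower clip at 1× is a design choice: it keeps high-activity segments at normal speed rather than slowing them down.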