The rapid evolution of digital video has brought many new applications. Consequently, there is a great need for research and development of new technologies that lower the costs of video archiving, cataloging and indexing, and that improve the efficiency, usability and accessibility of stored videos. Among all possible research areas, one important topic is how to enable quick browsing of a large collection of video data and how to achieve efficient content access and representation.
To address these issues, video abstraction techniques have emerged and have been attracting more research interest in recent years. There are two types of video abstraction: video summary and video skimming. Video summary, also called a still abstract, is a set of salient images selected or reconstructed from an original video sequence.
Video skimming, also called a moving abstract, is a collection of image sequences along with the corresponding audio from an original video sequence. Video skimming is also called a preview of an original video, and can be classified into two sub-types: highlight and summary sequence. A highlight contains the most interesting and attractive parts of a video, while a summary sequence renders the impression of the content of an entire video. Among all types of video abstraction, the summary sequence conveys the highest semantic meaning of the content of an original video.
One prior art method is uniformly sampling the frames to shrink the video size while losing the audio part, which is similar to the fast forward function seen in many digital video players. Time compression methods can compress audio and video at the same time to keep them synchronized, using frame dropping and audio sampling. However, the compression ratio can be limited by speech distortion in some cases. Frame-level skimming mainly relies on a user attention model to compute a saliency curve, but this method is weak in keeping the video structure, especially for a long video. Shot clustering is a middle-level method in video abstraction, but its readability is mostly ignored. Semantic-level skimming is a method that tries to understand the video content, but its goal can be difficult to realize due to the “semantic gap” puzzle.
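
As an illustration of the uniform sampling approach mentioned above, the following is a minimal sketch only. It assumes the frames have already been decoded into a Python list; the function name `uniform_sample` and the `step` parameter are hypothetical and do not come from any particular player or library. The audio track is simply not handled, which mirrors the loss of audio noted for this method.

```python
def uniform_sample(frames, step):
    """Keep every `step`-th frame, starting with the first.

    `frames` is any list-like sequence of decoded frames (assumed).
    `step` is the sampling interval; step=10 keeps 1 frame in 10.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    # Slicing with a stride drops the intermediate frames uniformly.
    return frames[::step]


# Stand-in data: frame indices instead of real images.
frames = list(range(300))
skim = uniform_sample(frames, 10)
print(len(skim))  # 30 frames remain out of 300
```

Because the frames are chosen purely by position, the result ignores content entirely, which is why such sampling behaves like fast forward rather than a content-aware abstract.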