The popularity and widespread availability of video cameras have led to a rapid increase in the number and size of video collections. As a result, there is an extremely large volume of community-contributed videos on the Internet. Storing and indexing this volume is a challenging problem for existing video search engines. For example, a video search engine may maintain only a very short part of an original crawled video for indexing and for presentation in a search result, as it is not practical to store all the crawled videos on search engine servers.
There is thus a need for efficient video storage, browsing and retrieval. One way to provide such efficiency is video summarization, which in general derives a sequence of static frames or a clip of dynamic video as a representation of the original video. For example, attempts have been made to select the most informative content from a video and then represent the video in a static form (e.g., a synthesized image) or a dynamic form (e.g., a newly composed short video).
Existing summarization methods, whether static or dynamic, attempt to maintain and present the most substantial part of a video. The result is only a partial representation of the entire video, and such methods are thus referred to as lossy video summarization. However, lossy video summarization loses time continuity, and the resulting summary sometimes looks degraded. As a result, a considerable part of the important information within an original video may be missing. Further, when users decide to watch the full version of a summarized video, it may be difficult to find the full version because video sites change frequently, so links to those videos are often invalid.