The rapid advancement of multimedia computing technologies and networked communications has dramatically increased the amount of digital multimedia (e.g., video, audio, still images) stored in digital libraries. However, methods for accessing such multimedia data, video in particular, have not kept pace with this growth. Traditional retrieval systems for text-based documents permit browsing of document surrogates (e.g., keywords, abstracts) for a rapid overview of document information that assists in filtering out irrelevant documents and in further examining documents of interest.
Due to the unique characteristics of video, however, traditional surrogates and text-oriented browsing mechanisms are less useful for accessing video data. Video data conveys video and audio information whose spatial and temporal expression and sheer volume cannot be adequately described using words alone. Thus, the use of video “abstracts” (i.e., representative still pictures extracted from video sequences) is of significant interest as a way to facilitate content-based browsing and access to video data.
Current methods for browsing/accessing video content involve detecting shot boundaries and extracting key frames from video sequences for use as video abstracts or summaries. A video shot is a contiguous sequence of video frames recorded from a single camera. Video shots form the building blocks of a video sequence. The purpose of shot boundary detection is to segment a video sequence into multiple video shots from which key frames can be extracted. A key frame is a video frame that provides a thumbnail representation of the salient content of a shot. The use of key frames reduces the amount of data required in video indexing and provides a way to organize and browse video content.
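The segmentation step described above can be illustrated with a minimal sketch. The text does not say how shot boundaries are detected, so the sketch assumes one common approach: a boundary is declared wherever the intensity histograms of two consecutive frames differ by more than a threshold. The frame representation (a flat list of 8-bit grayscale values), the names `gray_histogram`, `histogram_distance`, and `detect_shots`, and the values of `bins` and `threshold` are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumption: a frame is a non-empty flat sequence of 8-bit grayscale values.
Frame = Sequence[int]


def gray_histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Return the normalized intensity histogram of one frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // 256] += 1
    total = len(frame)
    return [c / total for c in counts]


def histogram_distance(a: Frame, b: Frame, bins: int = 16) -> float:
    """Half the L1 distance between two histograms, a value in [0, 1]."""
    ha, hb = gray_histogram(a, bins), gray_histogram(b, bins)
    return 0.5 * sum(abs(x - y) for x, y in zip(ha, hb))


def detect_shots(frames: Sequence[Frame],
                 threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Split a frame sequence into shots.

    Returns (start, end) index pairs with end exclusive. A new shot starts
    wherever consecutive frames differ by more than the (assumed) threshold.
    """
    if not frames:
        return []
    shots = []
    start = 0
    for i in range(1, len(frames)):
        if histogram_distance(frames[i - 1], frames[i]) > threshold:
            shots.append((start, i))
            start = i
    shots.append((start, len(frames)))
    return shots


# Usage: three dark frames followed by two bright frames form two shots.
dark = [10] * 64
bright = [240] * 64
shots = detect_shots([dark, dark, dark, bright, bright])
```

Each returned pair delimits one shot, from which a key frame could then be chosen. A fixed global threshold is the simplest possible decision rule and would likely miss gradual transitions such as fades and dissolves.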
Key frame extraction continues to be an important topic to which significant effort is devoted. One easy technique often used for key frame extraction is to select the first frame of each video shot as the shot's key frame. This technique is computationally inexpensive, but typically fails to effectively capture salient visual content for a video shot. Other techniques for key frame extraction include the use and analysis of various visual criteria such as color features and motion between video frames. Such techniques may improve the capturing of salient visual content, but they tend to be computationally expensive. Thus, although key frame extraction techniques have improved, they continue to suffer disadvantages including their significant computational expense and their inability to effectively capture salient visual content from video data.
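The trade-off between the two families of techniques can be made concrete with a small sketch. The first function is the first-frame rule described above. The second is only one hypothetical example of a color-feature criterion, not a method taken from the text: it picks the frame whose intensity histogram is closest to the shot's mean histogram. The frame representation (a flat list of 8-bit grayscale values) and the names `first_frame_key` and `representative_key` are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumption: a frame is a non-empty flat sequence of 8-bit grayscale values.
Frame = Sequence[int]


def gray_histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Return the normalized intensity histogram of one frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // 256] += 1
    total = len(frame)
    return [c / total for c in counts]


def first_frame_key(shot: Tuple[int, int]) -> int:
    """Baseline: the key frame is the first frame of the shot (no analysis)."""
    return shot[0]


def representative_key(frames: Sequence[Frame], shot: Tuple[int, int],
                       bins: int = 16) -> int:
    """Hypothetical color-based rule: pick the frame whose histogram is
    closest (L1 distance) to the mean histogram of the shot."""
    start, end = shot  # end is exclusive
    hists = [gray_histogram(frames[i], bins) for i in range(start, end)]
    mean = [sum(h[b] for h in hists) / len(hists) for b in range(bins)]

    def distance_to_mean(h: List[float]) -> float:
        return sum(abs(x - y) for x, y in zip(h, mean))

    best = min(range(len(hists)), key=lambda k: distance_to_mean(hists[k]))
    return start + best


# Usage: a shot that opens on a near-black frame before the real content.
black = [0] * 64
content = [128] * 64
frames = [black, content, content, content]
shot = (0, 4)
baseline = first_frame_key(shot)
chosen = representative_key(frames, shot)
```

In this example the first-frame rule returns the near-black opening frame, while the histogram-based rule returns a content frame. The second rule must compute a histogram for every frame in the shot, which illustrates the added computational expense noted above.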
In addition, such prior techniques for key frame extraction do not determine key frames intended to represent an entire video sequence. Rather, such techniques determine key frames intended to represent particular video shots within a video sequence.
Accordingly, a need exists for a way to represent a whole video sequence that accurately portrays the salient content of the video sequence in a manner that facilitates content-based browsing of various video data.